Warm_Shelter1866@alien.top to LocalLLaMA@poweruser.forum • Starling-RM-7B-alpha: New RLAIF Finetuned 7b Model beats Openchat 3.5 and comes close to GPT-4 · 10 months ago
What does it mean for an LLM to be a reward model? I always thought of rewards only in the context of RL. And how would the reward model be used during finetuning?