@Warm_Shelter1866 - Communick News

Joined 10 months ago

Cake day: November 18th, 2023


Warm_Shelter1866@alien.top to LocalLLaMA@poweruser.forum • Starling-RM-7B-alpha: New RLAIF Finetuned 7b Model beats Openchat 3.5 and comes close to GPT-4
What does it mean for an LLM to be a reward model? I always thought rewards only came up in RL. And how would the reward model be used during finetuning?
