[R] Rethinking OpenAI's Q-Learning: Insights from the Award-Winning 'Non-delusional Q-learning' Paper

Even_Campaign7385@alien.top · 2 years ago

Red-Portal@alien.top · 2 years ago

We don’t even know whether it’s actually an RL approach lol

pm_me_your_pay_slips@alien.top · 2 years ago

it’s very likely something like this: https://arxiv.org/pdf/2305.18290.pdf

Or finetuning on high quality datasets

cthorrez@alien.top · 2 years ago

What is the basis on which you judge it “very likely”? The only information is a leaked rumor that there is something with the name “Q*”. How do we get from that to DPO?