OpenAI’s approach to Q-Learning has been drawing significant attention recently.
However, there’s a fundamental issue with the way Q-learning is typically implemented when combined with deep neural network function approximation. This concern is highlighted in the award-winning paper “Non-delusional Q-learning,” presented at NeurIPS.
The paper argues that blindly applying Q-learning updates to deep neural networks is flawed. Such updates can create a self-contradictory scenario where improving the network’s values on the current batch of data inadvertently makes them less accurate on other batches. This is akin to a situation in supervised learning where optimizing a network for a specific set of data degrades its performance on other datasets.
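For context, here is a minimal sketch of the kind of “blind” bootstrapped update the paper is criticizing: the generic DQN-style step with a max-backup, written in PyTorch. This is not code from the paper; the network, batch layout, and hyperparameters are purely illustrative.

```python
# Minimal sketch of the standard bootstrapped Q-learning update with a neural
# network approximator. All names (QNet, batch fields, gamma) are illustrative.
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)  # Q(s, .) for every action

def q_learning_step(q, optimizer, batch, gamma=0.99):
    obs, actions, rewards, next_obs, done = batch
    # Q(s, a) for the actions actually taken in this minibatch
    q_sa = q(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped target: max over actions at the next state.
        # The paper's point is that this greedy backup can commit to action
        # choices that no single policy representable by the network can
        # realize jointly, so fitting this batch can make the learned values
        # inconsistent on other parts of the state space.
        target = rewards + gamma * (1 - done) * q(next_obs).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```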
For more insights, the full paper can be accessed here: Non-delusional Q-learning Paper (follow-up ICML paper: Practical Non-delusional Q-Learning).
I’m curious about others’ views on this topic. What do you think about the implications of these findings for the future of Q-learning in deep learning environments?
Is there anything to the hoopla over OpenAI using deep Q-learning other than random speculation?
If anything, I would guess DQN rather than plain Q-learning.
But all the papers people have pointed to while speculating about this hoopla just mention active learning or RL without specifics.
Yeah, so largely I think you’ve hit the nail on the head, but just in case you don’t know: the fervour is over a deliberately leaked project name, “Q*”, and the suggestion that it precipitated the OpenAI board drama. Now, is this probably a tactic to keep prices high so stock sells at the 65B valuation OAI had prior to the drama? Sure.
But it’s still fun to speculate.
We don’t even know whether it’s actually an RL approach lol
It’s very likely something like this: https://arxiv.org/pdf/2305.18290.pdf
Or fine-tuning on high-quality datasets.
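For anyone who doesn’t want to open the PDF: the link is the DPO (Direct Preference Optimization) paper. Below is a rough sketch of its loss, assuming you already have per-example summed log-probabilities of the chosen and rejected responses under the trainable policy and a frozen reference model; the function and variable names are mine, not from the paper’s code.

```python
# Rough sketch of the DPO loss from the linked paper (arXiv:2305.18290).
# Inputs are summed log-probs of chosen/rejected responses under the policy
# being trained and a frozen reference model; names are illustrative.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit reward of each response: beta * (log pi_theta - log pi_ref)
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic loss that pushes the chosen response's implicit reward
    # above the rejected one's
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```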
What is the basis on which you judge it “very likely”? The only information is a leaked rumor that there is something with the name “Q*”. How do we get from that to DPO?
Just that they have a project known as “Q*”.
Seems to be a good read. I never thought Q-learning had such a problem in practice.
This whole Q-star hullabaloo just reminds me of HBO Silicon Valley’s “the bear is sticky with honey” episode.