[D] Exclusive: Sam Altman's ouster at OpenAI was precipitated by letter to board about AI breakthrough

blabboy@alien.top · 2 years ago

residentmouse@alien.top · 2 years ago

OK, so full speculation: this project could be an implementation of Q-Learning (i.e. unsupervised reinforcement learning) on an internal GPT model. This would no doubt be an agent model.

Other evidence? The * suggests a graph traversal algorithm (as in A*), which obviously plays a huge role in RL exploration, but also GPT models are already doing their own graph traversal via beam search to do next token prediction.

Are they perhaps hooking up an RL-trained model to replace their beam search?
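
For anyone unfamiliar with the term, here is a minimal sketch of plain tabular Q-learning. This is only the textbook algorithm on a made-up toy problem, not a claim about what OpenAI built. The chain environment, the function name `q_learning`, and all hyperparameters are assumptions chosen for illustration.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain.

    The agent starts in state 0 and gets reward 1.0 on entering the last
    state, which ends the episode. Actions: 0 = left, 1 = right.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action choice, breaking ties at random.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[s])
                a = rng.choice([x for x in (0, 1) if q[s][x] == best])
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # Q-learning update: bootstrap from the best action in s_next.
            # q[goal] is never updated, so the terminal value stays 0.
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q

if __name__ == "__main__":
    for state, (left, right) in enumerate(q_learning()):
        print(state, round(left, 3), round(right, 3))
```

After training, "right" should score higher than "left" in every non-terminal state, which is the table-sized version of the learned Q approaching the optimal one.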

tomvorlostriddle@alien.top · 2 years ago

> of Q-Learning (i.e. unsupervised reinforcement learning) on an internal GPT model.

Potential efficacy aside, imagine the scenario: those blabbermouths just eternally yapping among each other, and that unbelievably boring wall of text is what brings about superintelligence :)

ReptileCultist@alien.top · 2 years ago

> GPT models are already doing their own graph traversal via beam search to do next token prediction.

I don’t think GPT is often used in conjunction with beam search, or is it?
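
To make the beam search point concrete, here is a rough sketch of how it differs from greedy decoding. The toy "language model" table `TOY_LM` and its probabilities are invented for illustration and have nothing to do with any real GPT model.

```python
import math

# Hypothetical toy model: maps a prefix (tuple of tokens) to
# next-token probabilities. Prefixes not listed have no continuation.
TOY_LM = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}

def next_token_probs(prefix):
    return TOY_LM.get(tuple(prefix), {})

def beam_search(beam_width, max_len=2):
    """Return the best (tokens, log_prob) pair found with the given width."""
    beams = [((), 0.0)]
    for _ in range(max_len):
        candidates = []
        for tokens, logp in beams:
            probs = next_token_probs(tokens)
            if not probs:
                # Nothing to extend with: carry the hypothesis forward.
                candidates.append((tokens, logp))
                continue
            for tok, p in probs.items():
                candidates.append((tokens + (tok,), logp + math.log(p)))
        # Keep only the beam_width highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]

if __name__ == "__main__":
    for width in (1, 2):
        tokens, logp = beam_search(width)
        print(width, tokens, round(math.exp(logp), 2))
```

With width 1 this is just greedy decoding: it commits to "a" and ends up with a sequence of probability 0.33. With width 2 it keeps "b" alive and finds "b x" at 0.36. So it is a (pruned) tree search over continuations, whether or not a given deployed model actually decodes this way.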

VirtualHat@alien.top · 2 years ago

The star in Q* traditionally refers to a policy which is optimal.

JustOneAvailableName@alien.top · 2 years ago

Q is the value function; pi is the policy.
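
Spelled out in standard RL textbook notation (the symbols s, a, r, and gamma for state, action, reward, and discount factor are the usual conventions, not anything taken from the article):

```latex
% Optimal action-value function: the best value achievable by any policy
Q^*(s, a) = \max_{\pi} Q^{\pi}(s, a)

% Bellman optimality equation satisfied by Q^*
Q^*(s, a) = \mathbb{E}\left[\, r + \gamma \max_{a'} Q^*(s', a') \,\middle|\, s, a \right]

% An optimal policy acts greedily with respect to Q^*
\pi^*(s) \in \arg\max_{a} Q^*(s, a)
```

So the star marks optimality in both cases: Q* is the optimal value function and pi* is the optimal policy derived from it.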

Material_Policy6327@alien.top · 2 years ago

That’s my feeling about what’s actually being reported, poorly, by the news.