Reuters is reporting that OpenAI achieved an advance with a technique called Q* (pronounced Q-Star).
So what is Q*?
I asked around the AI researcher campfire and…
It’s probably Q-learning combined with MCTS (Monte Carlo tree search), i.e. a tree-search-based reinforcement learning algorithm.
Which is right in line with the strategy DeepMind (vaguely) said they’re taking with Gemini.
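For anyone who hasn’t seen the Q-learning half of that guess, here is a minimal sketch of the textbook tabular update on a toy problem. To be clear, this is not whatever OpenAI is doing; the chain environment, the function name `q_learning_chain`, and all the hyperparameters are my own assumptions for illustration.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a toy chain (hypothetical environment).

    States are 0..n_states-1. Action 0 steps left, action 1 steps right.
    Entering the last state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Behave uniformly at random: Q-learning is off-policy, so it can
            # still learn the greedy policy's values from random experience.
            a = rng.randrange(2)
            s2 = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == goal else 0.0
            # Core update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q

if __name__ == "__main__":
    for state, (left, right) in enumerate(q_learning_chain()):
        print(state, round(left, 3), round(right, 3))
```

With a discount of 0.9, the learned value of stepping right should approach 1, 0.9, 0.81, 0.729 as you move further from the goal, so the greedy policy is simply "go right".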
Another corroborating data point: an early GPT-4 tester mentioned on a podcast that they are working on ways to trade inference compute for smarter output. MCTS is probably the most promising method in the literature for doing that.
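To make "trade inference compute for smarter output" concrete, here is a toy sketch of MCTS with the standard UCT selection rule. It is not Weave’s code and not anything confirmed about Q*. The `TARGET` sequence and the `score` function are hypothetical stand-ins for "a good completion" and "a reward model", and `budget` is the knob that spends more compute at inference time.

```python
import math
import random

TARGET = (1, 0, 1, 1)  # hypothetical "best" token sequence for the toy scorer

def score(seq):
    """Toy stand-in for a reward model: fraction of positions matching TARGET."""
    return sum(a == b for a, b in zip(seq, TARGET)) / len(TARGET)

class Node:
    def __init__(self, seq):
        self.seq = seq        # partial sequence this node represents
        self.children = {}    # action (0 or 1) -> Node
        self.visits = 0
        self.total = 0.0      # sum of rewards backed up through this node

def search(budget, rng, c=1.4):
    """Run `budget` simulations; return the most-visited path in the tree.

    With a very small budget the returned path can be shorter than TARGET.
    """
    root = Node(())
    depth = len(TARGET)
    for _ in range(budget):
        node, path = root, [root]
        # Selection and expansion: descend by UCT until we add one new node.
        while len(node.seq) < depth:
            untried = [a for a in (0, 1) if a not in node.children]
            if untried:
                a = rng.choice(untried)
                node.children[a] = Node(node.seq + (a,))
                node = node.children[a]
                path.append(node)
                break
            parent = node
            node = max(
                parent.children.values(),
                key=lambda ch: ch.total / ch.visits
                + c * math.sqrt(math.log(parent.visits) / ch.visits),
            )
            path.append(node)
        # Rollout: finish the sequence with random actions and score it.
        seq = node.seq
        while len(seq) < depth:
            seq += (rng.choice((0, 1)),)
        reward = score(seq)
        # Backup: credit every node on the path with the rollout's reward.
        for visited in path:
            visited.visits += 1
            visited.total += reward
    # Read out the answer by following the most-visited child at each level.
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
    return node.seq

if __name__ == "__main__":
    for budget in (1, 10, 100, 5000):
        result = search(budget, random.Random(0))
        print(budget, result, score(result))
```

The point is only the shape of the trade: the scorer stays fixed, and a bigger `budget` buys a deeper, better-informed tree. In a language model setting the actions would be tokens or chunks of text and the scorer a learned evaluator, which is where the real cost comes from.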
So how do we do it? Well, the closest thing I know of that’s available right now is Weave, part of a concise, readable, Apache-licensed MCTS RL fine-tuning package called minihf.
https://github.com/JD-P/minihf/blob/main/weave.py
I’ll update the post when I have more info about Q-learning in particular, and about what the deltas are from Weave.
Yeah, I think it’s an MCTS reinforcement learning algorithm. I think DeepMind is the best lab when it comes to developing agents capable of strategy and planning, given how good AlphaZero and AlphaGo are, and if they integrate that with the “Gemini” project, they really might just “eclipse” GPT-4. I don’t know how scalable it would be at inference time, given the amount of compute required.
Have DeepMind released any leading-edge tools recently? MuZero was quite a few years ago now, and AlphaGo is ancient in AI terms.
DeepMind seem to have promised an awful lot, come up with a lot of clever announcements, but been very sparse on actual delivery of much at all.