Reuters is reporting that OpenAI achieved an advance with a technique called Q* (pronounced Q-Star).

So what is Q*?

I asked around the AI researcher campfire and…

It’s probably Q-learning combined with MCTS, i.e. a Monte Carlo tree search reinforcement learning algorithm.
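
For reference, the “Q” half of that guess is classic Q-learning. Here’s a minimal, self-contained tabular sketch of the update rule on a toy environment; nothing OpenAI-specific, just the textbook algorithm:

```python
# Tabular Q-learning on a toy 1-D walk: reward 1 for reaching the
# rightmost state. Illustrates the update rule only; nothing here
# is OpenAI's method.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2   # learning rate, discount, exploration
N_STATES, ACTIONS = 5, [-1, +1]         # states 0..4, move left or right

Q = defaultdict(float)                  # Q[(state, action)] -> estimated return

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + action))
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

for episode in range(300):
    s = 0
    while s != N_STATES - 1:
        # Epsilon-greedy action selection, breaking ties randomly.
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a2: (Q[(s, a2)], random.random()))
        s2, r = step(s, a)
        # Core update: nudge Q(s,a) toward r + gamma * max_a' Q(s',a').
        best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

print({k: round(v, 2) for k, v in sorted(Q.items())})
```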

Which is right in line with the strategy DeepMind (vaguely) said they’re taking with Gemini.

Another corroborating data point: an early GPT-4 tester mentioned on a podcast that OpenAI is working on ways to trade inference compute for smarter output. MCTS is probably the most promising method in the literature for doing that.
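
To make “trade inference compute for smarter output” concrete, here’s a hedged sketch of the general search idea: sample several continuations, score them, and keep expanding only the best branches, so more depth and branching buys better output. It’s beam-search flavored rather than full MCTS, and generate_continuations and score are hypothetical stand-ins for an LLM sampler and a learned value model:

```python
# Sketch: spend more inference compute (depth x branch) to get a
# higher-scoring completion. Both helpers are hypothetical stand-ins.
import random

def generate_continuations(text, n):
    # Stand-in for sampling n continuations from an LLM.
    return [text + random.choice("abcde") for _ in range(n)]

def score(text):
    # Stand-in for a learned value/reward model; here "a" counts as "good".
    return text.count("a")

def tree_search(prompt, depth=3, branch=4, keep=2):
    frontier = [prompt]
    for _ in range(depth):                    # more depth = more compute
        candidates = [c for node in frontier
                        for c in generate_continuations(node, branch)]
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:keep]          # prune to the best branches
    return frontier[0]

print(tree_search("Q: why is the sky blue?\nA: "))
```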

So how do we do it? Well, the closest thing I know of that’s presently available is Weave, part of a concise, readable, Apache-licensed MCTS LLM RL fine-tuning package called minihf.

https://github.com/JD-P/minihf/blob/main/weave.py

I’ll update the post when I have more info about Q-learning in particular, and what the deltas are from Weave.

  • RogueStargun@alien.topB · 1 year ago

    Q* is just a reinforcement learning technique.

    Perhaps they scaled it up and combined it with LLMs.

    Given their recently published paper, they probably figured out a way to get GPT to learn its own reward function somehow (rough sketch after this comment).

    Perhaps some Chicken Little board members believe this would be the philosophical trigger for machine intelligence deciding on its own alignment.
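
    A hedged sketch of what “learn its own reward function” could look like: the same model both generates candidates and judges them, and the judged-best output becomes a training target (self-rewarding / RLAIF flavor; speculation, not OpenAI’s published method). llm and llm_judge are hypothetical stand-ins:

    ```python
    # Sketch: bootstrap a reward signal from the model itself.
    import random

    def llm(prompt):
        # Stand-in for the model generating an answer.
        return random.choice(["answer A", "answer B", "answer C"])

    def llm_judge(question, answer):
        # The same model, prompted to act as its own reward function.
        # Random here; in practice, a score parsed from its critique.
        return random.random()

    def self_reward_step(question, n_samples=4):
        candidates = [llm(question) for _ in range(n_samples)]
        scored = [(llm_judge(question, c), c) for c in candidates]
        best = max(scored)[1]
        # In a real pipeline, `best` would become a fine-tuning target,
        # closing the loop: the model supplies its own reward.
        return best

    print(self_reward_step("Why is the sky blue?"))
    ```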

    • herozorro@alien.topB · 1 year ago

      > Given their recently published paper, they probably figured out a way to get GPT to learn its own reward function somehow.

      You just need two GPTs talking to each other. The second acts as a critic and guides the first; rough sketch below.
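
      A minimal sketch of that loop, with generator and critic as hypothetical stand-ins for the two GPT calls (the critic’s feedback steers the next revision):

      ```python
      # Sketch: model #2 critiques model #1's draft until satisfied.

      def generator(prompt, feedback=""):
          # Stand-in for GPT #1: produce or revise a draft.
          base = f"draft for {prompt!r}"
          return base + (f" [revised per: {feedback}]" if feedback else "")

      def critic(draft):
          # Stand-in for GPT #2: return feedback, or "" when satisfied.
          return "" if "revised" in draft else "be more specific"

      def critique_loop(prompt, max_rounds=3):
          draft = generator(prompt)
          for _ in range(max_rounds):
              feedback = critic(draft)
              if not feedback:                 # critic approves; stop early
                  break
              draft = generator(prompt, feedback)
          return draft

      print(critique_loop("explain Q-learning"))
      ```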