OpenAI’s approach to Q-Learning has been drawing significant attention recently.

However, there’s a fundamental issue with the way Q-learning is typically implemented with deep neural networks. This concern is highlighted in the award-winning paper “Non-delusional Q-learning,” presented at NeurIPS 2018.

The paper argues that blindly applying Q-learning updates to deep neural networks is fundamentally flawed. Each update backs up a maximum over next-state actions as if those choices could be made independently, yet the network may be unable to represent a single policy that makes all of them at once; the paper calls this delusional bias. The practical upshot is a self-contradictory situation in which fitting the network to the current batch of data can make it less accurate on other batches, much as optimizing a supervised model for one dataset may degrade its performance on others.
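To make the "one update affects other data" point concrete, here is a minimal sketch of a single Q-learning step with a shared-weight (linear) approximator. It is not the paper's construction or its proposed fix; the two-state setup, the feature table `PHI`, and the helper names `q_value` and `q_update` are all made up for illustration.

```python
# Hypothetical toy problem: 2 states, 2 actions, Q(s, a) = w . phi(s, a).
# The feature table below is an assumption chosen so that state-action
# pairs share weights, as they would inside a neural network.
PHI = {
    ("s0", "a0"): [1.0, 0.0],
    ("s0", "a1"): [0.0, 1.0],
    ("s1", "a0"): [1.0, 1.0],  # overlaps with both s0 feature vectors
    ("s1", "a1"): [0.0, 0.0],
}
ACTIONS = ["a0", "a1"]


def q_value(w, s, a):
    """Linear Q estimate for one state-action pair."""
    return sum(wi * fi for wi, fi in zip(w, PHI[(s, a)]))


def q_update(w, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One semi-gradient Q-learning step; returns a new weight list."""
    # The target maxes over next-state actions without checking whether
    # the implied greedy choices are jointly representable.
    target = r + gamma * max(q_value(w, s_next, b) for b in ACTIONS)
    td_error = target - q_value(w, s, a)
    return [wi + alpha * td_error * fi for wi, fi in zip(w, PHI[(s, a)])]


w = [0.0, 0.0]
before = q_value(w, "s1", "a0")
# Update only on the transition (s0, a0) -> s1 with reward 1.
w = q_update(w, "s0", "a0", r=1.0, s_next="s1")
after = q_value(w, "s1", "a0")
print(before, after)  # the estimate for (s1, a0) moved without being visited
```

Because the weights are shared, the step taken for (s0, a0) also shifts the estimate for (s1, a0). With a tabular Q function that pair would have stayed untouched, which is roughly why the issue only appears once function approximation is involved.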

For more insights, the full paper can be accessed here: Non-delusional Q-learning Paper (follow-up ICML paper: Practical Non-delusional-Q Learning)

I’m curious about others’ views on this topic. What do you think about the implications of these findings for the future of Q-learning in deep learning environments?

  • wind_dude@alien.topB · 1 year ago

    Is there anything to the hoopla over OpenAI using deep Q-learning other than random speculation?

    If anything, I would guess DQN, not Q-learning.

    But all the papers people have pointed to while speculating about this hoopla just mention active learning or RL, without specifics.

    • residentmouse@alien.topB · 1 year ago

      Yeah, so largely I think you’ve hit the nail on the head, but just in case you don’t know, the fervour is over a deliberately leaked project name, “Q*”, and the suggestion that it precipitated the OpenAI board drama. Now, is this probably a tactic to keep prices high so stock sells @ the 65B valuation OAI had prior to the drama? Sure.

      But it’s still fun to speculate.