If I have multiple 7B models, each trained on one specific topic (e.g. roleplay, math, coding, history, politics…), and an interface that decides, depending on the context, which model to use, could this outperform bigger models while being faster?
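
Rough sketch of what I mean (the model names and the keyword matching are just placeholders; the real interface could be an embedding classifier or a small LLM):

```python
# Placeholder sketch of a topic router in front of several specialized 7B models.
# Model names and keyword lists are illustrative, not real checkpoints.
from typing import Dict, List

# Map each topic to a hypothetical specialized checkpoint.
SPECIALISTS: Dict[str, str] = {
    "math": "math-7b",
    "coding": "code-7b",
    "roleplay": "roleplay-7b",
    "history": "history-7b",
    "politics": "politics-7b",
}

KEYWORDS: Dict[str, List[str]] = {
    "math": ["integral", "equation", "prove", "derivative"],
    "coding": ["python", "function", "bug", "compile"],
    "roleplay": ["character", "pretend", "act as"],
    "history": ["war", "century", "empire"],
    "politics": ["election", "policy", "parliament"],
}

def route(prompt: str) -> str:
    """Pick the specialist whose keywords best match the prompt."""
    text = prompt.lower()
    scores = {
        topic: sum(word in text for word in words)
        for topic, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to a general model if nothing matches.
    return SPECIALISTS[best] if scores[best] > 0 else "general-7b"

print(route("Can you prove this equation by taking the derivative?"))  # -> math-7b
```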

  • feynmanatom@alien.topB
    1 year ago

    Lots of rumors, but tbh I think it’s highly unlikely they’re using an MoE. MoEs work well at batch size = 1, where you can take advantage of sparsity, but not at larger batch sizes: different tokens in a batch route to different experts, so nearly every expert ends up active anyway. You would need so much RAM that you’d miss out on the point of using an MoE.
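
    A quick way to see this, as a rough simulation (the 8-expert, top-2 numbers are just illustrative assumptions, not anything known about their setup):

    ```python
    # Rough simulation: with top-2 routing over 8 experts, a single token
    # activates only 2 experts, but even a modest batch touches nearly all
    # of them, so every expert's weights must stay resident in memory.
    # Expert count and top-k are illustrative assumptions.
    import random

    NUM_EXPERTS = 8
    TOP_K = 2

    def experts_touched(batch_size: int, trials: int = 1000) -> float:
        """Average number of distinct experts hit by one batch (uniform routing)."""
        total = 0
        for _ in range(trials):
            touched = set()
            for _ in range(batch_size):
                touched.update(random.sample(range(NUM_EXPERTS), TOP_K))
            total += len(touched)
        return total / trials

    for bs in (1, 4, 16, 64):
        print(f"batch={bs:3d} -> ~{experts_touched(bs):.1f} of {NUM_EXPERTS} experts active")
    ```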

    • remghoost7@alien.topB
      1 year ago

      > Lots of rumors…

      Very true.

      We honestly have no clue what’s going on behind ClosedAI’s doors.

      I don’t know enough about MoEs to say one way or the other, so I’ll take your word on it. I’ll have to do more research on them.