If I have multiple 7B models, each trained on one specific topic (e.g. roleplay, math, coding, history, politics…), and an interface that decides, depending on the context, which model to use, could this outperform bigger models while being faster?
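For concreteness, here is a rough sketch of the kind of routing interface I mean. The topics, keyword lists, and model names are just placeholders; a real router would probably use a small classifier or embedding similarity instead of keyword matching:

```python
# Rough sketch of a topic router sitting in front of several specialist 7B models.
# Topics, keyword lists, and model names are placeholders, not real checkpoints.
SPECIALISTS = {
    "math":     {"keywords": {"integral", "equation", "proof", "solve"},   "model": "math-7b"},
    "coding":   {"keywords": {"python", "function", "bug", "compile"},     "model": "code-7b"},
    "history":  {"keywords": {"empire", "war", "century", "revolution"},   "model": "history-7b"},
    "roleplay": {"keywords": {"character", "scene", "persona", "story"},   "model": "roleplay-7b"},
}
FALLBACK_MODEL = "general-7b"  # used when no topic matches


def route(prompt: str) -> str:
    """Pick the specialist model whose keywords best match the prompt."""
    words = set(prompt.lower().split())
    scores = {topic: len(words & spec["keywords"]) for topic, spec in SPECIALISTS.items()}
    best = max(scores, key=scores.get)
    return SPECIALISTS[best]["model"] if scores[best] > 0 else FALLBACK_MODEL


print(route("Can you solve this integral step by step?"))  # -> math-7b
```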

  • yahma@alien.topB · 1 year ago

    Yes. This is known as Mixture of Experts (MoE).

    We already have several promising ways of doing this:

    1. QMoE: A Scalable Algorithm for Sub-1-Bit Compression of Trillion-Parameter Mixture-of-Experts Architectures. Paper - GitHub
    2. S-LoRA: serves thousands of concurrent LoRA adapters.
    3. LoRAX: serves hundreds of concurrent adapters.
    4. LMoE: a simple method of dynamically loading LoRAs (see the sketch after this list).
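
    The adapter-based options (2-4) rely on the per-topic "experts" sharing one base model, so switching experts only means swapping small LoRA weights rather than reloading a whole 7B checkpoint. Here is a minimal sketch of that idea with Hugging Face peft, assuming the adapter repo names (which are hypothetical here) point at topic-tuned LoRAs:

```python
# Sketch of the LMoE idea: one shared 7B base plus small per-topic LoRA adapters
# that get swapped per request instead of loading separate full models.
# Requires transformers + peft; the adapter repo names below are hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(BASE_ID, device_map="auto")

# The first adapter creates the PeftModel; further adapters are registered by name.
model = PeftModel.from_pretrained(base, "your-org/llama2-7b-math-lora", adapter_name="math")
model.load_adapter("your-org/llama2-7b-code-lora", adapter_name="coding")


def generate(prompt: str, topic: str) -> str:
    model.set_adapter(topic)  # switch "experts" without reloading the 7B base weights
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=200)
    return tokenizer.decode(output[0], skip_special_tokens=True)


print(generate("Write a Python function that reverses a string.", topic="coding"))
```

    S-LoRA and LoRAX take the same idea to serving scale, batching requests that target different adapters against the same shared base weights.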
    • sampdoria_supporter@alien.topB · 1 year ago

      I can’t believe I hadn’t run into this. Would you indulge me on the implications for agentic systems like AutoGen? I’ve been working on having experts cooperate that way rather than combining them into a single model.