If I have multiple 7b models where each model is trained on one specific topic (e.g. roleplay, math, coding, history, politics…) and I have an interface which decides, depending on the context, which model to use. Could this outperform bigger models while being faster?
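The "interface" described here could be sketched as a simple router: score the prompt against each specialist's topic and dispatch to the best match. This is a minimal, hypothetical illustration — the model names and keyword lists are made up, and a real router would more likely use an embedding classifier than keyword overlap.

```python
# Hypothetical sketch of a topic router choosing among specialist 7b models.
# Model names and keyword sets are illustrative assumptions, not real models.

SPECIALISTS = {
    "math-7b":    {"integral", "equation", "derivative", "solve", "proof"},
    "coding-7b":  {"python", "function", "bug", "compile", "api"},
    "history-7b": {"war", "empire", "century", "revolution", "dynasty"},
}

def route(prompt: str, default: str = "general-7b") -> str:
    """Return the specialist whose keywords best overlap the prompt,
    falling back to a general model when nothing matches."""
    words = set(prompt.lower().split())
    best, best_score = default, 0
    for model, keywords in SPECIALISTS.items():
        score = len(words & keywords)
        if score > best_score:
            best, best_score = model, score
    return best

print(route("please solve this integral for me"))  # → math-7b
print(route("hello there"))                        # → general-7b
```

In practice this dispatch step is cheap compared to inference, so the latency question mostly comes down to whether a 7b specialist answers well enough to avoid falling back to a larger model.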

  • jxjq@alien.topB · 1 year ago

    Does this use of mixture-of-experts mean that multiple 70b models would perform better than multiple 7b models?

      • extopico@alien.topB · 1 year ago

        Big is an understatement. Please correct me if I've got it wildly wrong, but it appears to be a 3.6TB colossus.