There has been a lot of movement at and below the 13B parameter bracket in the last few months, but it’s wild to think the best 70B models are still Llama 2 based. Why is that?

We now have 13B models, like the 8-bit bartowski/Orca-2-13b-exl2, approaching or even surpassing the best 70B models.

  • FaustBargain@alien.topB
    10 months ago

    Qwen 72b

    I can’t seem to find anything about Qwen 72B except two tweets from a month ago that said it was coming out. Who makes it? What’s it trained on? Any details?

    • Thireus@alien.topB
      10 months ago

      Curiously, none of the upvoters of the previous comment has provided an answer to your question.