There has been a lot of movement around and below the 13b parameter bracket in the last few months, but it's wild to think the best 70b models are still Llama 2-based. Why is that?

We have 13b models like the 8-bit bartowski/Orca-2-13b-exl2 approaching or even surpassing the best 70b models now.

  • ChiefBigFeather@alien.topB
    10 months ago

    The idea that 13b models are magically better than 70b models is a myth. Most of the 7b or 13b model headlines are just clickbait: the models do well on benchmarks because they were trained on benchmark data.

    Try Airo 70b 3.1.2; for general purposes it is much, much better than 99% of the models out there. Yi-based models are strong if you want a larger context window.