The title, pretty much.

I’m wondering whether a 70b model quantized to 4bit would perform better than a 7b/13b/34b model at fp16. Would be great to get some insights from the community.

  • daHaus@alien.topB · 1 year ago

    This seems difficult to predict, considering how fundamental what you're changing is. The method you use to quantize the model, and how refined that method is, also matters a great deal.
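
    If you want to poke at this yourself, here's a rough sketch of loading a model in 4-bit with transformers + bitsandbytes so you can compare it against a smaller fp16 model on the same prompts. The model id and quantization settings are just illustrative examples, not something from this thread; GPTQ, AWQ, or GGUF quants would be the other common routes, and each one quantizes a bit differently.

    ```python
    # Sketch: load a large model in 4-bit (NF4) via transformers + bitsandbytes.
    # Model id and settings below are examples only.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "meta-llama/Llama-2-70b-hf"  # example model id

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # NF4 usually degrades less than plain fp4
        bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for the actual matmuls
        bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",                      # spread layers across available GPUs/CPU
    )

    prompt = "Summarize the tradeoff between parameter count and quantization precision."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
    ```

    Running the same prompts (or a perplexity eval) through this and through a 7b/13b model in fp16 is the most direct way to see which holds up better for your use case.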