The title, pretty much.
I’m wondering whether a 70b model quantized to 4bit would perform better than a 7b/13b/34b model at fp16. Would be great to get some insights from the community.
The title, pretty much.
I’m wondering whether a 70b model quantized to 4bit would perform better than a 7b/13b/34b model at fp16. Would be great to get some insights from the community.
This seems like something that would be difficult to predict considering how fundamental what your changing is. The method you use to quantize it and how refined it is also matters a great deal.