The title, pretty much.
I’m wondering whether a 70B model quantized to 4-bit would perform better than a 7B/13B/34B model at fp16. Would be great to get some insights from the community.
So anyone wanting to play around with this at home has to expect to drop about $4K or so on GPUs and a setup?
I can get two 3090s for €1,200 here on the second-hand market.
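For context on the hardware numbers above, here is a rough back-of-the-envelope sketch of weight-only VRAM requirements per configuration. This only counts model weights and ignores KV cache and activation overhead, which add several more GiB in practice:

```python
# Rough weight-only VRAM estimate (ignores KV cache / activations,
# so real-world usage will be noticeably higher).
def weight_vram_gib(params_billion: float, bits_per_param: float) -> float:
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1024**3

configs = [
    ("70B @ 4-bit", 70, 4),
    ("34B @ fp16", 34, 16),
    ("13B @ fp16", 13, 16),
    ("7B @ fp16", 7, 16),
]
for name, params, bits in configs:
    print(f"{name}: ~{weight_vram_gib(params, bits):.1f} GiB")
```

By this estimate a 4-bit 70B model needs roughly 33 GiB for weights alone, which is why it fits across two 3090s (2 × 24 GB = 48 GB) but not on a single card, while a 13B fp16 model (~24 GiB) just about fits on one 3090.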