Quantizing 70b models to 4-bit, how much does performance degrade?

ae_dataviz@alien.top · 2 years ago

Quantizing 70b models to 4-bit, how much does performance degrade?

AutomataManifold@alien.top · 2 years ago

Early research suggested there was an inflection point below 4 bits, where things got markedly worse. In my personal use, accuracy definitely suffers below 4 bits, though modern quants may handle it a bit better.

34B Yi does seem like a sweet spot, though I’m starting to suspect we need fine-tunes that include longer stories in the training data, because it doesn’t seem able to maintain quality across the entire length of the context. Still, being able to include callbacks to events from thousands of tokens earlier is impressively practical. I’ve been alternating between a fast 13B (for specific scenes), 34B Yi (for general writing), and 70B (for when you need it to be smart and varied). And, of course, just switching models can sometimes help with the repetition.
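
For a rough sense of what each bit width buys you at these model sizes, here is a back-of-the-envelope sketch. It only counts weight storage (parameters × bits ÷ 8) and ignores the KV cache, activations, and the per-block scale metadata that real quant formats add, so actual files and VRAM use will be somewhat larger. The function name `weight_gib` is just something I made up for illustration, and it says nothing about quality loss, only size.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB.

    Assumption: every parameter is stored at exactly `bits_per_weight` bits,
    with no quantization metadata, KV cache, or activation memory included.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Compare the three model sizes mentioned above at a few bit widths.
for name, size in [("13B", 13), ("34B", 34), ("70B", 70)]:
    row = ", ".join(
        f"{bits}-bit: {weight_gib(size, bits):.1f} GiB" for bits in (16, 8, 4, 3)
    )
    print(f"{name}: {row}")
```

By this estimate a 70B model at 4 bits needs roughly a quarter of the weight memory it needs at 16 bits, which is why 4-bit is the usual target for fitting it on consumer hardware.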