The question is probably too basic, but how do I load the Llama 2 70B model using 8-bit quantization? I see TheBlokeLlama2_70B_chat_GPTQ, but they only show 3-bit/4-bit quantization. I have an 80 GB A100 and am trying to load the Llama 2 70B model with 8-bit quantization. Thanks a lot!

  • vec1nu@alien.topB · 1 year ago

    I haven’t used GPTQ in a while, but I can say that GGUF has 8-bit quantization, which you can use with llama.cpp. Furthermore, if you use the original Hugging Face models, the ones you load with the transformers loader, you have options there to load in either 8-bit or 4-bit. A minimal sketch of the GGUF route is below.
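
    A rough sketch of the GGUF route via the llama-cpp-python bindings, assuming you have downloaded a Q8_0 (8-bit) GGUF of the model; the filename here is a placeholder, not a specific release:

    ```python
    # Sketch: load an 8-bit (Q8_0) GGUF quant with llama-cpp-python.
    from llama_cpp import Llama

    llm = Llama(
        model_path="llama-2-70b-chat.Q8_0.gguf",  # placeholder: any Q8_0 GGUF of Llama 2 70B
        n_gpu_layers=-1,   # offload all layers to the GPU (the 80 GB A100)
        n_ctx=4096,        # Llama 2's context window
    )

    out = llm("Q: What is the capital of France? A:", max_tokens=32)
    print(out["choices"][0]["text"])
    ```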

  • mcmoose1900@alien.topB · 1 year ago

    Grab the original (fp16) models. They can be quantized to 8-bit on the fly with bitsandbytes when you load them in 8-bit mode.
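
    A minimal sketch of that on-the-fly 8-bit load with transformers and bitsandbytes, assuming access to the gated meta-llama/Llama-2-70b-chat-hf repo (any fp16 Llama 2 70B checkpoint would work). At 8-bit the 70B weights come to roughly 70 GB, which just fits on an 80 GB A100 with some headroom for the KV cache:

    ```python
    # Sketch: load fp16 weights and quantize to 8-bit on the fly with bitsandbytes.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "meta-llama/Llama-2-70b-chat-hf"  # assumption: gated repo, requires HF access

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # quantize at load time
        device_map="auto",  # place the quantized layers on the GPU
    )

    inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
    ```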