I want to run a 70B LLM locally with more than 1 T/s. I have a 3090 with 24GB VRAM and 64GB RAM on the system.

What I've managed so far:

  • Found instructions to run a 70B in VRAM only with a ~2.5-bit quant; it ran fast, but the perplexity was unbearable and the LLM was barely coherent.
  • I somehow got a 70B running with some combination of RAM/VRAM offloading, but it only managed 0.1 T/s.

I've seen people claiming reasonable T/s speeds. Since I'm a newbie, I can barely speak the domain language, and most instructions I found assume implicit knowledge I don't have.

I need explicit instructions: exactly which 70B model to download, which model loader to use, and how to set the parameters that matter in this context.

  • BlueMetaMind@alien.top (OP) · 10 months ago

    Thank you. What does "5_K_M" mean?
    Can I use the text generation web UI with llama.cpp as the model loader, or is that too much overhead?

    • mrjackspade@alien.top · 10 months ago

      I actually don't know how much overhead that's going to be. I'd start by just kicking it off on the command line first as a proof of concept; it's super easy.
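      If you'd rather script it than use the raw CLI, here's a minimal sketch using the llama-cpp-python bindings (installed with CUDA support so layers can be offloaded to the 3090). The file name and n_gpu_layers value are placeholders for whatever quant you download and however many layers fit in 24GB of VRAM:

          # Minimal proof of concept: load a GGUF quant and generate a few tokens.
          from llama_cpp import Llama

          llm = Llama(
              model_path="goat-70b-storytelling.Q5_K_M.gguf",  # placeholder: path to the quant you downloaded
              n_gpu_layers=40,  # placeholder: raise/lower until it fills ~24GB of VRAM
              n_ctx=4096,       # context window
          )

          out = llm("Once upon a time,", max_tokens=128)
          print(out["choices"][0]["text"])

      If that runs and the speed is acceptable, you can worry about the web UI afterwards.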

      5_K_M is just the quantization I use. There's almost no loss of perplexity with 5_K_M, but it's also larger than a 4-bit quant, which is what most people use. (There's a rough sizing sketch after the table below.)

      Name                              | Quant method | Bits | Size     | Max RAM required | Use case
      goat-70b-storytelling.Q2_K.gguf   | Q2_K         | 2    | 29.28 GB | 31.78 GB         | smallest, significant quality loss - not recommended for most purposes
      goat-70b-storytelling.Q3_K_S.gguf | Q3_K_S       | 3    | 29.92 GB | 32.42 GB         | very small, high quality loss
      goat-70b-storytelling.Q3_K_M.gguf | Q3_K_M       | 3    | 33.19 GB | 35.69 GB         | very small, high quality loss
      goat-70b-storytelling.Q3_K_L.gguf | Q3_K_L       | 3    | 36.15 GB | 38.65 GB         | small, substantial quality loss
      goat-70b-storytelling.Q4_0.gguf   | Q4_0         | 4    | 38.87 GB | 41.37 GB         | legacy; small, very high quality loss - prefer using Q3_K_M
      goat-70b-storytelling.Q4_K_S.gguf | Q4_K_S       | 4    | 39.07 GB | 41.57 GB         | small, greater quality loss
      goat-70b-storytelling.Q4_K_M.gguf | Q4_K_M       | 4    | 41.42 GB | 43.92 GB         | medium, balanced quality - recommended
      goat-70b-storytelling.Q5_0.gguf   | Q5_0         | 5    | 47.46 GB | 49.96 GB         | legacy; medium, balanced quality - prefer using Q4_K_M
      goat-70b-storytelling.Q5_K_S.gguf | Q5_K_S       | 5    | 47.46 GB | 49.96 GB         | large, low quality loss - recommended
      goat-70b-storytelling.Q5_K_M.gguf | Q5_K_M       | 5    | 48.75 GB | 51.25 GB         | large, very low quality loss - recommended
      goat-70b-storytelling.Q6_K.gguf   | Q6_K         | 6    | 56.59 GB | 59.09 GB         | very large, extremely low quality loss
      goat-70b-storytelling.Q8_0.gguf   | Q8_0         | 8    | 73.29 GB | 75.79 GB         | very large, extremely low quality loss - not recommended
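      To get a rough feel for how much of a given quant fits on the GPU: a Llama-2-style 70B has 80 transformer layers, so dividing the file size by 80 gives an approximate per-layer footprint, and from that you can ballpark n_gpu_layers for a 24GB card. A back-of-the-envelope sketch — the 80-layer count and the ~3 GB of headroom for the KV cache are assumptions, not measured numbers:

          # Rough estimate of how many layers of a quantized 70B fit on a 24GB card.
          def estimate_gpu_layers(file_size_gb: float, vram_gb: float = 24.0,
                                  n_layers: int = 80, headroom_gb: float = 3.0) -> int:
              per_layer_gb = file_size_gb / n_layers       # approximate per-layer footprint
              usable_gb = max(vram_gb - headroom_gb, 0.0)  # leave room for KV cache / overhead
              return min(n_layers, int(usable_gb / per_layer_gb))

          # Example: the Q5_K_M file from the table above (48.75 GB)
          print(estimate_gpu_layers(48.75))  # -> about 34 layers on the GPU

      Whatever doesn't fit on the GPU stays in system RAM, and per the table even Q5_K_M needs at most about 51 GB, so 64GB of system RAM is plenty.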