Suleyman_III@alien.topB to

LocalLLaMA@poweruser.forum · English · 2 years ago

Llama 2 13B on GTX 1070


Can I run Llama 2 13B locally on my GTX 1070? I read somewhere that the minimum suggested VRAM is 10 GB, but since the 1070 has 8 GB, would it just run a little slower? Or could I use some quantization, with bitsandbytes for example, to make it fit and run more smoothly?

Edit: Also, how much storage will the model take up?


frontenbrecher@alien.topB
2 years ago
Use koboldcpp to split the model between GPU and CPU using the GGUF format, preferably with a Q4_K_S quantization for better speed. I expect it will be slow, possibly 1-2 tokens per second.
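
For the storage and fit questions, a rough back-of-envelope estimate may help. The sketch below is only an approximation: it assumes a nominal 13 billion parameters, about 4.5 bits per weight for a 4-bit K-quant (an assumed average, since real files vary), equally sized layers across the model's 40 transformer layers, and a hypothetical 6.5 GB VRAM budget that leaves headroom on the 8 GB card for context and other buffers. The function names are made up for illustration and are not part of koboldcpp.

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (10^9 bytes).

    Ignores file metadata, the KV cache, and runtime buffers.
    """
    return n_params * bits_per_weight / 8 / 1e9


def layers_that_fit(size_gb: float, n_layers: int, vram_budget_gb: float) -> int:
    """Estimate how many layers can be offloaded to the GPU.

    Assumes every layer takes an equal share of the total size,
    which is a simplification.
    """
    per_layer_gb = size_gb / n_layers
    return min(n_layers, int(vram_budget_gb / per_layer_gb))


N_PARAMS = 13e9        # nominal parameter count of a "13B" model
N_LAYERS = 40          # transformer layers in Llama 2 13B
VRAM_BUDGET_GB = 6.5   # assumed usable share of the GTX 1070's 8 GB

# Bits per weight for the quantized entry is an assumed rough average.
for label, bits in [("fp16", 16.0), ("8-bit", 8.0), ("4-bit K-quant (assumed ~4.5 bpw)", 4.5)]:
    size = model_size_gb(N_PARAMS, bits)
    on_gpu = layers_that_fit(size, N_LAYERS, VRAM_BUDGET_GB)
    print(f"{label}: ~{size:.1f} GB on disk, ~{on_gpu}/{N_LAYERS} layers on GPU")
```

Under these assumptions the unquantized fp16 weights are far too large for the card, while a 4-bit file lands in the 7 to 8 GB range, so most but not all layers would go to the GPU and the rest would run on the CPU. The actual layer count to offload would need to be tuned by trial.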