How to run 70B on 24GB VRAM?

BlueMetaMind@alien.top · 2 years ago

How to run 70B on 24GB VRAM?

TuuNo_@alien.top · 2 years ago

I would suggest using Koboldcpp with a GGUF model. A 70B Q5 model with around 40 layers offloaded to the GPU should give you more than 1 t/s. For me, at least, a 4090 with 64GB of system RAM gets 1.5 t/s using Q5_K_M.
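For anyone wondering where a number like "around 40 layers" comes from, here is a rough back-of-the-envelope sketch. It assumes every layer takes an equal share of the GGUF file and that you hold back some VRAM for the context cache and the desktop. Both are simplifications, and the function name and the example numbers are made up for illustration rather than measured. Treat the result as a starting point for the GPU layer setting and adjust it until the model loads without running out of memory.

```python
def layers_that_fit(model_size_gb: float, n_layers: int,
                    vram_gb: float, reserve_gb: float) -> int:
    """Estimate how many layers can be offloaded to the GPU.

    Assumption: all layers are the same size, so each one costs
    model_size_gb / n_layers. Real files also contain embedding and
    output tensors, so this is only an approximation.
    """
    if model_size_gb <= 0 or n_layers <= 0:
        raise ValueError("model_size_gb and n_layers must be positive")
    per_layer_gb = model_size_gb / n_layers
    # VRAM left after holding some back for the KV cache, scratch buffers, etc.
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    # Never report more layers than the model actually has.
    return min(n_layers, int(usable_gb // per_layer_gb))


if __name__ == "__main__":
    # Hypothetical inputs, not measurements: a 48 GB file with 80 layers,
    # a 24 GB card, and 2 GB of VRAM held in reserve.
    print(layers_that_fit(48.0, 80, 24.0, 2.0))
```

Whatever does not fit on the GPU stays in system RAM and runs on the CPU, which is why the speed ends up in the 1 to 2 t/s range rather than what a fully offloaded model would give.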

silenceimpaired@alien.top · 2 years ago

I could never get it up and running on Linux with Nvidia. I used Kobold on Windows, but boy is it painful on Linux.

TuuNo_@alien.top · 2 years ago

Well, I have never used Linux, since the main purpose of my PC is gaming. But I have heard that running LLMs on Linux is faster overall.

silenceimpaired@alien.top · 2 years ago

It is… but koboldcpp doesn’t have an executable for me to run :/

giblesnot@alien.top · 2 years ago

I don’t know what you were running into, but I’m running Pop_OS 22.04 (a modified version of Ubuntu) as my OS with a 3090, and for everything I have tried I just follow the basic install instructions on the home page and it works. Oobabooga, Automatic1111, Tortoise TTS, Whisper STT, Bark, Kobold, etc. I just follow the “run these commands” Linux instructions and everything is groovy.

silenceimpaired@alien.top · 2 years ago

I’m on Pop, lol. I could get it to compile, but I must have missed a step for Nvidia acceleration.
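If it compiles but never touches the GPU, one thing worth ruling out is that the CUDA toolkit was not visible when you built it. As far as I remember, the koboldcpp README at the time described the CUDA build as `make LLAMA_CUBLAS=1`, but check the current README since that option may have changed. Below is a small sketch that only checks whether the driver tool and the CUDA compiler are on your PATH. The function name is made up, and finding both tools does not prove the build actually used them.

```python
import shutil


def check_cuda_tools(tools=("nvidia-smi", "nvcc")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in check_cuda_tools().items():
        # nvidia-smi ships with the driver; nvcc ships with the CUDA toolkit.
        print(f"{tool}: {path or 'NOT FOUND'}")
```

If `nvcc` comes back as not found, the driver is probably installed but the toolkit is not (or it is installed somewhere that is not on PATH), and a CUDA build would be expected to fail or to fall back to CPU only.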