I have a server with 512 GB of RAM and 2x Intel Xeon 6154. It will have a spare x16 PCIe 3.0 slot once I get rid of my current GPU.
I’d like to add a better GPU so I can generate paper summaries (the responses can take a few minutes to come back) that are significantly better than the quality I get now with 4-bit Llama 2 13B. Does anyone know the minimum GPU I should be looking at with this setup to be able to upgrade to the 70B model? Would hybrid CPU+GPU inference with an RTX 4090 24GB be enough?
Use the ExLlamaV2 (EXL2) format with variable bitrate.
Even a single 24GB GPU can run a 70B model if it’s quantized.
For example, I haven’t tried it myself, but I’m almost sure the 2.30b (2.30 bits per weight) quant works on a single 24GB GPU: https://huggingface.co/turboderp/Llama2-70B-chat-exl2
I think you can even go higher than 2.30 bpw and still fit in 24GB.
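If it helps, here’s roughly what running one of those quants looks like with the exllamav2 Python package. This is a minimal sketch patterned on its bundled inference example; the model path, branch choice, and sampling values are placeholders I picked for illustration, not something I’ve tested on this exact card:

```python
# Minimal sketch of single-GPU inference with exllamav2 on an EXL2 quant.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Point this at a local copy of one branch of the repo, e.g.:
#   git clone --single-branch --branch 2.30b \
#       https://huggingface.co/turboderp/Llama2-70B-chat-exl2
config = ExLlamaV2Config()
config.model_dir = "/models/Llama2-70B-chat-exl2"  # placeholder path
config.prepare()

model = ExLlamaV2(config)
model.load()  # single GPU; pass gpu_split=[...] to spread across several cards

tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model)  # KV cache also has to fit in the 24GB
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7
settings.top_p = 0.9

generator.warmup()
print(generator.generate_simple("Summarize the following paper:\n", settings, num_tokens=512))
```

Keep in mind the KV cache eats into the same 24GB as the weights, so how high you can push the bpw depends on how much context your paper summaries need.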