Title essentially. I'm currently running an RTX 3060 with 12GB of VRAM, 32GB RAM, and an i5-9600K. I've been running 7B and 13B models effortlessly via KoboldCPP (I tend to offload all 35 layers to GPU for 7Bs, and 40 for 13Bs) + SillyTavern for role-playing purposes, but slowdown becomes noticeable at higher context with 13Bs (not too bad, so I deal with it). Is this setup capable of running bigger models like 20B, or potentially even 34B?
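(For reference, my 13B launches look roughly like this; the model filename is just a placeholder, not a specific file I'm recommending.)

./koboldcpp.py --model ~/models/example-13b.Q4_K_M.gguf --contextsize 4096 --usecublas --gpulayers 40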
My setup has the same amount of VRAM and RAM as yours, and I'm running 20B models at a tolerable speed, meaning it generates tokens at almost reading speed. This is using the ROCm version of koboldcpp under Linux with a Q4_K_M model (I have a 5600X and a 6700XT).
Using the settings below, VRAM is maxed out and RAM sits at about 24GB used.
./koboldcpp.py --model ~/AI/LLMS/models/mlewd-remm-l2-chat-20b.Q4_K_M.gguf --threads 5 --contextsize 4096 --usecublas --gpulayers 47 --nommap --usemlock --port 8334
I have no idea how this would perform on Windows or with an Nvidia card, but good luck.
Isn't CuBLAS specific to Nvidia cards, while CLBlast is compatible with both Nvidia and AMD? I'm not sure how CuBLAS could work with AMD cards. ROCm?
You're right, this shouldn't work. But for some strange reason, using --usecublas loads the hipBLAS library:

Welcome to KoboldCpp - Version 1.49.yr1-ROCm
Attempting to use hipBLAS library for faster prompt ingestion. A compatible AMD GPU will be required.
Initializing dynamic library: koboldcpp_hipblas.so
I have no idea why this works, but it does, and since the 6700XT took quite a bit of effort to get going, I'm keeping it this way.
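If I had to guess at the mechanism (this is purely a sketch, not the actual koboldcpp source): the ROCm fork compiles the CuBLAS code path against hipBLAS, so --usecublas just ends up selecting the HIP shared object at load time. Something like the following, where koboldcpp_hipblas.so comes from the log above and the other names are made up for illustration:

import argparse
import ctypes

parser = argparse.ArgumentParser()
parser.add_argument("--usecublas", action="store_true")
args = parser.parse_args()

if args.usecublas:
    # On the ROCm fork this branch was built against hipBLAS, so the
    # "cublas" flag really means "the GPU BLAS backend you compiled".
    lib_name = "koboldcpp_hipblas.so"  # filename taken from the log above
else:
    lib_name = "koboldcpp_default.so"  # hypothetical CPU fallback

backend = ctypes.CDLL(lib_name)  # dlopen()s the chosen backend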
I can run similar models on my phone at reading speeds (I am illiterate)