I haven’t been able to use that with my 3090 Ti yet. I tried TheBloke’s GPTQ and GGUF (4-bit) versions. The first runs into memory issues; the second, loaded with llama.cpp (which it seems to be built for), does load, but is excruciatingly slow (around 0.07 tokens/sec).
I must admit that I am a complete noob regarding all the different variants and model loaders.
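(Sidenote for anyone with the same symptom: ~0.07 t/s usually means the model is running entirely on the CPU, i.e. no layers offloaded to the GPU. A minimal sketch of what GPU offload looks like with llama-cpp-python; the model path, layer count, and prompt template here are assumptions, not what was actually run:)

```python
from llama_cpp import Llama

# Hypothetical local path; point this at the 4-bit GGUF you downloaded.
llm = Llama(
    model_path="./nous-capybara-34b.Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU; the default (0) runs CPU-only and is very slow
    n_ctx=8192,       # 8K context, matching the suggestion below
)

# Prompt template is an assumption; check the model card for the exact format.
out = llm("USER: Say hello.\nASSISTANT:", max_tokens=64)
print(out["choices"][0]["text"])
```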
Koboldcpp is the easiest way.
Get nous-capybara-34b.Q4_K_M.gguf (it just fits into 24GB VRAM with 8K context).
Here are my Koboldcpp settings (not sure if they are optimal, but they work):
https://preview.redd.it/dco0bokvic1c1.jpeg?width=540&format=pjpg&auto=webp&s=bf188ea61481a9464593db79d690b26eb7989883
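(Once Koboldcpp is up, it also serves the KoboldAI HTTP API locally, so you can drive it from a script instead of the UI. A minimal sketch, assuming the default port 5001 and whatever sampler settings you set in the UI:)

```python
import requests

# Koboldcpp listens on port 5001 by default; adjust if you launched it with a different --port.
resp = requests.post(
    "http://localhost:5001/api/v1/generate",
    json={
        "prompt": "USER: Say hello.\nASSISTANT:",  # prompt template is an assumption; check the model card
        "max_length": 80,            # number of tokens to generate
        "max_context_length": 8192,  # matches the 8K context mentioned above
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])
```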