• 0 Posts
  • 1 Comment
Joined 1 year ago
Cake day: November 22nd, 2023

  • I have a GTX 1080 with 8 GB of VRAM and 16 GB of RAM. I can run 13B Q6_K .gguf models locally if I split them between CPU and GPU (20 of 41 layers on the GPU with koboldcpp / llama.cpp). Compared to models that fit completely on the GPU (like Mistral), this is very slow as soon as the context gets even a little larger: a response might take a minute or more.
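    As a rough illustration of the CPU/GPU layer split, here is a minimal Python sketch that estimates how many layers fit in VRAM. It assumes every layer is the same size (model file size divided by layer count) and that a fixed `reserved_gb` covers the KV cache, scratch buffers and the desktop; the function name and the default value are hypothetical, and real memory use depends on the backend and context length. In practice you still tune the number by hand with llama.cpp's `--n-gpu-layers` or koboldcpp's `--gpulayers`.

```python
import math


def layers_that_fit(model_file_gb: float, n_layers: int, vram_gb: float,
                    reserved_gb: float = 1.5) -> int:
    """Hypothetical heuristic: how many layers can be offloaded to the GPU.

    Assumes all layers have equal size (file size / layer count) and that
    `reserved_gb` of VRAM is kept free for the KV cache and other buffers.
    """
    if n_layers <= 0:
        raise ValueError("n_layers must be positive")
    per_layer_gb = model_file_gb / n_layers
    # VRAM left for weights after the reserved amount (never negative).
    budget_gb = max(vram_gb - reserved_gb, 0.0)
    # Never offload more layers than the model actually has.
    return min(n_layers, math.floor(budget_gb / per_layer_gb))


# Hypothetical 10 GB model file with 41 layers on an 8 GB card.
print(layers_that_fit(10.0, 41, 8.0))
```

    Since the KV cache grows with context length, a larger `reserved_gb` is needed for longer contexts, which leaves room for fewer layers on the GPU.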

    You might want to consider running a Mistral fine-tune instead.