I want to run a 70B LLM locally with more than 1 T/s. I have a 3090 with 24GB VRAM and 64GB RAM on the system.

What I managed so far:

  • Found instructions to make 70B run on VRAM only with a 2.5B that run fast but the perplexity was unbearable. LLM was barely coherent.
  • I randomly made somehow 70B run with a variation of RAM/VRAM offloading but it run with 0.1 T/S

I saw people claiming reasonable T/s speeds. Sine I am a newbie, I barely can speak the domain language, and most instructions I found assume implicit knowledge I don’t have*.

I need explicit instructions on what 70B model to download exactly, which Model loader to use and how to set parameters that are salient in the context.

  • mrjackspade@alien.topB
    link
    fedilink
    English
    arrow-up
    1
    ·
    10 months ago

    I actually don’t know how much overhead that’s going to be. I’d start by just kicking it off on the command line first as a proof of concept, its super easy,

    5_K_M is just the quantization I use. There’s almost no loss of perplexity with 5_K_M, but its also larger than 4 which is what most people use.

    Name Quant method Bits Size Max RAM required Use case
    goat-70b-storytelling.Q2_K.gguf Q2_K 2 29.28 GB 31.78 GB smallest, significant quality loss - not recommended for most purposes
    goat-70b-storytelling.Q3_K_S.gguf Q3_K_S 3 29.92 GB 32.42 GB very small, high quality loss
    goat-70b-storytelling.Q3_K_M.gguf Q3_K_M 3 33.19 GB 35.69 GB very small, high quality loss
    goat-70b-storytelling.Q3_K_L.gguf Q3_K_L 3 36.15 GB 38.65 GB small, substantial quality loss
    goat-70b-storytelling.Q4_0.gguf Q4_0 4 38.87 GB 41.37 GB legacy; small, very high quality loss - prefer using Q3_K_M
    goat-70b-storytelling.Q4_K_S.gguf Q4_K_S 4 39.07 GB 41.57 GB small, greater quality loss
    goat-70b-storytelling.Q4_K_M.gguf Q4_K_M 4 41.42 GB 43.92 GB medium, balanced quality - recommended
    goat-70b-storytelling.Q5_0.gguf Q5_0 5 47.46 GB 49.96 GB legacy; medium, balanced quality - prefer using Q4_K_M
    goat-70b-storytelling.Q5_K_S.gguf Q5_K_S 5 47.46 GB 49.96 GB large, low quality loss - recommended
    goat-70b-storytelling.Q5_K_M.gguf Q5_K_M 5 48.75 GB 51.25 GB large, very low quality loss - recommended
    goat-70b-storytelling.Q6_K.gguf Q6_K 6 56.59 GB 59.09 GB very large, extremely low quality loss
    goat-70b-storytelling.Q8_0.gguf Q8_0 8 73.29 GB 75.79 GB very large, extremely low quality loss - not recommended