I want to run a 70B LLM locally with more than 1 T/s. I have a 3090 with 24GB VRAM and 64GB RAM on the system.

What I've managed so far:

  • Found instructions to run a 70B in VRAM only with a ~2.5-bit quant; it ran fast, but the perplexity was unbearable and the LLM was barely coherent.
  • I somehow got a 70B running with some combination of RAM/VRAM offloading, but it only managed 0.1 T/s.

I've seen people claiming reasonable T/s speeds. Since I'm a newbie, I can barely speak the domain language, and most instructions I found assume implicit knowledge I don't have.

I need explicit instructions: exactly which 70B model to download, which model loader to use, and how to set the parameters that matter in this context.

  • BlueMetaMind@alien.top (OP) · 10 months ago

    Thank you. What does "5_K_M" mean?
    Can I use the text generation web UI with llama.cpp as the model loader, or is that too much overhead?

    • mrjackspade@alien.top · 10 months ago

      I actually don't know how much overhead that's going to be. I'd start by just kicking it off on the command line first as a proof of concept; it's super easy.
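      If you'd rather script it than use the raw CLI, here's a minimal sketch using the llama-cpp-python bindings (installed with CUDA support so layers can be offloaded to the 3090). The file name and n_gpu_layers value are placeholders for whatever quant you download and however many layers fit in 24GB of VRAM:

          # Minimal proof of concept: load a GGUF quant and generate a few tokens.
          from llama_cpp import Llama

          llm = Llama(
              model_path="goat-70b-storytelling.Q5_K_M.gguf",  # placeholder: path to the quant you downloaded
              n_gpu_layers=40,  # placeholder: raise/lower until it fills ~24GB of VRAM
              n_ctx=4096,       # context window
          )

          out = llm("Once upon a time,", max_tokens=128)
          print(out["choices"][0]["text"])

      If that runs and the speed is acceptable, you can worry about the web UI afterwards.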

      5_K_M is just the quantization I use. There's almost no loss of perplexity with 5_K_M, but it's also larger than a 4-bit quant, which is what most people use. (There's a rough sizing sketch after the table below.)

      Name                              | Quant method | Bits | Size     | Max RAM required | Use case
      goat-70b-storytelling.Q2_K.gguf   | Q2_K         | 2    | 29.28 GB | 31.78 GB         | smallest, significant quality loss - not recommended for most purposes
      goat-70b-storytelling.Q3_K_S.gguf | Q3_K_S       | 3    | 29.92 GB | 32.42 GB         | very small, high quality loss
      goat-70b-storytelling.Q3_K_M.gguf | Q3_K_M       | 3    | 33.19 GB | 35.69 GB         | very small, high quality loss
      goat-70b-storytelling.Q3_K_L.gguf | Q3_K_L       | 3    | 36.15 GB | 38.65 GB         | small, substantial quality loss
      goat-70b-storytelling.Q4_0.gguf   | Q4_0         | 4    | 38.87 GB | 41.37 GB         | legacy; small, very high quality loss - prefer using Q3_K_M
      goat-70b-storytelling.Q4_K_S.gguf | Q4_K_S       | 4    | 39.07 GB | 41.57 GB         | small, greater quality loss
      goat-70b-storytelling.Q4_K_M.gguf | Q4_K_M       | 4    | 41.42 GB | 43.92 GB         | medium, balanced quality - recommended
      goat-70b-storytelling.Q5_0.gguf   | Q5_0         | 5    | 47.46 GB | 49.96 GB         | legacy; medium, balanced quality - prefer using Q4_K_M
      goat-70b-storytelling.Q5_K_S.gguf | Q5_K_S       | 5    | 47.46 GB | 49.96 GB         | large, low quality loss - recommended
      goat-70b-storytelling.Q5_K_M.gguf | Q5_K_M       | 5    | 48.75 GB | 51.25 GB         | large, very low quality loss - recommended
      goat-70b-storytelling.Q6_K.gguf   | Q6_K         | 6    | 56.59 GB | 59.09 GB         | very large, extremely low quality loss
      goat-70b-storytelling.Q8_0.gguf   | Q8_0         | 8    | 73.29 GB | 75.79 GB         | very large, extremely low quality loss - not recommended
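      To get a rough feel for how much of a given quant fits on the GPU: a Llama-2-style 70B has 80 transformer layers, so dividing the file size by 80 gives an approximate per-layer footprint, and from that you can ballpark n_gpu_layers for a 24GB card. A back-of-the-envelope sketch — the 80-layer count and the ~3 GB of headroom for the KV cache are assumptions, not measured numbers:

          # Rough estimate of how many layers of a quantized 70B fit on a 24GB card.
          def estimate_gpu_layers(file_size_gb: float, vram_gb: float = 24.0,
                                  n_layers: int = 80, headroom_gb: float = 3.0) -> int:
              per_layer_gb = file_size_gb / n_layers       # approximate per-layer footprint
              usable_gb = max(vram_gb - headroom_gb, 0.0)  # leave room for KV cache / overhead
              return min(n_layers, int(usable_gb / per_layer_gb))

          # Example: the Q5_K_M file from the table above (48.75 GB)
          print(estimate_gpu_layers(48.75))  # -> about 34 layers on the GPU

      Whatever doesn't fit on the GPU stays in system RAM, and per the table even Q5_K_M needs at most about 51 GB, so 64GB of system RAM is plenty.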