I’m still new to this, and I thought 128 GB of CPU RAM would be enough to run a 70B model? I also have an RTX 4090. However, every time I try to run lzlv_Q4_K_M.gguf in Text Generation UI, I get “connection errored out”. Is there a setting I should tinker with?

  • Herr_Drosselmeyer@alien.topB · 10 months ago

    It should work with those specs. I’m not sure what connection it’s referring to. Could you post a screenshot of the console?

  • brobruh211@alien.topB · 10 months ago

    I haven’t tried running a model that big on CPU RAM alone, but even running a Q4_0 GGUF of Causal 14B was mind-numbingly slow on my rig.

    As a general rule of thumb, always utilize as much of your VRAM (GPU RAM) as possible, since CPU RAM is dramatically slower. I’m guessing your connection timed out because the model simply took too long to load/run.
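
    If you do want to stay on the GGUF route, a partial GPU offload is the usual fix for slow CPU-only inference. Here’s a minimal sketch using the llama-cpp-python package (the model path and layer count are assumptions you’d adjust for your own setup):

    ```python
    from llama_cpp import Llama

    # Hypothetical local path to the GGUF file; point this at your copy.
    MODEL_PATH = "models/lzlv_Q4_K_M.gguf"

    # n_gpu_layers controls how many transformer layers live in VRAM.
    # -1 offloads everything; a 70B Q4_K_M won't fully fit in 24 GB,
    # so tune this number down until the model loads.
    llm = Llama(
        model_path=MODEL_PATH,
        n_gpu_layers=40,  # partial offload; raise/lower to fit your VRAM
        n_ctx=4096,       # context window
    )

    out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
    print(out["choices"][0]["text"])
    ```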

    With a 4090, you can actually run lzlv 70B entirely in your 24 GB of VRAM. Let’s not let your amazing GPU go to waste! Try these steps and let me know if they work out for you (a scripted equivalent is sketched after the list):

    1. Paste this into the Download box of text-generation-webui: waldie/lzlv-limarpv3-l2-70b-2.4bpw-h6-exl2
    2. Hit Download. This should download an ExLlamav2 quant of lzlv that fits in your VRAM.
    3. Select the model from the dropdown and just hit Load using the default settings. (Optional) You can tick “Use 8-bit cache to save VRAM”.
    4. Enjoy! The perplexity of the quant I suggested isn’t quite as good as lzlv_Q4_K_M’s, but at least you should be able to run it without problems and still get decent outputs.
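
    If you’d rather script steps 1–3 than click through the UI, here’s a rough sketch. It assumes the huggingface_hub and exllamav2 Python packages are installed; the local directory name is just an example:

    ```python
    from huggingface_hub import snapshot_download
    from exllamav2 import (
        ExLlamaV2,
        ExLlamaV2Cache_8bit,
        ExLlamaV2Config,
        ExLlamaV2Tokenizer,
    )
    from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

    # Steps 1-2: fetch the EXL2 quant (local_dir is arbitrary).
    model_dir = snapshot_download(
        repo_id="waldie/lzlv-limarpv3-l2-70b-2.4bpw-h6-exl2",
        local_dir="models/lzlv-2.4bpw-exl2",
    )

    # Step 3: load it with ExLlamaV2. The 8-bit cache mirrors the
    # "Use 8-bit cache to save VRAM" checkbox in the UI.
    config = ExLlamaV2Config()
    config.model_dir = model_dir
    config.prepare()

    model = ExLlamaV2(config)
    cache = ExLlamaV2Cache_8bit(model, lazy=True)
    model.load_autosplit(cache)

    tokenizer = ExLlamaV2Tokenizer(config)
    generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

    settings = ExLlamaV2Sampler.Settings()
    settings.temperature = 0.8

    # Generate 64 tokens from a test prompt.
    print(generator.generate_simple("Hello! Tell me about yourself.", settings, 64))
    ```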