So, over the last few weeks I have been experimenting with LLMs on my personal laptop (as I’m rarely at home), but I’m going to have my PC with me in a few days. When running models (MythoMax 13b, mostly Q6_K and Q5_K_M GGUF) I can definitely feel my laptop not liking it: slowdowns, crashes, service terminations and timeouts.

Now, the situation is this: I have unexpectedly gotten some money, which I want to invest in PC parts.
My PC currently has 16GB of DDR5 RAM and a GTX 1070 with 8GB VRAM.
The idea now is to buy a 96GB RAM kit (2x48) and Frankenstein the whole PC together with an additional Nvidia Quadro P2200 (5GB VRAM).

Would the whole “machine” suffice to run models like MythoMax 13b, Deepseek Coder 33b and CodeLlama 34b (all GGUF)?

Specs after: 112GB DDR5 (16GB existing + 96GB new), 8GB VRAM and 5GB VRAM (13GB total), CPU is a Ryzen 5 7500F
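
For a rough feel of whether those models fit, here is a minimal back-of-the-envelope sketch. It assumes weight size is roughly parameters times bits per weight divided by 8, and the bits-per-weight values in the table are approximations chosen for illustration, not exact figures for any particular file. It also ignores context/KV-cache memory and whatever the desktop itself uses, so real usage will be higher.

```python
# Rough GGUF weight-size estimate: params (billions) * bits-per-weight / 8 = GB.
# ASSUMPTION: these bits-per-weight values are approximate, for illustration only.
APPROX_BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_M": 5.7, "Q6_K": 6.6}

VRAM_GB = 8 + 5    # GTX 1070 + Quadro P2200, from the post
RAM_GB = 16 + 96   # existing DDR5 + planned 2x48 kit, from the post


def approx_model_gb(params_billion: float, quant: str) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * APPROX_BITS_PER_WEIGHT[quant] / 8


MODELS = [("MythoMax 13b", 13), ("Deepseek Coder 33b", 33), ("CodeLlama 34b", 34)]

for name, params in MODELS:
    for quant in APPROX_BITS_PER_WEIGHT:
        size = approx_model_gb(params, quant)
        # Weights only: context memory and driver overhead are NOT included.
        where = "could fit in VRAM" if size <= VRAM_GB else "needs system RAM too"
        print(f"{name:20s} {quant:7s} ~{size:5.1f} GB  ({where})")
```

Under these assumptions a 13b model at Q5/Q6 lands close to the 13GB of combined VRAM before any context is allocated, while the 33b/34b models would spill well into system RAM and run mostly at CPU speed.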

And the question I should have asked first: can the GTX 1070 and P2200 setup even work? Would text gen webui even detect both cards?
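
One way to check detection before touching any LLM software is to ask the Nvidia driver which cards it sees. The sketch below assumes the Nvidia driver and its `nvidia-smi` tool are installed; the helper names `parse_gpu_list` and `list_nvidia_gpus` are made up for this example. If both cards are listed, the driver sees them; whether a particular loader can split a model across them is a separate question.

```python
import shutil
import subprocess


def parse_gpu_list(csv_text):
    """Parse 'name, memory' CSV lines into (name, memory) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the last comma so a comma in the GPU name would not break parsing.
        name, mem = [part.strip() for part in line.rsplit(",", 1)]
        gpus.append((name, mem))
    return gpus


def list_nvidia_gpus():
    """Return the GPUs reported by nvidia-smi, or [] if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_list(result.stdout)


gpus = list_nvidia_gpus()
print(f"Driver reports {len(gpus)} GPU(s)")
for index, (name, mem) in enumerate(gpus):
    print(f"  GPU {index}: {name} ({mem})")
```

With this setup you would hope to see two entries, one for each card.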

Sorry if that’s a dumb question.

  • Arkonias@alien.topB
    1 year ago

    Save the $$$ for a few months and go and buy a used 3090 or two. It’ll be worth it in the long run, and it will save you the headaches of trying to Frankenstein a bunch of 8 GB cards together.

  • ccbadd@alien.topB
    1 year ago

    I would replace the DDR5 RAM rather than add to it, or your memory will run a lot slower, and you just don’t need that much if you’re going to use GPUs for inferencing. Also, a P40 is probably money better spent with this config than the P2200.

    • Wortkraecker@alien.topOPB
      1 year ago

      Thing is, I have the P2200 sitting on my shelf right now from my dad’s old workstation, so I wouldn’t have to buy it.

      • a_beautiful_rhind@alien.topB
        1 year ago

        13GB does not make for much, especially when part of it is used for graphics and it is all old Pascal architecture.

        By all means, just put the card in and see where it gets you on 13b.