I have a server with 512 GB of RAM and 2x Intel Xeon 6154 CPUs. It will have a spare PCIe 3.0 x16 slot once I remove my current GPU.

I'd like to add a better GPU so I can generate paper summaries that are significantly better than what I currently get from a 4-bit Llama 2 13B. It's fine if responses take a few minutes to come back. Does anyone know the minimum GPU I should be looking at with this setup to upgrade to the 70B model? Will hybrid CPU+GPU inference with an RTX 4090 (24 GB) be enough?
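A rough back-of-envelope sketch of the memory side of this question, assuming the quantized weights dominate memory use, that Llama 2 70B is about 70 billion parameters spread over 80 equally sized transformer layers, and that about 2 GiB of VRAM is held back for the KV cache and runtime overhead (the 2 GiB figure and the function names are hypothetical, chosen only for illustration). Real quantization formats usually spend somewhat more than the nominal bits per weight, so treat the layer counts as optimistic upper bounds.

```python
# Back-of-envelope estimate for hybrid CPU+GPU inference of a quantized LLM.
# Assumptions (not from the thread): weights dominate memory, all layers are
# the same size, and reserve_gib of VRAM is kept free for KV cache/overhead.

def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def layers_on_gpu(n_params: float, n_layers: int, bits_per_weight: float,
                  vram_gib: float, reserve_gib: float = 2.0) -> int:
    """Number of equally sized layers that fit in VRAM after the reserve."""
    per_layer = weight_gib(n_params, bits_per_weight) / n_layers
    usable = max(vram_gib - reserve_gib, 0.0)
    return min(n_layers, int(usable // per_layer))


if __name__ == "__main__":
    N_PARAMS = 70e9  # Llama 2 70B, rounded
    N_LAYERS = 80    # transformer blocks in Llama 2 70B
    for bits in (4.0, 5.0):
        total = weight_gib(N_PARAMS, bits)
        k = layers_on_gpu(N_PARAMS, N_LAYERS, bits, vram_gib=24.0)
        print(f"{bits:.0f}-bit: ~{total:.1f} GiB of weights, "
              f"~{k}/{N_LAYERS} layers on a 24 GiB GPU, the rest in system RAM")
```

Under these assumptions a 24 GB card holds roughly half to two thirds of the 70B layers (this is the kind of number you would feed to a layer-offload setting such as llama.cpp's `--n-gpu-layers`), and the remaining layers run on the CPU from system RAM. With 512 GB of RAM, capacity is not the constraint; the CPU-side layers mostly determine generation speed.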

  • Ravenpest@alien.top
    11 months ago

    I have a 4090 and 128 GB of RAM. 70B runs fine at 5-bit quantization: it takes about 280 seconds to generate a message when the full prompt has to be reprocessed, and around 100 seconds less on a normal message. So I'd say you'd be fine with that setup.