tl;dr: I’m considering building a budget machine for tinkering with LLMs, but I’m not sure whether it’s a good idea or how to go about it.

For context: I work in a university department. I currently have access to a 2080 Ti on a shared machine, and we’re in the process of acquiring a small server with 2 L40 cards. So for any larger experiments, I will be able to use this shared machine.

However, I think I would like to have my own small machine for tinkering: trying different models and techniques, playing around, and preparing larger experiments to be run on the server. My focus is on teaching and education, not on state-of-the-art research.

Aiming for a good amount of VRAM, the 4060 Ti 16 GB seems like the most obvious choice; I also like its low power requirements (for both energy and cooling). But the card seems to have a poor reputation overall. I’m also not sure where the current sweet spot for CPU and memory lies – I’ve completely lost track of Intel’s and AMD’s generations over the last few years.

Some additional comments regarding common opinions:

  • I simply like having my own hardware, and cloud services seem to be more expensive in the long run.
  • There isn’t really a good market for used GPUs where I’m located (Singapore), so the common suggestion to go with a used 3090 doesn’t really work.

Any good suggestions, or am I being naive with my idea of a budget machine? Thanks a lot!

  • sshan@alien.topB · 11 months ago

    Mistral 7B is very good and can be run on 8 GB of VRAM. It was blazing fast on my 3070. I have a 4090 as well, and for all intents and purposes it’s indistinguishable.

    Right now Mistral 7B competes with the best 13B-parameter models. Unless you plan on using code LLMs, there aren’t many new 30B-parameter models that matter that much.

    I have a 3070 on my Proxmox home server with, I think, only 2 physical cores and 16 GB of RAM allocated, and I’m getting 40+ tokens per second.

    You wouldn’t be future-proofed, but it would work fine now; see the sketch below for roughly what that setup looks like.
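
    A minimal sketch of what that can look like in practice, assuming the Hugging Face mistralai/Mistral-7B-Instruct-v0.1 checkpoint and 4-bit quantization via bitsandbytes so the weights fit comfortably in 8 GB of VRAM (the model ID and generation settings are illustrative, not the commenter’s exact setup):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "mistralai/Mistral-7B-Instruct-v0.1"  # assumed checkpoint

    # 4-bit NF4 quantization keeps the weights around 4 GB on the GPU,
    # leaving headroom for the KV cache on an 8 GB card.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",  # place layers on the GPU automatically
    )

    prompt = "Explain the difference between VRAM and system RAM in one paragraph."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=200)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))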

  • Herr_Drosselmeyer@alien.topB · 11 months ago

    The 4060 Ti 16 GB has a bad reputation because it doesn’t provide any real improvement for gaming over the regular 4060, but that’s of no concern to us.

  • ttkciar@alien.topB · 11 months ago

    You can absolutely do interesting and useful things with very little hardware using quantized models, especially if you don’t mind inference being slow. My preferred quantization is Q4_K_M (with GGUF and llama.cpp).

    I started with a spare Lenovo ThinkPad T560 with 8 GB of RAM, which handled 7B models no problem. That was a $120 eBay purchase. Once I was hooked, I shifted to one of the Dell T7910s in the homelab and moved up to larger models.

    I’m still not using a GPU for anything. It’s all been CPU inference, which is slow but otherwise great; a sketch of that setup follows below.
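
    A minimal sketch of that CPU-only setup, using the llama-cpp-python bindings for llama.cpp with a Q4_K_M GGUF file (the file path, model choice, and thread count are placeholders – adapt them to whatever GGUF you download and however many cores you have):

    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",  # placeholder path
        n_ctx=4096,      # context window
        n_threads=8,     # CPU threads; tune to your physical core count
        n_gpu_layers=0,  # 0 = pure CPU inference, no GPU required
    )

    out = llm(
        "Summarize in two sentences why quantization reduces memory use.",
        max_tokens=200,
    )
    print(out["choices"][0]["text"])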

    You could get just about any $300 desktop, put a decent GPU in it, and enjoy fast inference (16 GB of VRAM will allow fast inference with 13B models, and 24 GB should allow heavily quantized 30B; the rough VRAM math is sketched below). The most expensive bit is the GPU.

    See this sub’s wiki for more detailed hardware tips.
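
    For reference, a rough back-of-the-envelope estimate behind those VRAM numbers (weights only, plus a flat allowance for overhead; the figures are approximations, not benchmarks, since real usage also depends on context length and the KV cache):

    def approx_vram_gb(params_billion: float, bits_per_weight: float,
                       overhead_gb: float = 1.5) -> float:
        """Weights (params * bits / 8 bytes) plus a flat overhead allowance."""
        weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1024**3
        return weights_gb + overhead_gb

    for params, bits in [(7, 4.5), (13, 4.5), (30, 4.5), (13, 16)]:
        print(f"{params}B at ~{bits} bits/weight: ~{approx_vram_gb(params, bits):.1f} GB")

    # Roughly: 7B ~5 GB, 13B ~8 GB, 30B ~17 GB at 4-bit-ish quantization,
    # versus ~26 GB for 13B at fp16, which is why quantized 13B fits a 16 GB
    # card and heavily quantized 30B fits 24 GB.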

  • a_beautiful_rhind@alien.topB · 11 months ago

    They sell P40s on AliExpress and eBay that ship from China. Fill some used box with those and use llama.cpp. You can also try your hand with the dirt-cheap AMD MI25. P100s are an option too if you want better FP16.

    It all depends on what you want to do, what is importable/available, and your budget.