I am going to build a LLM server very soon, targeting 34B models (specifically phind-codellama-34b-v2.Q4 GGUF GPTQ AWQ).
I am stuck between these two setups:
- 12400 + DDR5 6000MHz 30CL + 4060 Ti 16GB (GGUF; Split the workload between CPU and GPU)
- 3090 (GPTQ/AWQ model fully loaded in GPU)
Not sure if the speed bump of 3090 is worth the hefty price increase. Does anyone have benchmarks/data comparing these two setups?
BTW: Alder Lake CPUs run DDR5 in gear 2 (while AM4 run DDR5 in gear 1). AFAIK gear 1 offers lower latency. Would this give AM4 big advantage when it comes to LLM?
The 3090 will outperform the 4060 several times over. It’s not even a competition - it’s a slaughter.
As soon as you have to offload even a single layer to system memory (regardless of the speed), you cut your performance by an order of magnitude. I don’t care if you have screaming fast DDR5 in 8 channels and a pair of the beefiest xeons money can buy, your performance will fall off a cliff the minute you start offloading. If a 3090 is within your budget, that is the unambiguous answer.