I am going to build an LLM server very soon, targeting 34B models (specifically phind-codellama-34b-v2, either as a Q4 GGUF or as GPTQ/AWQ).
I am stuck between these two setups:
- 12400 + DDR5-6000 CL30 + 4060 Ti 16GB (GGUF; split the workload between CPU and GPU, roughly as in the sketch below)
- 3090 (GPTQ/AWQ model fully loaded in GPU)
Not sure if the speed bump of the 3090 is worth the hefty price increase. Does anyone have benchmarks/data comparing these two setups?
BTW: Alder Lake CPUs run DDR5 in gear 2, while AM5 runs its memory controller 1:1 with the RAM (the equivalent of gear 1). AFAIK the 1:1 mode offers lower latency. Would this give AM5 a big advantage when it comes to LLMs?
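For context, the GGUF route I have in mind looks roughly like this (a minimal sketch, assuming llama-cpp-python built with CUDA; the file name, layer split, and thread count are placeholder guesses, not tested values):

```python
from llama_cpp import Llama

# Hypothetical settings for illustration; tune n_gpu_layers until the 16GB card
# is as full as possible and let the remaining layers run on the 12400 + DDR5.
llm = Llama(
    model_path="./phind-codellama-34b-v2.Q4_K_M.gguf",  # assumed local GGUF file
    n_gpu_layers=30,   # partial offload to the 4060 Ti; the rest stays in system RAM
    n_ctx=4096,        # context window
    n_threads=6,       # the 12400 has 6 P-cores
)

out = llm("Write a Python function that reverses a linked list.", max_tokens=256)
print(out["choices"][0]["text"])
```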
There’s really no comparison. The 4060s, even the Ti, have crap for memory bandwidth: 288 GB/s in the case of the Ti, versus roughly 936 GB/s on a 3090. DDR5 is also not fast enough to make much of a difference, so that combo is not going to be speedy. It in no way compares to a 3090.
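Back-of-envelope: single-stream decoding is roughly bandwidth-bound, since every generated token has to stream the active weights once. A crude ceiling estimate (sketch only; the ~20 GB weight size for a Q4 34B GGUF and the bandwidth figures are approximations):

```python
# Crude upper bound: tokens/s ~= memory bandwidth / bytes read per token.
# For dense single-batch decoding, bytes per token ~= size of the quantized weights.
model_bytes = 20e9  # ~20 GB for a 34B model at ~4-bit quantization (approximate)

for name, bw in [("4060 Ti (288 GB/s)", 288e9),
                 ("DDR5-6000 dual channel (~96 GB/s)", 96e9),
                 ("3090 (~936 GB/s)", 936e9)]:
    print(f"{name}: ~{bw / model_bytes:.0f} tok/s ceiling")

# 4060 Ti: ~14 tok/s, dual-channel DDR5: ~5 tok/s, 3090: ~47 tok/s
```

Real numbers land below those ceilings, but the ratios are what matter here.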
The real issue is that consumer CPUs / motherboards have very few memory channels. DDR5 itself is plenty fast, but with only two channels you are already maxing out the platform's memory bandwidth with two sticks.
Would not surprise me at all if server CPU inference is somewhere between 3x and 5x faster.
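Rough channel math (sketch; these are peak theoretical numbers, the server configs are just examples, and sustained bandwidth is lower in practice):

```python
# Peak DRAM bandwidth = transfer rate (MT/s) x 8 bytes per 64-bit channel x channel count.
def peak_gbs(mt_per_s: int, channels: int) -> float:
    return mt_per_s * 8 * channels / 1000  # GB/s

desktop  = peak_gbs(6000, 2)   # dual-channel DDR5-6000  -> ~96 GB/s
server8  = peak_gbs(4800, 8)   # 8-channel DDR5-4800     -> ~307 GB/s
server12 = peak_gbs(4800, 12)  # 12-channel DDR5-4800    -> ~461 GB/s

print(server8 / desktop, server12 / desktop)  # ~3.2x and ~4.8x, in line with the 3x-5x guess
```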