Continuing my quest to choose a rig with lots of memory, one possibility is dual-socket motherboards. Gen 1 to 3 EPYC chips have 8 channels of DDR4, so a dual-socket board gives 16 memory channels in total: good bandwidth (if not GPU-beating), and far more capacity (up to 1024GB). Builds with 64+ threads can be had pretty cheaply.
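For a rough sense of what those channels buy you, here is a back-of-envelope estimate. The DIMM speed and model size are assumptions (DDR4-3200 and a ~40 GB quantized model), and sustained bandwidth in practice lands well below the theoretical peak:

```python
# Back-of-envelope bandwidth and token-rate estimate for one EPYC socket.
# Assumed numbers: DDR4-3200 DIMMs, 64-bit channels, ~40 GB model file.
channels = 8
mt_per_s = 3200           # DDR4-3200 does 3200 MT/s
bytes_per_transfer = 8    # each channel is 64 bits wide

peak_gb_s = channels * mt_per_s * bytes_per_transfer / 1000
print(peak_gb_s)          # 204.8 GB/s theoretical peak per socket

model_gb = 40             # e.g. a ~70B model at 4-bit quantization (assumed)
print(peak_gb_s / model_gb)  # 5.12 tokens/s upper bound: each token reads the whole model
```

This is why channel count matters more than core count for generation speed: token rate is roughly bandwidth divided by model size.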
My questions are
- Does the dual CPU setup cause trouble with running LLM software?
- Is it reasonably possible to get Windows, drivers, etc. working on 'server' architecture?
- Is there anything else I should consider vs going for a single EPYC or Threadripper Pro?
Dual CPUs can have terrible performance here. During token generation the processor reads the whole model for every token, and if half the model sits in the second CPU's memory, the first CPU's cores have to fetch that half over the comparatively slow inter-CPU link (and vice versa for the second CPU's cores). For dual-socket to really pay off, llama.cpp would need a way to spread the workload across multiple CPUs the way it already does across multiple GPUs.
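A quick illustration of that penalty. All numbers below are assumptions for the sake of the arithmetic, not measurements; the actual local and cross-socket figures vary a lot by platform:

```python
# Illustrative cross-socket penalty (assumed numbers, not measurements):
# with the model split evenly across two sockets, each socket reads half
# its data locally and half over the inter-CPU link.
local_gb_s = 190   # assumed sustained local read bandwidth per socket
link_gb_s = 50     # assumed usable cross-socket link bandwidth

# Time to stream one "unit" of model weights: half local, half remote.
t = 0.5 / local_gb_s + 0.5 / link_gb_s
effective_gb_s = 1 / t
print(round(effective_gb_s, 1))  # 79.2, far below the 190 GB/s local rate
```

Even with a generous link estimate, the slower path dominates, so naively splitting the model across sockets can be much worse than using one socket's local memory.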
That's why llama.cpp has a NUMA-aware `--numa` option. From my experience, the number of memory channels matters a lot, so all memory slots should be populated.
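For reference, llama.cpp's `--numa` flag accepts values like `distribute`, `isolate`, and `numactl`. A sketch of how it might be used, assuming a recent build (`llama-cli` binary) and a placeholder model path:

```shell
# Spread threads and memory evenly across NUMA nodes:
./llama-cli -m ./model.gguf -t 64 --numa distribute

# Or pin everything to one socket with numactl so all reads stay local,
# and tell llama.cpp to respect that mask:
numactl --cpunodebind=0 --membind=0 ./llama-cli -m ./model.gguf -t 32 --numa numactl
```

Thread count and node numbers here are examples; the right values depend on your core count and topology (`numactl --hardware` shows the layout).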