Continuing my quest to choose a rig with lots of memory, one possibility is a dual-socket motherboard. Gen 1 to 3 EPYC chips have 8 channels of DDR4 each, so two sockets give 16 memory channels in total. That is good bandwidth, even if it doesn't beat GPUs, and it allows far more memory (up to 1024GB). Builds with 64+ threads can be pretty cheap.
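For a rough sense of what 16 channels buys, here is a back-of-envelope sketch of theoretical peak bandwidth. It assumes a 64-bit (8-byte) DDR4 channel and rated speeds of DDR4-2666 for Gen 1 and DDR4-3200 for Gen 2/3; those speeds are my assumptions, so check them against the actual CPU and board. The function name is just made up for illustration.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes).

    channels             -- number of populated memory channels
    mega_transfers_per_s -- DDR rating, e.g. 3200 for DDR4-3200
    bus_bytes            -- bytes moved per transfer per channel (64-bit bus = 8)
    """
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9


if __name__ == "__main__":
    # Assumed configurations; the speeds are not taken from any spec sheet here.
    configs = [
        ("single socket, DDR4-3200", 8, 3200),
        ("dual socket, DDR4-2666", 16, 2666),
        ("dual socket, DDR4-3200", 16, 3200),
    ]
    for label, channels, speed in configs:
        print(f"{label}: {peak_bandwidth_gbs(channels, speed):.1f} GB/s")
```

These are upper bounds only. Real throughput will be lower, and on a dual-socket board each CPU reaches only its own 8 channels at full speed, so the combined figure depends on the software placing data sensibly across both sockets.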
My questions are:
- Does the dual CPU setup cause trouble with running LLM software?
- Is it reasonably possible to get Windows, drivers, etc. working on 'server' architecture?
- Is there anything else I should consider vs going for a single EPYC or Threadripper Pro?
I know there is a NUMA-aware option in llama.cpp.