Continuing my quest to choose a rig with lots of memory, one possibility is a dual-socket motherboard. Gen 1 to 3 EPYC chips have 8 channels of DDR4 each, so a dual-socket board gives 16 memory channels total: good bandwidth, though still not GPU-class, with far more capacity (up to 1024 GB). Builds with 64+ threads can be pretty cheap.
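A rough back-of-envelope for what 16 channels buys you (DDR4-3200 figures and the 40 GB model size are illustrative assumptions, not benchmarks):

```python
# Theoretical memory bandwidth and an upper bound on token rate for
# CPU inference. All figures here are assumed for illustration.

channels = 16            # dual-socket EPYC: 8 DDR4 channels per CPU
mt_per_s = 3200e6        # DDR4-3200 transfer rate (assumed)
bytes_per_transfer = 8   # 64-bit bus per channel

bandwidth = channels * mt_per_s * bytes_per_transfer  # bytes/s
print(f"Peak bandwidth: {bandwidth / 1e9:.0f} GB/s")

# Token generation is roughly bandwidth-bound: every token touches the
# whole model once. For a hypothetical ~40 GB quantized model:
model_bytes = 40e9
print(f"Upper bound: {bandwidth / model_bytes:.1f} tokens/s")
```

That works out to about 410 GB/s peak, so on the order of 10 tokens/s as a hard ceiling; real-world numbers will be lower.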

My questions are

  • Does the dual CPU setup cause trouble with running LLM software?
  • Is it reasonably possible to get Windows, drivers, etc. working on ‘server’ hardware?
  • Is there anything else I should consider vs going for a single EPYC or Threadripper Pro?
  • nero10578@alien.topB · 11 months ago

    Dual CPUs would have terrible performance. The processor reads the whole model every time it generates a token, so if you spread half the model into the second CPU’s memory, the first CPU’s cores have to read that half through the slow inter-CPU link, and vice versa for the second CPU’s cores. llama.cpp would need a scheme to split the workload across multiple CPUs, like it does across multiple GPUs, for this to work well.
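    A toy model of that penalty (the inter-socket link bandwidth and model size below are assumptions, and it naively treats local and remote reads as fully serialized):

    ```python
    # Sketch of why naive interleaving across two sockets hurts: half the
    # weights sit on the remote NUMA node and are read over the inter-CPU
    # link. All bandwidth figures are assumed for illustration.

    local_bw = 204.8e9   # one socket's 8 DDR4-3200 channels, bytes/s
    link_bw = 64e9       # assumed inter-socket link bandwidth, bytes/s
    model_bytes = 40e9   # hypothetical ~40 GB quantized model

    # Per token: local half at local_bw, remote half over the link.
    t_local = (model_bytes / 2) / local_bw
    t_remote = (model_bytes / 2) / link_bw
    tokens_per_s = 1 / (t_local + t_remote)

    print(f"interleaved across sockets: ~{tokens_per_s:.1f} tokens/s")
    print(f"one socket, local memory:   ~{local_bw / model_bytes:.1f} tokens/s")
    ```

    Under these assumptions the remote half dominates, and the dual-socket interleaved case comes out slower than a single socket reading local memory.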

    • vikarti_anatra@alien.topB · 11 months ago

      That’s why llama.cpp has the ‘numa’ option.

      From my experience, the number of memory channels matters a lot, so all memory slots should be filled.
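      Peak bandwidth scales linearly with populated channels, which is why leaving slots empty costs you directly (DDR4-3200 figures assumed):

      ```python
      # Why populating every channel matters: each DDR4-3200 channel adds
      # ~25.6 GB/s of peak bandwidth. Figures assumed for illustration.

      per_channel = 3200e6 * 8  # bytes/s per channel (64-bit bus)
      for populated in (4, 8, 16):
          print(f"{populated:2d} channels: "
                f"{per_channel * populated / 1e9:.0f} GB/s peak")
      ```

      Since token generation is roughly bandwidth-bound, halving the populated channels roughly halves the best-case token rate.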