So I’m looking into Threadripper Pro systems, which offer pretty good memory bandwidth since they have 8 memory channels, and can take a huge amount of RAM. (I can put a 3090 or two in there too.)
I’m wondering how much the core count is going to affect performance. For example, the 5955WX has 16 cores while the 5995WX has 64 cores. They can both use the same memory though. There’s little point spending extra if the limiting factor will be somewhere else.
I use a 12-core Ryzen and can run the 70B model at 8-bit with llama.cpp fine. Do not bother with hyper-threads, though.
I have 64 cores with 8-channel RAM. If I use more than 24-32 cores, the speed drops somewhat.
This is for token generation; prompt processing benefits from all the threads.
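A back-of-envelope sketch of why extra cores stop helping token generation: each generated token has to read roughly all of the model's weights from RAM, so memory bandwidth puts a ceiling on tokens/s no matter how many cores you have. The numbers here are assumptions for illustration only: DDR4-3200 on all 8 channels, an 8-byte (64-bit) bus per channel, and about 1 byte per parameter for a 70B model at 8-bit. Real throughput will land below this theoretical peak.

```python
def peak_bandwidth_gbs(channels: int, mts: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s: channels x transfers/s x bytes per transfer."""
    return channels * mts * 1e6 * bus_bytes / 1e9


def max_tokens_per_s(bandwidth_gbs: float, params_billion: float, bytes_per_param: float) -> float:
    """Upper bound on generation speed, assuming every token streams all weights from RAM once."""
    model_gb = params_billion * bytes_per_param
    return bandwidth_gbs / model_gb


if __name__ == "__main__":
    # Assumed: 8 channels of DDR4-3200, ~1 byte/param for an 8-bit 70B model.
    bw = peak_bandwidth_gbs(channels=8, mts=3200)
    print(f"peak bandwidth: {bw:.1f} GB/s")
    print(f"70B @ 8-bit ceiling: {max_tokens_per_s(bw, 70, 1.0):.2f} tokens/s")
```

By the same arithmetic, a dual-channel desktop at DDR4-3200 (51.2 GB/s) would top out around 0.7 tokens/s for the same model, which is why channel count matters more than core count once enough cores are running to saturate memory.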
But it is much better to spend your money on GPUs than on CPU cores. I have 3x Radeon MI25 in an i9-9900K box, and that is more than twice as fast as the 64-core EPYC build.