I have 64 cores with 8-channel RAM; if I use more than 24-32 cores, the speed actually slows down somewhat.
This is for token generation; prompt processing benefits from all the threads. If you want to find the saturation point on your own box, you can sweep thread counts, as in the sketch below.
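A minimal sketch of such a sweep, assuming llama-cpp-python and a local GGUF model (the path and thread counts are placeholders, and timing here includes prompt processing, so keep the prompt short):

```python
import time
from llama_cpp import Llama

MODEL = "model.gguf"  # placeholder path to your quantized model

# Sweep thread counts and report generation throughput for each.
for n_threads in (8, 16, 24, 32, 48, 64):
    llm = Llama(model_path=MODEL, n_threads=n_threads, verbose=False)
    start = time.time()
    out = llm("Write a short story about a robot.", max_tokens=128)
    elapsed = time.time() - start
    n_tok = out["usage"]["completion_tokens"]
    print(f"{n_threads:3d} threads: {n_tok / elapsed:.2f} tok/s")
```

On a bandwidth-bound build you would expect the tok/s numbers to plateau (or dip) somewhere well below the full core count.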
But it is much better to spend your money on GPUs than CPU cores. I have 3x Radeon MI25 in an i9-9900K box, and that is more than twice as fast as the 64-core EPYC build.
V-Cache only helps when you access lots of tiny chunks of data that fit inside the 96-128 MB cache.
During inference you have to read the entire several-GB model for each generated token, so your bottleneck is still the RAM bandwidth.
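A quick back-of-envelope for that ceiling, using assumed numbers (a 40 GB model and theoretical DDR4-3200 bandwidth; swap in your own):

```python
# Upper bound on token generation, assuming it is purely
# memory-bandwidth-bound: every token streams all weights from RAM once.
model_size_gb = 40             # e.g. a ~70B model at ~4-bit quantization
channels = 8
per_channel_gbps = 25.6        # DDR4-3200: 3200 MT/s * 8 bytes = 25.6 GB/s

bandwidth = channels * per_channel_gbps   # ~204.8 GB/s theoretical peak
tokens_per_sec = bandwidth / model_size_gb

print(f"Peak bandwidth: {bandwidth:.1f} GB/s")
print(f"Upper bound:    {tokens_per_sec:.1f} tok/s")  # ~5.1 tok/s
```

Real throughput lands below that since you never hit theoretical peak bandwidth, and no amount of extra cores or cache changes the amount of data that has to come out of RAM per token.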