Looking for CPU Inference Hardware (8 Channel Ram Server Motherboards)

jasonmbrown@alien.top · 2 years ago

Looking for CPU Inference Hardware (8 Channel Ram Server Motherboards)

fallingdowndizzyvr@alien.top · 2 years ago

A big issue for CPU only setups is prompt processing. They’re kind of OK for short chats, but if you give them full context the processing time is miserable. Nowhere close to 5 tok/sec.

That’s where context shifting comes into play. So the entire context doesn’t have to be reprocessed over and over again. Just the changes.