So I’m interested in applications that need memory more than speed: high-quality models with a big context. I’m talking 100GB or more of memory. Speed still matters, though. I don’t need snappy conversations, but getting through more work ‘overnight’ is still valuable.
RTX 3090s are affordable, but it would take 4 to 8 of them (96 to 192GB of VRAM at 24GB each) to get into the big-memory category, and the main issue is energy use. For batch use the PC could shut down after finishing, so idle power draw wouldn’t be an issue. Are there motherboards that can completely cut power to the extra cards when they aren’t needed?
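To put rough numbers on the multi-3090 option, here is a minimal sketch. It assumes 24GB of VRAM per card (the 3090's spec) and that the machine powers off after the batch, so only the run itself costs energy. The per-card wattage, base system draw, and run length in the example are hypothetical placeholders, not measurements.

```python
import math

# Assumption: each RTX 3090 has 24 GB of VRAM (its published spec).
VRAM_PER_3090_GB = 24

def gpus_needed(target_gb: float, per_gpu_gb: float = VRAM_PER_3090_GB) -> int:
    """Smallest number of cards whose combined VRAM reaches target_gb."""
    return math.ceil(target_gb / per_gpu_gb)

def batch_energy_kwh(num_gpus: int, watts_per_gpu: float, hours: float,
                     base_system_watts: float = 0.0) -> float:
    """Energy for one batch run, assuming the box shuts down afterwards,
    so idle draw after the job contributes nothing."""
    total_watts = num_gpus * watts_per_gpu + base_system_watts
    return total_watts * hours / 1000.0

for cards in (4, 8):
    print(cards, "cards ->", cards * VRAM_PER_3090_GB, "GB")
print("cards for 100 GB:", gpus_needed(100))
# Hypothetical: 8 cards at 300 W each, 200 W for the rest of the system, 6-hour run.
print("kWh:", batch_energy_kwh(8, 300, 6, 200))
```

Note that 4 cards land just under 100GB, so 5 is the real minimum for that target.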
The Mac Studio with an M2 Ultra can be configured with 192GB of unified memory, with about 140GB of that usable for models. This isn’t as fast as a stack of GPUs, obviously, but is reportedly acceptable for many applications.
What about PCs/servers with lots of RAM on the mainboard? Is that much slower than the Macs because of the different memory architecture? If not, it’s probably a lot cheaper. The CPU would need to do all the work, and I don’t know how the energy efficiency would compare.
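A common rule of thumb for the RAM question: single-stream generation is mostly limited by memory bandwidth, because every new token has to read all of the model weights once. So a ceiling on tokens/s is roughly bandwidth divided by model size. The sketch below uses nominal peak figures (800 GB/s is Apple's published M2 Ultra number; the DDR5 figures are 8 bytes per channel × 4800 MT/s) and a hypothetical 70GB model. Real throughput will be lower, and batching many requests changes the picture, so treat this as a rough comparison only.

```python
# Rough upper bound on single-stream decode speed, assuming generation is
# memory-bandwidth bound (all weights are read once per generated token).
# Figures are nominal peak bandwidths, not measured throughput.
PEAK_BANDWIDTH_GBPS = {
    "M2 Ultra (unified memory)": 800.0,     # Apple's published figure
    "Desktop DDR5-4800, 2 channels": 76.8,  # 2 ch * 8 bytes * 4800 MT/s
    "Server DDR5-4800, 8 channels": 307.2,  # 8 ch * 8 bytes * 4800 MT/s
}

def max_tokens_per_second(bandwidth_gbps: float, model_gb: float) -> float:
    """Ceiling on tokens/s if each token streams the whole model from memory."""
    return bandwidth_gbps / model_gb

MODEL_GB = 70.0  # hypothetical: e.g. a ~70B-parameter model at ~8 bits/weight
for name, bw in PEAK_BANDWIDTH_GBPS.items():
    print(f"{name}: <= {max_tokens_per_second(bw, MODEL_GB):.1f} tok/s")
```

By this estimate, an ordinary dual-channel desktop is roughly 10× behind the M2 Ultra, while multi-channel server boards close much of that gap.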
I would be grateful if anyone has data comparing speeds or joules per token for these broad options.
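For anyone collecting numbers, joules per token falls straight out of average power draw and throughput, since a watt is one joule per second. A minimal sketch follows. The 200 W and 10 tok/s values are placeholders, not data; ideally you'd measure power at the wall over a whole run.

```python
def joules_per_token(avg_watts: float, tokens_per_second: float) -> float:
    """Energy per token: W is J/s, so J/token = W / (tokens/s)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return avg_watts / tokens_per_second

def tokens_per_kwh(avg_watts: float, tokens_per_second: float) -> float:
    """Tokens generated per kWh (1 kWh = 3.6e6 J)."""
    return 3.6e6 / joules_per_token(avg_watts, tokens_per_second)

# Placeholder inputs only: replace with your own measured power and throughput.
print(joules_per_token(200.0, 10.0))  # 20.0
print(tokens_per_kwh(200.0, 10.0))    # 180000.0
```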
What model are you going to run that can accept 100GB of context?
I meant in total, but there do seem to be models whose context alone can use up to 100GB, like 01-ai/Yi-34B-200K.
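For a sense of where the context memory goes: it's mostly the KV cache, which grows linearly with the number of tokens. A rough estimate is 2 (keys and values) × layers × KV heads × head dim × bytes per element × tokens. The Yi-34B shape below (60 layers, 8 KV heads with grouped-query attention, head dim 128) is my assumption from its config and worth checking against the model's config.json; the fp16 cache (2 bytes per element) is also an assumption.

```python
from dataclasses import dataclass

@dataclass
class KVConfig:
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_elem: int = 2  # assumed fp16/bf16 cache

def kv_cache_bytes(cfg: KVConfig, tokens: int) -> int:
    # Factor 2: one key and one value vector per layer, per KV head, per token.
    return 2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim * cfg.bytes_per_elem * tokens

# Assumed Yi-34B shape; verify against the model's config.json.
yi_34b = KVConfig(num_layers=60, num_kv_heads=8, head_dim=128)
print(kv_cache_bytes(yi_34b, 1))              # 245760 bytes per token
print(kv_cache_bytes(yi_34b, 200_000) / 1e9)  # 49.152 GB for a full 200K window
```

Adding roughly 68GB of fp16 weights (about 34B parameters × 2 bytes) would put the total above 100GB; quantizing the weights or the cache brings it down.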
Ooh… now I’ve got another model to play with. :D