So I’m interested in applications that need memory more than raw speed: high quality and a big context. I’m talking 100GB or more. Speed still matters, though; I don’t need snappy conversations, but getting through more stuff overnight is still valuable.
3090s are affordable, but at 24GB each it would take 4 to 8 of them to get into the big-memory category, and the primary issue is energy use. For batch use the PC could shut down after finishing, so idle power wouldn’t be an issue. Are there motherboards that can completely shut off power to extra cards when they aren’t needed?
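Here’s the rough arithmetic I’ve been using to size the card count — the 24GB per card is real, the model/cache sizes and the 10% overhead are just placeholder guesses:

```python
import math

def cards_needed(weights_gb: float, kv_cache_gb: float,
                 card_gb: float = 24.0, overhead_frac: float = 0.10) -> int:
    """Estimate GPU count, reserving ~10% of each card for activations
    and framework buffers (an assumption, not a measurement)."""
    usable_per_card = card_gb * (1.0 - overhead_frac)
    return math.ceil((weights_gb + kv_cache_gb) / usable_per_card)

# Example: a ~70B model at 4-bit (~40GB of weights) plus a generous 40GB
# KV cache lands around 4 cards; the same model at fp16 (~140GB) needs 9.
print(cards_needed(40, 40))    # -> 4
print(cards_needed(140, 40))   # -> 9
```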
Mac Studio M2 Ultra can be configured with 192GB of unified memory, with about 140GB usable for the GPU. This isn’t as fast, obviously, but is reportedly acceptable for many applications.
What about PCs/servers with lots of mainboard RAM? Is that way slower than the Macs because of the different memory architecture? If not, it’s probably a lot cheaper. The CPU would have to do all the work, though, and I don’t know how the energy efficiency would compare.
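My rough mental model, which I’d love someone to confirm or correct: single-stream generation is mostly memory-bandwidth bound, so tokens/s ≈ effective bandwidth / bytes read per token. A sketch using approximate spec-sheet bandwidths (the 0.6 efficiency factor is a pure guess):

```python
# First-order estimate: each generated token has to stream the (active) model
# weights through memory once. Bandwidths are approximate spec-sheet values.

def est_tokens_per_s(model_gb: float, bandwidth_gb_s: float,
                     efficiency: float = 0.6) -> float:
    """tokens/s ~= effective bandwidth / bytes touched per token (weights dominate)."""
    return bandwidth_gb_s * efficiency / model_gb

model_gb = 40  # e.g. a ~70B model quantized to ~4-bit (assumption)

for name, bw in [
    ("Dual-channel DDR5 desktop (~80 GB/s)", 80),
    ("8-channel DDR5 server (~300 GB/s)", 300),
    ("M2 Ultra unified memory (~800 GB/s)", 800),
    ("RTX 3090 GDDR6X (~936 GB/s per card)", 936),
]:
    print(f"{name}: ~{est_tokens_per_s(model_gb, bw):.1f} tok/s")
```

If that’s roughly right, a plain desktop with lots of RAM is an order of magnitude slower than the Mac or the GPUs, and only a many-channel server board closes part of the gap.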
I would be grateful if anyone has data comparing speeds or joules per token for these broad options.
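If it helps, this is roughly how I’d plan to measure the GPU side myself, polling nvidia-smi during a batch run — run_generation() is just a placeholder for whatever runner you actually use:

```python
# Crude joules-per-token measurement: poll GPU power while generating,
# then divide energy by tokens produced.
import subprocess, threading, time

samples = []
stop = threading.Event()

def poll_power(interval_s: float = 0.5):
    while not stop.is_set():
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"]
        ).decode()
        samples.append(sum(float(w) for w in out.split()))  # sum across cards
        time.sleep(interval_s)

def run_generation() -> int:
    """Placeholder: run your batch job and return the number of tokens generated."""
    time.sleep(10)
    return 1000

t = threading.Thread(target=poll_power, daemon=True)
start = time.time()
t.start()
tokens = run_generation()
stop.set()
t.join()
elapsed = time.time() - start

avg_watts = sum(samples) / len(samples)
print(f"{avg_watts:.0f} W avg, {avg_watts * elapsed / tokens:.1f} J/token "
      f"(GPU only; whole-system wall power would be fairer)")
```

On the Mac side I’d presumably use powermetrics or just a wall meter so the comparison covers the whole system.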
How important is local processing for you? It might be worth looking into renting a cloud server. Datacenter GPUs, like the A100s/H100s, have much more memory. Could be better bang for your buck if all you care about is throughput.
A valid option. I haven’t looked into rental prices, but it could make sense unless I end up using it a lot.
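Back-of-envelope break-even, with placeholder prices rather than real quotes:

```python
# Rent-vs-buy break-even. All prices are placeholders, not quotes.
hardware_cost = 6000        # e.g. a multi-3090 build or a big Mac Studio
power_cost_per_hour = 0.30  # electricity while running locally
cloud_cost_per_hour = 2.50  # e.g. a single rented A100/H100

breakeven_hours = hardware_cost / (cloud_cost_per_hour - power_cost_per_hour)
print(f"Owning wins after ~{breakeven_hours:.0f} hours of use "
      f"(~{breakeven_hours / (8 * 30):.0f} months at 8h/day)")
```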
What model are you going to run that can accept 100GB of context?
I meant 100GB in total, but there do seem to be models whose context alone can take something like that much memory, like 01-ai/Yi-34B-200K.
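For anyone curious, context memory can be estimated from the KV cache: 2 (K and V) × layers × kv_heads × head_dim × bytes per token. A sketch with the architecture numbers I believe apply to Yi-34B (60 layers, 8 KV heads via GQA, head_dim 128 — worth double-checking against its config.json):

```python
def kv_cache_gb(tokens: int, layers: int, kv_heads: int, head_dim: int,
                bytes_per_elem: int = 2) -> float:
    """KV cache size: 2 (K and V) * layers * kv_heads * head_dim * bytes, per token."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return tokens * per_token / 1e9

# Yi-34B-200K at its full 200K window, fp16 cache
# (architecture numbers from memory -- verify against the model's config.json)
print(f"~{kv_cache_gb(200_000, layers=60, kv_heads=8, head_dim=128):.0f} GB")  # ~49 GB
```

Add roughly 68GB for the fp16 weights and the total does clear 100GB, which is the ballpark I had in mind.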
Ooh… now I’ve got another model to play with. :D