I really wish there was a site where you could plug in your hardware and see what tokens/s (t/s) you could expect from it, so if anyone has a link like that, I’d be interested. I haven’t been able to find one, and I feel like pretty much a noob when it comes to understanding which parts of the hardware matter for local fine-tuning, inference, and running models, so please bear with me as I ask a bunch of probably dumb questions.
Broadly and in order, I think single-GPU VRAM matters most (the more GB the better), then system RAM (same, though I think speed matters too), then PCIe bus bandwidth in GB/s, then additional GPUs (for something like 60%, then 30%, then decreasing speedups from there), and finally CPU and/or NVMe space might matter a little. Does that sound broadly correct?
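For the t/s question, one common back-of-envelope (not a benchmark) is that single-stream token generation is mostly memory-bandwidth bound: each generated token has to read all the weights once, so t/s is capped at roughly memory bandwidth divided by model size. Here is a minimal Python sketch assuming that rule of thumb. The function names and the 2 GB overhead allowance are made up for illustration, and 936 GB/s is the RTX 3090's spec-sheet bandwidth; real numbers will come in lower.

```python
def estimate_tokens_per_second(params_billion, bytes_per_param, mem_bandwidth_gb_s):
    """Rough upper bound for single-stream decoding.

    Assumes every generated token reads all model weights once, so
    throughput is limited by memory bandwidth (a rule of thumb only).
    """
    model_size_gb = params_billion * bytes_per_param  # 1e9 params * bytes / 1e9
    return mem_bandwidth_gb_s / model_size_gb


def fits_in_vram(params_billion, bytes_per_param, vram_gb, overhead_gb=2.0):
    """Check whether the weights plus a guessed overhead fit in VRAM.

    overhead_gb is a hypothetical allowance for KV cache and activations.
    """
    return params_billion * bytes_per_param + overhead_gb <= vram_gb


if __name__ == "__main__":
    # Example: a 13B model quantized to 4 bits (0.5 bytes/param) on an RTX 3090.
    print(fits_in_vram(13, 0.5, vram_gb=24))
    print(estimate_tokens_per_second(13, 0.5, mem_bandwidth_gb_s=936))
```

If this rule of thumb holds, it also suggests why VRAM comes first on the list: once a model spills out of VRAM, the slower system RAM or PCIe link becomes the bandwidth in the formula.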
So the situation is I’ve got a ton of 30-series NVIDIA GPUs from a mining operation I wrapped up.
I could never sell them on r/hardwareswap or anywhere else, because nobody would buy in bulk, and I sure as hell am not wasting my time selling and shipping 75+ individual GPUs to whoever. I do have racks, mobos, power supplies, and so on too, but I don’t think that matters. I also have a decent number of 6800, 6700 XT, and 5700 XT AMD cards, but I don’t think that matters either - please correct me if I’m wrong.
I’d like to use as many GPUs as possible for local fine-tuning and inference, and am trying to figure out the best path for that. After reading about PCIe bandwidth and the speedups from 2 and 3 additional GPUs, I’m afraid the real answer is “sell some GPUs and buy an M2 Ultra Mac Pro” or something like that, but if that route is off the table, what is the best path forward?
An EPYC server build with as many 3090s and 3080s as I can fit and either 96 GB (2 sticks, full DDR5 speed) or 192 GB (4 sticks, only DDR4 speed) of RAM? Which RAM config is better? I think the DDR5 vs DDR4 speed actually makes a difference, but I’m not sure how much.
Researching EPYC mobos, I think I can fit maybe 6 or 7 GPUs into an EPYC build. Does that sound about right? Does anyone know of any PCIe-rich mobos or architectures that could fit notably more GPUs than that? I do have a bunch of mining mobos, but I don’t think they’re usable?
I’m pretty sure there’s nothing possible like a Beowulf cluster of mining boards + GPUs that you can use for fine-tuning or running models. Is that correct?
I also have a Threadripper Linux box that can currently fit 4-6 GPUs, and I could upgrade it to an AM5 mobo and a 7950X3D CPU pretty easily. I don’t know how this stacks up against an EPYC build. Does anyone have any ideas on that?
I looked up my current Linux box’s mobo and the PCIe lanes only have 32 GB/s of bandwidth, so I think a mobo upgrade to AM5 with 128 GB/s would be necessary to get decent speeds. Does that sound right?
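For reference, figures like those can be reproduced from the per-lane transfer rates in the PCIe specs. Below is a rough Python sketch, assuming the 32 GB/s and 128 GB/s numbers are the usual bidirectional x16 marketing figures for PCIe 3.0 and 5.0 (that reading is an assumption, and the function name is made up for illustration).

```python
# Per-lane raw transfer rate in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}

# PCIe 3.0 and later use 128b/130b line encoding, so about 98.5% of raw bits are payload.
ENCODING_EFFICIENCY = 128 / 130


def pcie_gb_per_s(gen, lanes, bidirectional=False):
    """Theoretical PCIe link bandwidth in GB/s (protocol overhead ignored)."""
    per_lane = GT_PER_S[gen] * ENCODING_EFFICIENCY / 8  # bits -> bytes, one direction
    total = per_lane * lanes
    return total * 2 if bidirectional else total


if __name__ == "__main__":
    for gen, lanes in [(3, 16), (4, 16), (5, 16), (3, 1)]:
        one_way = pcie_gb_per_s(gen, lanes)
        both = pcie_gb_per_s(gen, lanes, bidirectional=True)
        print(f"PCIe {gen}.0 x{lanes}: {one_way:.2f} GB/s one way, {both:.2f} GB/s both ways")
```

Two caveats: a 30-series card is a PCIe 4.0 device, so it would not negotiate a 5.0 link even in a 5.0 slot, and the per-slot lane count (x16 vs x8 vs the x1 links typical of mining risers) matters as much as the generation.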
Sorry for all the questions and my general lack of knowledge. Any guidance or suggestions on making the most of a bunch of GPUs are very welcome.
You could build an InfiniBand cluster. The 3090 would give you the most bang for the buck, though it’s a lot more work than trading out for A100s, and the extra hardware will cost you. You can get 9 GPUs on a single EPYC server mobo and still have good bandwidth, so we are talking about manually sourcing and building 10 boxes.
But unless you are training stuff and have cheap electricity, a cluster probably doesn’t make sense. No idea why you would need ~1800 GB of VRAM.
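The ~1800 GB figure is presumably just 75 cards times the 3090’s 24 GB each. A quick Python sketch of that arithmetic, assuming every card is a 3090 (the thread only says “75+” 30-series cards, so a mix with 10 GB 3080s would come out lower); the helper names are made up for illustration.

```python
import math

# VRAM per card in GB. The RTX 3080 also shipped in a 12 GB variant.
VRAM_GB = {"RTX 3090": 24, "RTX 3080": 10}


def total_vram_gb(inventory):
    """Sum VRAM across a {card_name: count} inventory."""
    return sum(VRAM_GB[card] * count for card, count in inventory.items())


def boxes_needed(gpu_count, gpus_per_box=9):
    """Number of servers needed at a given GPU density per box."""
    return math.ceil(gpu_count / gpus_per_box)


if __name__ == "__main__":
    all_3090s = {"RTX 3090": 75}
    print(total_vram_gb(all_3090s))  # 75 * 24 = 1800
    print(boxes_needed(75))          # 9 boxes at 9 GPUs each

    # Hypothetical mix, purely for illustration.
    mixed = {"RTX 3090": 40, "RTX 3080": 35}
    print(total_vram_gb(mixed))
```

At 9 GPUs per box, 10 boxes would cover up to 90 cards, which leaves some headroom for the “+” in “75+”.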
Homeboy’s waifu is gonna be THICC.
Thanks for pointing me to InfiniBand, another thing for me to research. Sounds like a high-bandwidth interconnect layer for coordinating supercomputer nodes, so sort of like the Beowulf cluster idea.
I actually do have cheap electricity thanks to solar + a honking big LiFePO4 battery bank.
Is this what AWS and other places where you can rent time on H100s do? Have a bunch of A100 and H100 servers hooked up in arrays with InfiniBand?