I was looking through my office window at the data closet and (due to angle, objects, field of view) could only see one server light cluster out of the six full racks. And I thought it would be nice to scale everything down to 2U. Then I day-dreamed about a future where a warehouse data center was reduced to a single hypercube sitting alone in the vast darkness.
They tend to fill the space. I mean, if you drive by a modern data center, so much grid electrical equipment is just right there. Now if, hypothetically, the supermachine used all that power, then sure, small data center. Unless they have a nuclear reactor, they should (fu felon musk) only rely on grid/solar/renewables.
I sometimes wonder how powerful a computer could be if we kept the current transistor size but built the machine to take up an entire room. At what point would the number of transistors and the size of the machine become more of a problem than a solution? 🤔
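For a rough sense of scale, here’s a back-of-envelope sketch (the ~1e8 transistors/mm² density and the 5 m × 5 m “room” are just assumed ballpark numbers, not real specs):

```python
# Back-of-envelope: transistor count for a room-sized "chip".
# Assumptions: ~1e8 transistors per mm^2 (ballpark for a leading-edge node)
# and a 5 m x 5 m active area; both figures are illustrative, not real specs.
DENSITY_PER_MM2 = 1e8          # transistors per mm^2 (assumed)
ROOM_AREA_M2 = 5 * 5           # assumed "room-sized" active area, m^2

area_mm2 = ROOM_AREA_M2 * 1_000_000      # 1 m^2 = 1e6 mm^2
transistors = DENSITY_PER_MM2 * area_mm2

print(f"{transistors:.1e} transistors")  # ~2.5e15, vs. ~1e11 on a big GPU die
```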
Isn’t the main limiting factor signal integrity? Like, we could do a CPU the size of a room now, but it’s pointless as the stuff at one end wouldn’t even be able to talk to the stuff in the middle, since the signal just gets fucked up on the way?
Signal integrity will probably be fine; you can always go with optical signalling for the long routes. What would be more of an issue is absurd complexity, latency from one end to the other, that kind of stuff. At some point, just breaking it down into a lot of semi-autonomous nodes in a cluster makes more sense. We kind of already started this with multi-core CPUs (and GPUs are essentially a lot of pretty dumb cores). The biggest current CPUs all have a lot of cores, for a reason.
IIRC, light speed delay (or technically, electricity speed delay) is also a factor, but I can’t remember how much of a factor.
It’s significant already. If I’ve got the math right (warning: I’m on my phone in bed at 3am and it’s been 10 years), a 1-inch chip running at a 3 GHz clock could, if you aren’t careful with the design of the clock network, end up with half a clock cycle physically fitting on the chip. That is, the trace that was supposed to move the signal from one end of the chip to the other would instead see the clock signal as a standing wave, not moving at all. (Of course people have (tried?) to make use of that effect. I think it was called “resonant clock distribution” or some such.)
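For anyone who wants to check the napkin math, here’s a quick sketch (the 0.5c on-chip propagation speed is just an assumption; real RC-loaded traces are usually slower, which only makes it worse):

```python
# Back-of-envelope: how far does a signal travel in half a clock cycle?
# Assumption: on-chip propagation at ~0.5c.
C = 299_792_458            # speed of light in vacuum, m/s
PROP_SPEED = 0.5 * C       # assumed on-chip propagation speed, m/s
CLOCK_HZ = 3e9             # 3 GHz clock

period = 1 / CLOCK_HZ                      # ~333 ps
half_cycle_distance = PROP_SPEED * period / 2

print(f"clock period: {period * 1e12:.0f} ps")
print(f"half a cycle spans: {half_cycle_distance * 100:.1f} cm")
# -> ~2.5 cm, i.e. roughly an inch, so half a cycle really can
#    "fit" across a 1-inch die at 3 GHz under these assumptions.
```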
🤯
You think that if we can scale 6 racks down into one cube that someone wouldn’t just buy 6 racks of cubes?
They’ll always hunger for more.
The racks look silly now. Many data centers aren’t scaling up power per rack, so with GPU servers there are often only two chassis per rack.
I had this problem with Equinix! They limited our company to like 10 kVA per rack, and we were installing NVIDIA DGX servers. Depending on the model we could fit only one or two lol.
We have that problem ourselves: they didn’t provision power or cooling for this kind of density. And how do you pipe multiple megawatts into a warehouse in the middle of nowhere?
Only if storage density outpaces storage demand. Eventually, physics will hit a limit.
Physics is already hitting limits. We’re already seeing CPUs be limited by things like atom size, and the speed of light across the width of the chip. Those hard physics limitations are a large part of why quantum computing is being so heavily researched.
Which means it doesn’t seem like the limit has been hit yet. For standard devices, the general market hasn’t run up against the current physical limits.
I think what will happen is that we’ll just start seeing sub-U servers. First will be 0.5U servers, then 0.25U, and eventually 0.1U. By that point, you’ll be racking racks of servers, with ten 0.1U servers slotted into a frame that you mount in an open 1U slot.
Silliness aside, we’re kind of already doing that in some uses, only vertically. Multiple GPUs mounted vertically in an xU harness.
You’ve reinvented blade servers
The future is 12 years ago: HP Moonshot 1500
“The HP Moonshot 1500 System chassis is a proprietary 4.3U chassis that is pretty heavy: 180 lbs or 81.6 Kg. The chassis hosts 45 hot-pluggable Atom S1260 based server nodes”
That did not catch on. I had access to one, and the use case and deployment docs were foggy at best.
It made some sense before virtualization for job separation.
Then docker/k8s came along and nuked everything from orbit.
VMs were a thing in 2013.
Interestingly, Docker was released in March 2013. So it might have prevented a better company from trying the same thing.
Yes, but they weren’t as fast, VT-x and the like were still fairly new, and the VM stacks were kind of shit.
Yeah, Docker is a shame. I wrote a thin stack on LXC, but BSD jails are much nicer, if only they improved their deployment system.
Agreed.
Highlighting how often software usability reduces adoption of good ideas.
The other use case was for hosting companies. They could sell “5 servers” to one customer and “10 servers” to another and have full CPU/memory isolation. I think that use case still exists and we see it used all over the place in public cloud hyperscalers.
Meltdown and Spectre vulnerabilities are a good argument for discrete servers like this. We’ll see if a new generation of CPUs will make this more worth it.
128-192 cores on a single EPYC makes almost nothing else worth it; the scaling is incredible.
Also, I happen to know they’re working on even more hardware isolation mechanisms, similar to SR-IOV but more strictly enforced.
> 128-192 cores on a single EPYC makes almost nothing else worth it; the scaling is incredible.
Sure, which is why we haven’t seen huge adoption. However, in some cases it isn’t so much an issue of total compute power as it is autonomy. If there’s a rogue process running on one of those 192 cores and it can end up accessing the memory in your space, it’s a problem. There are some regulatory rules I’ve run into that actually forbid company processes on shared CPU infrastructure.
There are, but at that point you’re probably buying big iron already; cost isn’t an issue.
Sun literally made their living from those applications for a long while.
Yeah, that’s the stuff.