We are screwed if the Internet Archive goes down, right?
Seems like a huge point of failure for one entity.
Agreed, though I think the biggest issue is just scale. It’s over 100 petabytes of data. That’s not outside the realm of what the big cloud providers could mirror, but they don’t really give a shit. The community would need some sort of serious distributed software to work with. Not impossible, but as far as I know nobody’s taken up the mantle yet. I think it would need custom software just to start solving how to split it into a sharded set of community mirrors, with different people each mirroring individual pieces.
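To make the "sharded set of community mirrors" idea a bit more concrete, here is a rough sketch of one way pieces could be handed out without a central coordinator. It uses rendezvous (highest-random-weight) hashing, which is my own pick for illustration and not something anyone in this thread or the Internet Archive has proposed. The shard IDs and volunteer names are made up.

```python
import hashlib

def score(shard_id: str, mirror: str) -> int:
    """Deterministic pseudo-random weight for a (shard, mirror) pair."""
    digest = hashlib.sha256(f"{shard_id}:{mirror}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def assign_mirrors(shard_id: str, mirrors: list[str], copies: int = 3) -> list[str]:
    """Pick the `copies` mirrors with the highest weight for this shard.

    Everyone who knows the mirror list computes the same answer, so no
    central server has to hand out assignments.
    """
    ranked = sorted(mirrors, key=lambda m: score(shard_id, m), reverse=True)
    return ranked[:copies]

# Hypothetical volunteers and shard names, purely for illustration.
mirrors = [f"volunteer-{i}" for i in range(10)]
for shard in ("shard-000001", "shard-000002"):
    print(shard, "->", assign_mirrors(shard, mirrors))
```

The nice property is that when a volunteer who wasn't holding a given shard drops out, that shard's assignment doesn't change, so only the pieces the departed mirror actually held need to be re-copied.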
So about 104,857,600 GB? You’d need 105,000 people with 1 TB each to save that. Or…
Assuming you bought 30 TB SSDs, you’d need about 3,500 of those, costing €80 each.
That’d be €280k, but let’s round it to €300k.
If every person spent €960 (or €80 per month for a year), then each person could get 12 of those SSDs. At 12 each, you’d only need about 290 people to cover all 3,500 drives.
Should be doable if crowdfunded by a community, or if you had some big donor. Then you’d need to connect it.
Looking at diskprices.com, the lowest prices for storage are around $8/TB (used) or $15/TB (new). I didn’t look too hard, but a 30TB SSD for $80 (~$2.7/TB) seems wrong?
100K TB * $15/TB = $1.5 million
Assuming 100PB is the amount of data, we’d also need redundancy. Idk what best practices would be, but I’ll say 3ish copies, so 300PB total.
300K TB * $15/TB = $4.5 million, so a grand total of ~$5 million.
Which is crazy cheap, all things considered. Like, it would be no problem for a single rich person to handle that.
Hell, subsidize/give away cheap little computers that you just plug power and an Ethernet cable into. Raspberry Pi + 4TB drive ($60) + casing would be like… $100? Though I guess you’d need 75K of them, and the cost per TB ($25/TB) is pretty bad.
This guy is 20TB for $280: https://a.co/d/17UOtFi
If we stick with $40 of overhead for rpi etc, that’s $320 for 20TB ($16/TB), and we’d need 300PB/(20TB/unit) = 15K units. And at $320 each, all in would be $4.8 million.
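For anyone who wants to poke at these numbers, here is a small sketch of the same arithmetic. It assumes 1 PB = 1,000 TB and takes the prices quoted in this thread at face value; the function name and parameters are just mine.

```python
import math

def build_cost(data_pb, copies, drive_tb, drive_usd, overhead_usd=0):
    """Return (units needed, total cost in USD, cost per TB in USD).

    Assumes 1 PB = 1000 TB and one drive per unit; `overhead_usd` is the
    per-unit cost of the Pi, case, etc.
    """
    total_tb = data_pb * 1000 * copies
    units = math.ceil(total_tb / drive_tb)
    unit_cost = drive_usd + overhead_usd
    return units, units * unit_cost, unit_cost / drive_tb

# Bare new drives at $15/TB, 3 copies of 100 PB
print(build_cost(100, 3, drive_tb=1, drive_usd=15))    # (300000, 4500000, 15.0)
# Raspberry Pi + 4TB drive, about $100 per unit all in
print(build_cost(100, 3, drive_tb=4, drive_usd=60, overhead_usd=40))    # (75000, 7500000, 25.0)
# Raspberry Pi + 20TB drive at $280
print(build_cost(100, 3, drive_tb=20, drive_usd=280, overhead_usd=40))  # (15000, 4800000, 16.0)
```

This only covers hardware. Power, bandwidth, and replacing dead drives aren't modeled.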
The software seems to exist for connecting them all… So idk seems like it would be absolutely feasible? Would be interested to learn if I’m missing a major cost.
HexOS has a plan for shared encrypted data. With how simple it is to install and manage, it could take off in the mainstream as personal NAS boxes gain popularity, but it’s still in early development.
Interplanetary File System can do it
IPFS, GnuNet?
IPFS is the way to go IMO. It’s so perfect for archival that it pains me that it’s still pretty unknown.
The fact that you don’t need any sort of central organization for everyone to help seed data is amazing. No more duplicate torrents splitting seeders: so long as you have identical data, the network just figures it out.
If you have the hash for a piece of data, you can just set a computer to watch for someone to start seeding it. Even if the last time anyone saw the data was decades ago and a dude just found a CD in their recently passed dad’s basement, if that dude seeds it overnight and then their computer explodes, you’ve now downloaded it and it’ll remain available. It’s so fucking good.
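If the "identical data, the network just figures it out" part sounds like magic, here is a toy sketch of the content-addressing idea behind it. To be clear, this is not how IPFS is actually implemented (real IPFS chunks files and uses multihash-based CIDs, not a bare SHA-256 hex string), and `ToyNetwork` is a made-up stand-in for the whole swarm.

```python
import hashlib

def content_id(data: bytes) -> str:
    """The ID is derived from the bytes alone, not from who has them or where."""
    return hashlib.sha256(data).hexdigest()

class ToyNetwork:
    """Hypothetical stand-in for the swarm: a dict keyed by content ID."""

    def __init__(self):
        self.blocks = {}

    def seed(self, data: bytes) -> str:
        cid = content_id(data)
        self.blocks[cid] = data
        return cid

    def fetch(self, cid: str):
        data = self.blocks.get(cid)
        # Re-hash on receipt so nobody can serve the wrong bytes under this ID.
        if data is not None and content_id(data) != cid:
            raise ValueError("data does not match requested ID")
        return data

net = ToyNetwork()
wanted = content_id(b"contents of dad's old CD")  # we only know the hash
print(net.fetch(wanted))                          # None: nobody is seeding yet

net.seed(b"contents of dad's old CD")             # someone finds the CD and seeds it
print(net.fetch(wanted))                          # now the watcher gets the bytes
```

Because the ID depends only on the bytes, two people who seed the same file independently end up under the same ID, which is why seeders don't get split the way they do across duplicate torrents.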