TIL how big the Internet Archive is
TAGS: internet
The Internet Archive, a 501(c)(3) non-profit, is building a digital library of Internet sites and other cultural artifacts in digital form.
It was an incredibly helpful resource for my presentation about NetMonster, an obscure 2000s game, since I was literally digging through the 2000s internet and poring over Geocities source code.
It led me down some interesting rabbit holes.
How does data get stored?
The Internet Archive (IA) does everything in-house rather than hosting its storage and processing with a cloud provider such as AWS. The reasons given: lower cost, greater control, and greater confidence that their users are not being tracked.
via https://news.ycombinator.com/item?id=26312389
How much storage is used?
(1 petabyte = 1,000 terabytes in decimal units; the binary equivalent is 1 pebibyte = 1,024 tebibytes)
- 2010: 5.8 petabytes
- 2014: 50 petabytes
- 2020: 70 petabytes
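These numbers look off. Going from 50 PB in 2014 to only 70 PB in 2020 is about 3 PB per year, far slower than the growth rate reported for 2021 below.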
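Averaging the growth between the reported totals makes the mismatch concrete. This is a minimal sketch doing plain arithmetic on the three figures listed above; the function name `average_growth_per_year` is my own, and it says nothing about how IA actually measures its storage.

```python
# Storage totals listed above, in petabytes (as reported, not independently verified).
reported_pb = {2010: 5.8, 2014: 50, 2020: 70}

def average_growth_per_year(totals: dict[int, float]) -> list[tuple[int, int, float]]:
    """Return (start_year, end_year, average PB added per year) for each consecutive pair of years."""
    years = sorted(totals)
    result = []
    for start, end in zip(years, years[1:]):
        per_year = (totals[end] - totals[start]) / (end - start)
        result.append((start, end, per_year))
    return result

for start, end, per_year in average_growth_per_year(reported_pb):
    print(f"{start}-{end}: {per_year:.1f} PB/year")
```

The 2014 to 2020 interval works out to roughly 3.3 PB per year, which is hard to square with a team adding several petabytes every quarter.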
Via Jonah Edwards, who is part of the Internet Archive team:
As of 2021:
- They can store almost 200 PB raw.
- They grow by about 5-6 PB per quarter (10-12 PB raw).
- Which is 20-30 PB per year (40-60 PB raw).
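Turning the per-quarter figures into yearly ones is just a multiply-by-four. This small sketch (the names are my own, not anything from IA) does that and also computes the raw-to-usable ratio implied by the quoted numbers. Why raw usage is about double isn't stated in the figures themselves, so the code only reports the ratio. Four quarters at 5-6 PB comes to 20-24 PB per year (40-48 PB raw), so the upper end of the quoted 20-30 PB range is higher than the quarterly numbers alone imply.

```python
# Quarterly growth figures quoted above (as of 2021), in petabytes.
QUARTERS_PER_YEAR = 4
usable_per_quarter = (5, 6)   # (low, high) PB of data added per quarter
raw_per_quarter = (10, 12)    # (low, high) PB of raw disk consumed per quarter

def per_year(per_quarter: tuple[float, float]) -> tuple[float, float]:
    """Scale a (low, high) per-quarter range to a per-year range."""
    low, high = per_quarter
    return (low * QUARTERS_PER_YEAR, high * QUARTERS_PER_YEAR)

usable_per_year = per_year(usable_per_quarter)
raw_per_year = per_year(raw_per_quarter)

# Raw-to-usable ratio implied by the quoted figures, for the low and high ends.
raw_to_usable = [raw / usable for raw, usable in zip(raw_per_quarter, usable_per_quarter)]

print("usable PB/year:", usable_per_year)
print("raw PB/year:", raw_per_year)
print("raw/usable ratio:", raw_to_usable)
```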
Jonah Edwards - Internet Archive Infrastructure
https://archive.org/details/jonah-edwards-presentation
The 'Petabox':
https://archive.org/web/petabox.php