"The internet is an enormous, ethereal place in a constant state of rot."
That's how Zachary Crockett of The Hustle starts his article on the Wayback Machine.
I remembered the Wayback Machine recently because, as always, it came in handy. I asked The League of Community Managers (a Facebook group) if they had any free examples of GDPR-compliant T&Cs. A helpful contributor pointed out that I could jump into my own emails and look at the torrent of compliance letters that businesses sent when GDPR kicked in. And if the pages had 404'd, I could just check the Wayback Machine to find the originals.
The Wayback Machine, you see, is a massive archive of the internet. You can type in any website and chances are you'll be able to see what it looked like during different eras (as far back as 1996).
Here's Google in 1998
MySpace in 2004...
...and Newgrounds in 2001
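If you'd rather look up a snapshot from a script than type the address in by hand, here's a rough sketch in Python. It assumes the Internet Archive's public availability endpoint at `archive.org/wayback/available` and the JSON shape I remember it returning (an `archived_snapshots` object with a `closest` entry), so check the official docs before relying on it. The function names are my own, not part of any library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine availability lookup.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the lookup URL for a site, optionally near a YYYYMMDD date."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(payload):
    """Return the archived URL from a decoded response, or None if absent.

    Assumes the response looks like:
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}
    """
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def find_snapshot(url, timestamp=None):
    """Ask the archive for the snapshot closest to the given date."""
    with urlopen(availability_query(url, timestamp), timeout=10) as response:
        return closest_snapshot(json.load(response))


if __name__ == "__main__":
    # Needs a network connection; prints None if nothing was archived.
    print(find_snapshot("google.com", "19981111"))
```

The timestamp is optional. Leave it out and, as far as I know, the archive returns the most recent snapshot it has.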
The archive contains 40 petabytes (40 million gigabytes) of data. They don't archive everything, though: the internet grows at a rate of 70 terabytes (about 9 of the Internet Archive's hard drives) per second. But the archive is still extensive.
As Crockett points out, there are ethical questions when you don't archive everything. While the archiving is mainly done by bots, humans decide when the bots should stop a "hop" and move on to another site. Do those humans have any biases?
The other question is, do you let people request that their page isn't archived? And do you retroactively delete things upon request? In the past, the Internet Archive (the not-for-profit body that owns the Wayback Machine and other archives) would always comply. But according to The Hustle, they have recently become more reluctant to delete anything. In the era of "post-truth", this NFP is increasingly aware of the danger of censorship and manipulation of the public dialogue. In fact, they announced a plan to create a backup archive in Canada just in case things go south in the US.