Internet Archives

The Internet is constantly changing and apparently ephemeral. It can give users the impression that they can write whatever they want and it will disappear as soon as they forget about it. Not so – different archives for different purposes exist. You should know about Internet archives, where they are and why they were set up. They can be useful for looking up old material. But they can also stir up old trouble for you.

The most extensive Internet archive is the Internet Archive Wayback Machine. Many people don’t know about it because it doesn’t advertise all over the place. A non-profit Internet library founded by Silicon Valley millionaire Brewster Kahle in 1996, the Internet Archive aims to record and archive for posterity every webpage (and its contents) out there in Cyberland. Kahle’s Internet Archive has collected copies of tens of billions of pages in the past ten years.

A major advantage of the Internet Archive is that it’s so inclusive. If you want to locate and read a defunct site or establish a beginning date for your own site (especially useful if someone is violating your copyright by plagiarizing you on their page), you can look it up on the Wayback Machine on the Internet Archive’s main page. All you need is the URL.
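For readers who like a little scripting, here is a minimal Python sketch of how a page's URL turns into Wayback Machine lookup addresses. It only builds the addresses and does not contact the Internet Archive. The address patterns reflect the Wayback Machine's public URL scheme as commonly documented and should be treated as assumptions, and the function names are made up for this illustration.

```python
from urllib.parse import urlencode

# Assumed public address patterns for the Wayback Machine (not from the article).
WAYBACK_PREFIX = "https://web.archive.org/web"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def _check_timestamp(timestamp: str) -> str:
    """Accept a numeric timestamp of 4 to 14 digits, e.g. YYYY or YYYYMMDDhhmmss."""
    if not (timestamp.isdigit() and 4 <= len(timestamp) <= 14):
        raise ValueError("timestamp must be 4 to 14 digits, e.g. '20060101'")
    return timestamp


def capture_calendar_url(page_url: str) -> str:
    """Address listing every recorded capture of page_url."""
    return f"{WAYBACK_PREFIX}/*/{page_url}"


def closest_capture_url(page_url: str, timestamp: str) -> str:
    """Address of the capture closest to the given timestamp."""
    return f"{WAYBACK_PREFIX}/{_check_timestamp(timestamp)}/{page_url}"


def availability_query(page_url: str, timestamp: str) -> str:
    """Query address asking whether a capture exists near the timestamp."""
    params = {"url": page_url, "timestamp": _check_timestamp(timestamp)}
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(capture_calendar_url("example.com"))
    print(closest_capture_url("example.com", "1996"))
    print(availability_query("example.com", "20060101"))
```

Asking for a very early date, such as 1996, is one way to look for the earliest capture of your own site when you need to establish a beginning date.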

Another major Internet archive is Google Groups, which houses the former Deja News library of Usenet group messages. Deja News launched in 1995 and wound down in 2000 and 2001: its shopping service went to eBay, and its Usenet archive was sold to Google, which continues to maintain it for those accessing Usenet. This Internet archive holds millions of messages from tens of thousands of Usenet groups like alt.spanking, with the oldest dating back to the early 1980s.

Another large and popular Internet archive, with a more recent set of discussion threads, is Yahoo Groups, an archive of self-contained listservs. Yahoo’s groups archive every message, including those of groups that switched to Yahoo from another service, and unless the moderator or owner makes those messages private, they are searchable on the Internet.

You can also look up old versions of webpages in a more ephemeral and scattershot Internet archive: search engine caches, most notably those in Google and Yahoo. These are temporary copies of webpages that you can access in case the current version is down. A cached copy can last for months after a webpage goes down for good. To access it, look for the “cache” link right after the URL of the search hit. Web caches are especially useful for finding a blog entry that has been moved or even deleted. You can also use them to find a webpage that has recently been erased, though the webmaster may take steps to block a cache copy or have it erased afterward. Also, very new webpages won’t have copies until the web spiders have crawled and recorded them. This can take up to a week or so.

Another type of Internet archive is the private or subject-related database like the Internet Movie Database (IMDb), especially one with long-term and extensive discussion boards. But these may or may not turn out to be ephemeral, since their mission statements are based on business concerns, not an intent to archive information for posterity. Wikipedia is another type of Internet archive: the encyclopedia that seeks to inform and educate. Associated Content, a collection of articles on consumer subjects, is an archive of a similar type.

There are always gaps in an Internet archive. People or groups can have their messages or pages deleted, though this is not that easy. Also, password-protected pages (like Yahoo Groups with private message archives) are not recorded by long-term projects like Kahle’s Internet Archive. These types of pages will remain poorly represented in the historical record that Kahle’s group hopes to leave to the future. Meanwhile, administrators of non-Usenet groups, lists and blogs can delete any messages they choose from whatever Internet archive they control. This is a far more common practice on public bulletin boards (as at IMDb), or blogs where the illusion of anonymity is thin, than on private groups where administrators might be more interested in keeping a complete record of messages for their members.

Obviously, all of this archiving raises some concerns in terms of security and personal privacy. If something you said when you were an anarchist twenty-year-old is still up when you’re forty and working for a corporate bigwig, this can throw a monkey wrench in your career plans. It is possible to have something erased from an Internet archive, but it can be difficult and you have to know where you posted something in the first place – and who has the page now.

But these concerns about Internet archives are also based on a fallacy: that the Internet ever promised anything but the random and indiscriminate spread of information pretty much anywhere for as long as the ‘Net lasted. The Internet never really was private. People sitting in front of their screens at home invented the illusion of privacy in their own minds. Cyberspace doesn’t work that way. It’s not so much like chatting in your house as standing, whispering, in a marketplace full of microphones. And you can’t take it back because you did say it. So, be careful what you whisper; in an Internet archive, it could echo for a long time.
