Thursday, March 31, 2005

Archiving the Internet

I was channel surfing the other day and stumbled upon an intriguing program on one of the few stations that make television worth watching: C-SPAN. The program featured Brewster Kahle speaking at the John Kluge Center at the Library of Congress about his ambitious dream of cataloging the Internet.

Kahle sold his company, Alexa Internet Corp., the chief chronicler of Web traffic, to Amazon.com in June 1999. He is now hard at work on the Internet Archive, a project designed to save and sort the world’s collective knowledge. For more information, visit its site at http://www.archive.org/.

“The Internet has the potential to be the greatest library in the history of mankind—a repository of memory, thought, culture, and scholarship; a record of what it means to be human. But without an archive, it’s nothing more than a catalog of the perpetually changing now,” said Kahle.

I jotted down some of the more fascinating statistics that Kahle shared while outlining the daunting task he faces. According to Kahle, there are about 28 million books in existence today, 2 to 3 million music recordings, 100,000 to 200,000 moving pictures, 50,000 software programs, 50 million Web sites, and 40 billion Web pages.

Apparently unfazed by the job ahead, Kahle blithely stated that, as big as the challenge is, it is nonetheless doable in our lifetime. As an example, he said that the equivalent of the entire collection of the Library of Congress could be archived at a cost of $200 million to $300 million, a fraction of the Library’s current operating budget.