The Internet Archive is even more essential than I realized

It never dawned on me that I might be insufficiently grateful for the Internet Archive. For many years, I’ve used it on something close to a daily basis, including to research numerous articles I couldn’t have written otherwise. I’ve uploaded some of my own work to it, shipped off my grandfather’s scholarly library for scanning, and donated money (though not enough).

But over the past week, I’ve grown even more appreciative of the Archive, for the worst possible reasons. On October 9, the site suffered a DDoS attack that turned out to be linked to a full-on data breach in which hackers stole and leaked the account information of a reported 31 million users. It then went offline to be hardened against further such attacks, a process founder Brewster Kahle said should take “days, not weeks.” As I’m writing this, only the Wayback Machine is back, in read-only form.

All in all, it’s been an annus horribilis for this unique free repository of human knowledge and creativity. Even before the recent assault, it suffered an earlier DDoS attack in May; and in September, it lost its appeal in a court case brought by major publishers over its lending library of scanned e-books, which had already resulted in it delisting 500,000 titles. It’s still fighting a different case involving its collection of digitized 78-rpm records.

In other words, the year has been full of reminders of how fragile the Internet Archive is as an institution. But doing without it altogether—just for a few days—has focused my attention on how much we need it.

First, there’s the Wayback Machine. We’re now 30 years into the web age, and the percentage of public discourse that takes place digitally rather than in printed form only continues to grow. Yet instead of growing more dedicated to preserving their archives of past content, many publishers seem to have given up on the whole notion. In May, Hunter Schwarz reported on a Pew Research Center study that found 38% of web links from 2013 no longer work. Articles and videos that would be invaluable for research purposes have often vanished: I’m not sure if anything I wrote while on staff at PC World, where I worked from 1994 to 2008, is still on its site. In these most challenging of times for the media business, entire news sites are going poof and taking their archives with them.

When older webpages have managed to stick around, they often suffer from severe formatting issues and have lost some or all of their media. They can also attain a sort of phantom state in which they’re difficult to track down unless you already know they’re there. For example, CNN.com still has some historically significant articles up from the 1990s, but as far as I can tell, you can’t find them with its own search engine.

There are many explanations for this sorry state of affairs. Preserving content created in one publishing system once you’ve moved to another is a hassle. So is maintaining the same format for URLs through the years. And some publishers have grown concerned that they might not have provable legal rights to keep on publishing every word and image they’ve ever posted. But all these issues could be overcome if companies saw money to be made in keeping everything available forever. Sadly, they usually don’t.

(Full disclosure: FastCompany.com has what is, as far as I can tell, a reasonably comprehensive archive of our stuff going all the way back to our premiere issue in 1995. Yes, some of it has fallen victim to formatting quirks, but I’m glad it’s survived.)

As wide swaths of the web have rotted away, the fact that the nonprofit Internet Archive has been storing pages since 1996 and making them available via the Wayback Machine since 2001 has grown only more important. Even Google has discontinued its venerable cache of webpages—and replaced it with links to the Wayback Machine.

Then there’s the rest of the Internet Archive, a vast library of documents, video, audio, and software representing not just the past 28 years, but all human history. The Archive isn’t the only institution doing some of this work: For instance, HathiTrust is a fine free e-library you might be able to access if you have an affiliation with a college or university, including just being an alumnus of one. But nobody else has ever tried to do it all, all in one place.

For-profit businesses do, of course, see value in older books, movies, and music. That’s why some of them have sued the Internet Archive over its offerings. But there are enormous amounts of material that they’ll never bother to make available. Often, they aren’t even great stewards of the content they do have: Amazon’s Kindle store, the closest thing we have to a comprehensive collection of for-pay e-books, has become so polluted with AI-generated spam that browsing it gives me a headache.

It’s the items that would otherwise be unobtainable that make the Archive essential. I regularly use it to pore over computer magazines from the 1970s and 1980s. It has a novel written by a distant cousin of mine that must have gone out of print shortly after being released in 1949. Last week, shortly before the breach, I looked up something in a 1973 London telephone book. Several times during the outage, I found myself instinctively going there to check something despite being aware the site is down.

Good physical libraries see obscurity not as an excuse for ignoring a work, but as an argument for collecting it and ensuring it remains available when needed. So does the Internet Archive. The difference is that it will never run out of space. Like Wikipedia—maybe its only peer among online institutions—it’s a public good on a scale that could only exist in the digital age. And it exists only because Brewster Kahle thought it should—and because an enormous number of people have contributed to making it the astonishing reality that it is.

You’ve been reading Plugged In, Fast Company’s weekly tech newsletter from me, global technology editor Harry McCracken. If a friend or colleague forwarded this edition to you—or if you’re reading it on FastCompany.com—you can check out previous issues and sign up to get it yourself every Wednesday morning. I love hearing from you: Ping me at [email protected] with your feedback and ideas for future newsletters.