Over the past couple of weeks, The Internet Archive has been uploading content behind the scenes, and today we are very excited to officially launch its collection in The Commons.
The Internet Archive is best known for its historical library of the web, preserving more than 400 billion web pages dating back to 1996. Yet, its 19 petabytes include more than 600 million pages of digitized texts dating back more than 500 years. What would it look like if those 600 million pages could be “read” completely differently? What if every illustration, drawing, chart, map, or photograph became an entry point, allowing one to navigate the world’s books not as paragraphs of text, but as a visual tapestry of our lives? How would we learn and explore knowledge differently? Those were the questions that launched a project to catalog the imagery of half a millennium of books.
Kalev Leetaru, a Yahoo research fellow at Georgetown University, extracted more than 14 million high-resolution images, spanning nearly every topic imaginable, from 2 million Internet Archive public domain eBooks that cover over five centuries of content. Each image comes with a detailed description, including the subject tags of the book it came from and the text immediately surrounding it on the page. The latter is especially powerful, as it allows anyone to keyword search 500 years of images, instantly accessing particular topics or themes. Searching for “love” yields myriad images of cherubs and courtship, while “mortis” (death) offers a glimpse into the early modern period’s fascination with the subject. A search for “bird” offers a vividly colorful showcase of the world’s bird species, while searching for “telephone” traces the invention’s history from its introduction as an electric novelty to its widespread adoption.
Perhaps what is most remarkable about this collection is that these images come not from some newly unearthed archive being seen for the first time, but rather from the books that we have been digitizing for years and that have been resting in our digital libraries. Through the power of big data, we are suddenly able to view the world’s books not merely as piles of text, but as individualized galleries in one of the richest and most diverse museums of imagery in the world.
The Internet Archive’s team hopes that this project inspires us all to reconnect with our cultural past and that you will join this exciting journey to unlock the visual tapestry of the world’s books. Check back regularly as more of the 14 million images are uploaded to Flickr over the coming months!