Google Partners with Oxford, Harvard & Others to Digitize Libraries

Date published: 14 December 2004

Author: Gary Price

Categories: Industry

Google is working closely with five new content partners on a massive scanning project that will bring millions of volumes of printed books into the Google Print database.

Google is working with libraries at the University of Michigan, Harvard University, Stanford University, Oxford University and the New York Public Library to digitize books in their collections and make them accessible via Google Print.

Google Print was expanded in October and allows publishers to make scanned copies of books available through Google. (See: Google Print Opens Widely To Publishers)

At the University of Michigan, the plan is to scan seven million titles over a six-year period using a non-destructive scanning technology that Google has developed. The university will also be given a copy of each file to use as it sees fit. A “digitize the complete library” arrangement is also the current plan at Stanford and Oxford, while the New York Public Library will run a pilot project.

Harvard’s involvement in the program is a “pilot project,” according to Peter Kosewski, director of Publications and Communications, Harvard University Libraries. For now, Harvard is allowing Google to digitize 40,000 titles. The university wants to use the project to learn about large-scale digitization. The first set of materials will come from the Harvard Depository. The total size of the Harvard book collection is over 15 million volumes.

Google will begin the scanning process with a focus on out-of-copyright content. Product Manager Adam Smith said that many variables affect the order in which books are scanned, including the way material is shelved in these libraries.

Google stressed that today’s announcement simply introduces the partnerships. In fact, only a small number of scanned items from any of the libraries are currently available in the Google Print database.

Google has no plans to introduce a Google Print “only” search interface. Google Print results appear in the “OneBox” area at the top of Google search result pages, in much the same way that news headlines or products from Froogle appear in response to relevant queries. However, tools have been created to help isolate Google Print material.

Books that are scanned from any of the libraries’ collections will also have a direct link to find the book in a local library (along with links to purchase the book) using OCLC Open WorldCat data. Other books (materials not scanned from the library collections) will not have the “Find it in A Library” link available. Searching by subject (using a controlled vocabulary) is not available, at least at launch.

“In-copyright” books that are in these collections will have basic bibliographic information available but the full text will not be accessible.

Smith told us that out-of-copyright material will be available in full text, though printing will be disabled when viewing this content.

All books will be scanned by Google, in many cases on-site. “Both parties will work conservatively within the laws of copyright,” Smith said.

Material is scanned into image files, though Google declined to discuss specific file or viewing formats. Google developed this scanning technology for the Google Catalogs project, which has remained dormant for most of this year.

Although Google has no current plans to include material from other libraries, Smith said that the company would be happy to talk with libraries interested in potentially participating in the program.

This is a massive digitization project and it will be very interesting to monitor how the work progresses over the next year. It will also be interesting to see if other web search companies (Yahoo, MSN, Ask Jeeves) partner with libraries and repositories of printed content.

Other Sources For Full Text Books Online

Placing full-text book material on the web is not a new idea. Many services, both free and fee-based, allow you to access books online. The longest-running such service is Project Gutenberg, founded by Michael Hart in 1971, with over 13,000 books available.

I wrote about The Online Books Page for SearchDay last year. This wonderful collection has been online for more than 10 years, and currently provides searchable access to over 20,000 free full-text books. The OBP is edited by John Mark Ockerbloom, a digital library planner at the University of Pennsylvania.

The Internet Archive is also digitizing books. The goal of the Million Book Project is to “create a free-to-read, searchable digital library the approximate size of the combined libraries at Carnegie Mellon University, and one much bigger than the holdings of any high school library.”

One publisher that makes a large portion of its new and old material available online, free, searchable, and in full image form is The National Academy Press. It currently offers access to more than 3,000 publications.

Two fee-based services are NetLibrary and ebrary. NetLibrary offers access to about 76,000 books, with about 1,300 new titles added each month. You can access NetLibrary books through your local public or university library, often at no charge.

ebrary provides access to more than 50,000 titles (books, maps, sheet music, etc.). Like NetLibrary, ebrary licenses its service to libraries and educational organizations, and users can log in and access it from any computer with web access, in most cases for free.