The Engines win by a landslide, at least according to current college students.
They preferred searching on Google or Yahoo versus their college library systems, based on the attributes of: speed (90%); convenience (84%); ease of use (87%); cost-effectiveness (71%); and reliability (63%). Libraries, however, won on more trustworthy measures including credibility (77%) and accuracy (76%).
While students prefer library sources, they also heavily count on the engines. Over half (53%) say the results from engines are as trustworthy as libraries. Google, Yahoo and Ask all rank about the same, without much differentiation.
And, as for those people sitting behind the library desk, here's your wake-up call. It turns out that over two-thirds (67%) of students believe that librarians performed either the same as or worse than the engines. Even though librarians were valued and considered helpful, they apparently don't compare to indices and algorithms.
Interestingly, the survey sponsor is OCLC, a library services organization best known for its worldwide catalog which helps libraries make their holdings more searchable and available to patrons. They contacted several hundred students last year to determine their views on libraries, and recently made the results accessible online.
Of course, the major engines already acknowledge the importance of libraries and their holdings. We see this playing out in myriad initiatives underway, ranging from Google's Scholar and Books efforts to Microsoft's think-tank gatherings.
Libraries are still filled with treasure troves, holding everything from special collections to rich databases. Years ago, librarians made progress in providing electronic islands for their patrons. Now, their challenge is to make the holdings as searchable as possible -- following their "self-service" patrons into the larger search ecosystems.
Posted by at 3:12 PM | Permalink
So who is quietly trying to solve your search and discovery problem? Librarians. This week, a new searching mechanism was announced by the OpenLibrary project, with the audacious goal of providing information about every book on the planet. No ordinary catalog here, as OpenLibrary relies on the considered librarianship of everyone who uses or contributes to it.
As usual, librarians are experimenting with access, resources and usability. We're happy to follow their lead. In this case, it's digital librarian and archivist Brewster Kahle, who started the Wayback Machine and has been thinking about open access for years. Yet almost no one heard about this effort, and it's pretty interesting!
Lately, we have seen some mainstream publicity about librarians and their quiet influence. The NYTimes has focused on them in recent weeks -- from announcing the impending death of Dewey Decimal to declaring librarians as hipsters. Say what you will about this recent spate of publicity, but these book lovers are notoriously bad at marketing themselves. There are some web sites which aim to help, however, such as Librarian and I Love Libraries.
Librarians have always represented the “uber class” of searchers. They may not tout their achievements as prominently as the pure tech crowd, but they have been pushing web access since before it was even remotely hip. Looking for recommendations, links or more? Your local librarian has probably spent more time unearthing truly meaningful resources than the average techie. It's too bad that we can't bottle their vast experience and create the most expert results out there -- or maybe there is a way of tapping into their search sorcery.
Ask any student working through a term paper and you'll find that librarians are a welcome influence when the going gets tough. For those of you who graduated over a decade ago, you'd be amazed at how academic and public libraries have transformed into electronic wonders, open 24/7. Too bad we are blocked from so many of these restricted or deep-web resources.
Where can outsiders see the good stuff? That's always more challenging. It's often helpful to follow the trails behind librarians. We're familiar with Gary Price, from Ask, who also co-manages the excellent ResourceShelf site. Another influential librarian in our search world is Mary Ellen Bates, who offers tips to help her fellow info pros get their bearings. We're also interested in whatever the American Library Association (ALA) provides, like these monthly recommendations.
Whether individually or collectively, these expert librarians are trying to be helpful. We wish they would speak a little louder -- and be heard amidst the overall search cacophony.
Posted by at 3:25 AM | Permalink
University of Wisconsin-Madison is the next university to join Google's Book Search Project. The University has one of the largest collections of historical documents and books in the US, accounting for about 7.2 million holdings. The University houses the famous Wisconsin Historical Society Library which is also part of this project. The University of Wisconsin-Madison has their release here and Google has their release here and Reuters has their article here.
Posted by Barry Schwartz at 8:35 AM | Permalink
Google's blogging (and here) about how it is supporting the 25th anniversary of the American Library Association's Banned Book Week by posting information about novels that have been challenged or banned from being in libraries within the United States. That's great, but it also rings hollow given Google's support of widespread censorship in China.
Consider some of the quotes the ALA has put out to promote its anti-banning campaign:
"Libraries should challenge censorship in the fulfillment of their responsibility to provide information and enlightenment." —Library Bill of Rights
"We uphold the principles of intellectual freedom and resist all efforts to censor library resources." —ALA Code of Ethics
Google's a library resource, make no doubt about it. Pick a librarian, and they'll tell you Google is a key resource they use. Not the only resource, but an important one -- and one that I doubt they feel should be censored to the degree that Google does in China.
Back to Google's support of Banned Book Week, its new Explore Banned Books page has links to information about 42 classic books that have been banned or challenged over time. Here's a recent article on banned books in China.
Posted by Danny Sullivan at 6:36 AM | Permalink
If it's September it must be Shakespeare. Clusty has released Shakespeare Searched which is designed to provide quick access to the works of the Bard. It's not designed as a replacement for, or access to the full text of his work, but as a quick reference resource. The concept is that it can be used to identify who made a specific speech, which work contains which quotes or even individual words, and also helps draw out specific themes in individual works or across the entire corpus.
It doesn't provide analysis or commentary, just direct access to the text via Vivisimo. Consequently it's useful for teachers who can use it to create lesson plans, and it's helpful for students not only as a quick reference guide but also, because of the clustering aspect, as a means of suggesting ideas for topic papers.
As you would expect, the strength of the resource is in the underlying approach that Clusty uses to return results. A search for 'isle' for example returns 39 results. The main body of the screen provides access to the text in which the keyword or phrase is used, the play/act/scene that it is from, and the speaking character. There is an option to additionally display surrounding text, but this is usually limited to the previous line of text from the last character to speak. However, since the surrounding text is already quite generous this isn't too much of an issue.
The real power of the resource, however, lies in the clustering. Again, to use my 'isle' search, topics such as King, God, and Warlike Isle are displayed. Clicking on the last of these, the searcher is rewarded with 3 results that put that phrase into context. Somewhat disappointingly, the results do not appear to be returned in any obvious order, as we get a reference to Othello Act 2 Scene 3, then Henry VI Part 2 Act 1 Scene 1 and then back to Othello Act 2 Scene 1. This leads to a rather confusing display and slightly militates against its value as a quick reference tool.
However, the clustering approach does not simply stop at concepts or topics. A second tab allows the searcher to view references to the search term by play (arranged alphabetically by title) which does overcome some of my earlier criticism. A final tab allows me to see which characters have uttered the word 'isle' and to pull up the appropriate part of the text.
The search interface also allows users to search for terms in individual works (and Clusty also includes the Sonnets), so it's easy to quickly identify the John of Gaunt speech in Richard II that refers to "isle". There are two pull-down menus, one for works and one for characters. The menu for the works is obvious, but it's the menu for characters that is remarkable, since it lists every single character in every single play, and believe me - there are a lot of them. Irritatingly, though, we have to guess which play the characters are from, which isn't so bad in the case of Hamlet, but it's going to take a skilled reader to identify which play "First Goth" or "Second Page" come from! (Titus Andronicus and As You Like It, in case you're wondering.)
All's well that ends well, although not quite - there are some drawbacks to the service which are slightly irritating. The search interface is very precise. My search for "sceptered isle" produced no results, and neither did "scepterd isle" - but "scepter'd isle" did get the result I was expecting. A search for "feared" gives 3 results, while "fear'd" returns 46 results. While this is a small point, I think that it's a very important one, since Shakespearian English is not always as obvious as we might hope. I would have preferred to have seen a 'did you mean?' prompt, or an option to automatically word stem, or a clustering of similar words.
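To make the spelling problem concrete, here is a minimal sketch of the kind of normalization that would let "sceptered isle" find "scepter'd isle". It is not how Clusty or Vivisimo actually work; the function names `normalize` and `matches` are hypothetical, and the single rewrite rule (expanding a trailing 'd to ed) is a deliberately crude assumption that would mangle contractions such as "I'd".

```python
import re

def normalize(word: str) -> str:
    """Fold a poetic elision onto the modern spelling (illustrative rule only)."""
    w = word.lower()
    # Assumption: a trailing 'd stands for "ed", so fear'd -> feared.
    return re.sub(r"'d\b", "ed", w)

def matches(query: str, line: str) -> bool:
    """True if the normalized query words appear as an adjacent phrase in the line."""
    q = [normalize(t) for t in re.findall(r"[A-Za-z']+", query)]
    words = [normalize(t) for t in re.findall(r"[A-Za-z']+", line)]
    if not q:
        return False
    # Slide a window of the query's length across the line's words.
    return any(words[i:i + len(q)] == q for i in range(len(words) - len(q) + 1))

line = "This royal throne of kings, this scepter'd isle,"
print(matches("sceptered isle", line))
print(matches("scepter'd isle", line))
```

Both queries are reduced to the same key before comparison, which is the effect a stemming option or a 'did you mean?' prompt would give the searcher. The misspelling "scepterd" would still miss under this rule; catching it would need fuzzy matching on top.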
Those criticisms apart however, it's an extremely useful resource, and one that should prove to be instantly popular. However, given that there are a lot of other Shakespearian resources out there, how does it rank with some of those? It would of course be unfair to try a direct comparison, since they are all trying to do something slightly different, but several of them do offer the ability to search within the confines of the texts themselves.
Explore Shakespeare with Google offers us the complete plays at our fingertips with the option of downloading many of them (although this depends on where you're geographically located) or purchasing them. It is possible to search within individual plays, but it's first necessary to know which play you want - a limitation that doesn't exist with the Clusty version. However, the information provided by Google is limited to no more than a few words either side of the search word, so it is necessary to follow the link to read directly from the page. Google's approach, while useful in some circumstances, does not provide the breadth or flexibility of use that we find with the Clusty offering.
The Library of the Internet Shakespeare Editions is rather different again, focusing on reviews and academic works about the plays. It does have a search option for the text, although this hasn't worked when I've looked at the site myself so I'm rather limited in my ability to compare resources!
The complete works are also available thanks to MIT but there is no real pretence at providing any kind of search functionality; the only way this would be possible would be to view the entire play on one page and use a search function from within your browser to find the text required.
The Oxford Shakespeare provides searchers with the 1914 Oxford edition, which is searchable. My "isle" search did only return 27 results however, rather less than Clusty. Confusingly the results are arranged in the order of Act, Scene, and then the name of the play, with a reasonable amount of surrounding text. My "sceptered isle" query also caused problems, with entirely different results for variations of spelling. While being a useful resource overall, in terms of searchability and flexibility I would have to say that the Clusty offering wins hands down.
The Collected Works of Shakespeare is also available in a rather less polished format from a student at the School of Information Technologies, University of Sydney. This time my search produced 37 results and the context within which the term could be found was displayed on screen in a basic format. Interestingly however, it was something of a pleasant delight to discover that this resource was capable of proximity searching (and it was possible to specify the degree of proximity as well). Searches could also be limited to specific plays as well. This was probably the closest in terms of functionality to the Clusty resource, but again having a different focus, so direct comparison would be unfair.
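For readers unfamiliar with proximity searching, the idea can be sketched in a few lines. This is an illustration only, not the Sydney site's implementation; the function name `within` and its `distance` parameter are hypothetical.

```python
import re

def within(text: str, a: str, b: str, distance: int) -> bool:
    """True if words a and b occur no more than `distance` words apart."""
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    # Compare every occurrence of a with every occurrence of b.
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

line = "This royal throne of kings, this scepter'd isle"
print(within(line, "throne", "isle", 5))
print(within(line, "throne", "isle", 3))
```

Letting the searcher set `distance` corresponds to the "degree of proximity" option described above: a larger value casts a wider net, a smaller one demands that the words sit close together.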
There are of course other resources available, and this isn't intended to be an exhaustive summary of them; merely to put Clusty's 'Shakespeare Searched' into some context. None of them were able to exhibit the same flexibility and functionality, and with the arrival of this resource I think that learning Shakespeare just got a whole lot easier.
Posted by Phil Bradley at 9:37 AM | Permalink
Google is now offering free, downloadable versions of public domain books that you can find in Google Book search. Unfortunately, there's no way to browse through a directory of books that are available. However, you can keyword search for them easily, sort of.
On the Google Book Search home page, you'll see two options below the search box. By default, "All books" is filled in. Change this to "Full view books." Now search for something you are interested in, such as Dante's Inferno, an example both the AP and Bloomberg cite in stories about the new feature.
When the results appear, click on one of the books that comes up, such as this one. Over on the right-hand side, in the column just below the title, you should see a "Download" button. That will let you download the book in PDF format.
Here's the problem. Some Full View books are full view for reading online but not for download, and Google doesn't make it easy to narrow in on only the downloadable ones. For example, here's a search for mars. Here's the first book listed. No downloadable option is offered.
Since Google seems to be making downloadable versions of anything that's out of copyright -- and since those are books published before 1923 in the US -- you might try an advanced search for books before that date.
For instance, here's a search for all books about Mars for the years 0 through 1922 (FYI, I did try to search for books older than 0 AD, but the system doesn't support negative/BC dates, as far as I can tell).
Overall, this is a nice feature to have. Next time I'm heading on a trip, I might try downloading some PDF books to read for the journey. But that leads me to improvements I'd like to see to make it easier to find good books:
Looking for more info? From Google's help pages, Why is the Download button only available on some books? explains that only public domain books get the download option right now, and even some of those have yet to be enabled.
How can I find books that I can download? explains briefly what I've already covered in more depth above, but that might expand over time.
What is a public domain work? explains what books Google considers to be in the public domain.
Google's not the only place offering electronic, downloadable books. Project Gutenberg is probably the best known long-standing site already doing this, and you can see some of the top titles here, similar to what I hope Google will do in the future.
That's also a handy way to see if Google offers some of the most popular titles that Project Gutenberg does. So far, the answer is no. I took this top ten list from Project Gutenberg:
I did quick searches using the Full View option along with the titles and author names. I couldn't find any of them available at Google for download.
Gary Price has also written in the past about ebrary, which offers books for purchase and, I believe, some limited downloads for free. Last month, he also wrote of the World eBook Fair making 300,000 titles available for download. World eBook Library still offers links to these works, but you have to pay $9 per year for access to them. Wow, look at all those sci-fi books from Baen, including the alternative history work 1633 (I thought 1632 was great, 1633 OK and 1634, ugh!).
Watch ResourceShelf, as Gary's sure to post on alternatives to find downloadable books when he gets going later today. In the meantime, back when he was with us, his More Sources For Ebooks & Electronic Text post has a lot of resources you'll still find useful.
Google's also still apparently pondering sales of in-copyright works, with publisher permission, something that was floated by the company as an idea earlier this year. Amazon's also got similar plans in the works, but I don't recall seeing that having launched yet, despite its announcement nearly a year ago.
Finally, Google's also just offered a way for anyone to put a Google Book Search box on their site. Now you can add Google Book Search to your site from the official Google Book Search blog has more, as does this instruction page.
Postscript: For more resources on downloading books, see Gary Price's story here.
Posted by Danny Sullivan at 9:07 AM | Permalink
Steve Bryant reports that "publishers fight back against Google" with their own book search service. The new service is named LibreDigital Warehouse and was announced by HarperCollins and LibreDigital the other day. This new service will give "publishers and booksellers the ability to deliver searchable book content on their own Web sites." The technology lets publishers define rules at the partner and book-title level, specifying which pages are viewable, which pages are not, and what percentage of the pages are available. They will begin by offering about 200 HarperCollins titles and increase that to 10,000 titles or so. More details on the service at eWeek.com.
Posted by Barry Schwartz at 8:09 AM | Permalink
UC May Join Google's Library Project from the Los Angeles Times covers news that Google may enter into an agreement with the University of California to scan library content for the Google Book Search service.
Posted by Danny Sullivan at 10:58 PM | Permalink
The Google blog 'Inside Google Book Search' announced in No holds bard that it is now possible to explore Shakespeare with Google - The complete plays of Shakespeare now at your fingertips. Well no, not exactly. I've spent some time playing around with this resource and it's less than impressive for a number of reasons.
I decided to take a look at the full text of a couple of plays, but in common with Philipp Lenssen found that I couldn't actually see the full text. All that I got was a fairly brief page with some bibliographic data, an opportunity to buy the book and links to related information. I went through each section in turn and found that in total I could read 13 of the plays Google listed, but was unable to do so for another 24. This may be in part due to the fact that I'm in the UK, and as the Google blog comments in an update some versions of the plays are not in the public domain everywhere in the world, so we can only see snippets.
I simply do not believe that Google could not have found versions of the plays that are out of copyright, particularly as they are keen for us to have the complete plays at our fingertips. However, I'll let that pass. What I really find unforgivable is their section 'Other ways to explore Shakespeare'. This gives me options to look for more resources, take a scholarly perspective, connect with enthusiasts and so on. Clicking on any of these links runs a default search for 'shakespeare'. Consequently with most of these options I get a huge number of results, many of them inappropriate. A search just on 'shakespeare' is the kind of basic search that I'd expect a school child to do once. I find it amazing that someone at Google could not have come up with rather more interesting and complex searches to fully utilise the power of the search engine, not only to give us a good search result, but also to show us just what it can do.
The concept is a great one; full marks to Google for having a go at it. The result is very much less than perfect, and for Google to say that they're making Shakespeare more accessible is, in my opinion, bordering on disingenuous.
Posted by Phil Bradley at 11:02 PM | Permalink
Microsoft announced that the University of California and University of Toronto Libraries will be participating in the Windows Live Book Search program. Both the universities will be digitizing "primarily out-of-copyright books" for Microsoft. In addition, Microsoft plans on making it easier for publishers to submit content for inclusion in the Windows Live Book Search index. http://publisher.live.com/ will be expanded within a few weeks to accept submissions in both digital and printed form.
Posted by Barry Schwartz at 11:13 AM | Permalink
Gary Price has a detailed, step-by-step write-up on the new Amazon Online Reader. You can view the new look for the reader by clicking here. The new features include searching for words within the pages, scrolling from page to page (looks AJAX-like), and a zoom feature. More details from Gary and/or at Amazon.
Posted by Barry Schwartz at 9:41 AM | Permalink
Most of the information created prior to the advent of the web is still in printed form. Fortunately, a number of groups and companies around the world are busily scanning books, magazines and other printed content so that it can be accessed with search engines. Once this content is online, the possibilities for using and mashing up the content with other types of information are just about limitless—but current copyright laws may throw a huge wrench in the whole works. In today's SearchDay article, Building the Universal Library, I've done a "review" of a terrific new essay from Wired Magazine's "maverick editor" Kevin Kelly, which takes an extended, thoughtful look at many of the issues surrounding the creation and use of a Universal Library.
Posted by Chris Sherman at 8:32 AM | Permalink
Elinor Mills reports at News.com that Congoo released a search service enabling people to search newspapers, magazines and other periodicals that are typically fee-based subscription content. Currently, you need to run Internet Explorer on Windows for the downloadable toolbar to work; Firefox support is reportedly coming soon. The service gives you between 4 and 15 articles per month per publisher. I should note that Gary Price blogged that this data, and much more of it, is available for free via your library Web sites. For more information on accessing this information, read Gary's article named Finding Answers Beyond Web Search.
Posted by Barry Schwartz at 2:53 PM | Permalink
Google gave me the heads-up late Friday that a new feature allowing publishers to sell online versions of their books through Google Book Search was about to go live. Nothing was yet online when we talked, but that's since changed. A new help page, What does it mean to sell online access to my book?, explains that the program is the first in a series of revenue tools being rolled out for publishers.
The experiment will allow publishers to sell access to their books online, something Google hinted was coming back in November and January. Publishers set a price, then consumers can buy and read the book online. At the moment, the program supposedly will not allow copies of the book to be saved to a computer or pages to be printed ("copy pages"). We'll see. So far, Google's existing protections limiting what users can see from books online have not been cracked, to my knowledge.
The program does not allow anyone but publishers in the US and UK who are voluntarily in the Google Books Partner Program to sell books online. Google also has book content that comes from its library scanning program. These books are not being sold. It bears stressing:
GOOGLE IS NOT SELLING BOOKS THAT IT HAS SCANNED FROM THE COMPLETELY SEPARATE GOOGLE LIBRARY PROGRAM.
Despite the capital letters and bolding, expect that many will begin saying that Google is now illegally selling books that it has scanned from libraries, just as many incorrectly say Google is reprinting scanned books online (they aren't). Some will do this out of misunderstanding. Some opposed to the library scanning program will do it on purpose, just to continue muddying the waters. To understand the myths and realities, please consider reading these past posts from me:
Want to start buying books now? Hang in there. Google told me that first they're getting publishers up and running, then at some undetermined point in the future, books available for sale will be offered.
John Battelle's got a very short note on the new page being up over here. As a sidenote, be sure to check out John Battelle's The Search and Google Book Search that John pointed at last week. In the interview, you can see how his publisher Penguin won't let books go into Google Book Search despite John wanting to be there. I like this part of the copyright page in his book:
The scanning, uploading, and distribution of this book via the Internet or via any other means without the permission of the publisher is illegal and punishable by law.
Wow -- I didn't know the lawsuit over the library scanning program had been settled. Certainly it's fair use for anyone to copy, scan and do other things with pieces of the book without permission, depending on the various circumstances involved. Whether the entire book can be scanned for indexing purposes, rather than reprinting online, is what the lawsuit involving Penguin and others is trying to determine.
By the way, distribution of the book "via any other means" is also mentioned. I wonder if every library that has a copy of the book got the publisher's permission to redistribute to their patrons. I'm guessing not. Expect libraries to be sued shortly.
Back to book selling, the Unofficial Google Weblog points to Google Offers Online Pay Plan from Publishers Weekly, which quotes Google talking about "perpetual access" to the books you buy, plus Google saying publishers will get the majority of the price charged, though Google itself will of course also cash in.
Threadwatch points at this blog post, which covers how publishers in the partner program got the news via email and cites one, anonymously, who is surprised/shocked that publishers are being asked to enroll without yet seeing exactly what the money split will be. Good point.
Information Week has news of publishers Taylor & Francis and Brill already signed up to sell through the service.
For more on Google Book Search, please see our Google: Book Search & Library category, if you are a Search Engine Watch member (and thank you, if so). You might also check out A Look Back as Google's Library Project Passes the One Year Mark.
Want to comment or discuss? Please visit our Google To Sell Online Books thread at our Search Engine Watch Forums.
Posted by Danny Sullivan at 9:53 AM | Permalink
So, what's the next Google service going to be? According to this BBC News story, it just might be an online bookstore for ebooks. This idea was floated past reporters and other invited guests in a post-keynote backstage press conference last Friday night.
From the article: Google has suggested it may consider setting up an online book store. Google CEO Eric Schmidt told reporters at the Consumer Electronics Show in Las Vegas that this would depend on permission from copyright holders.
We know that Amazon.com has plans to allow users to purchase online access and/or download chapters or pages of books via their "Amazon Pages" and "Amazon Upgrade" programs that are set to launch sometime this year.
Actually, Amazon.com already offers some popular titles for downloading to Microsoft Reader. The Da Vinci Code is one example.
Microsoft also has their own ebookstore.
Google would also face competition from many other online ebook vendors and services. Browsing through the DigitalBookIndex and eBooklocator databases will give you an idea of what ebook content is already out there. You can find a list of other online and ebook sources here.
When I first learned about the Google Library program (not the same thing as Google Book Search for Publishers), I thought that once Google digitized a lot of this older, non-digitized content AND then gained permission of copyright holders, it would be a natural for them to sell it electronically and/or offer a print-on-demand service. Perhaps these comments from Schmidt are the first we're publicly hearing about the idea. Of course, now with the launch of Google Video Store, they're developing the business (Google Payment Corporation) and methods for online payment. Who knows? It seems that there is always talk/speculation about another new service from Google. Chris even joked yesterday that Google Doctor might be not far off.
Having management tossing out ideas keeps people talking (Google speculation could be a full-time job), buzz humming, and investors investing. This is as much a part of Google's brilliance as anything else.
You can read more about the backstage press event in this Engadget report. A BBC interview with Eric Schmidt is also available.
Thanks to Science Library Pad for the news tip.
Posted by Gary Price at 1:21 PM | Permalink
A couple of items for web researchers.
First, Dean Giustini, a medical librarian at University of British Columbia in Vancouver (one of the most beautiful places I've ever been) and editor of the UBC Google Scholar Blog has a good summary of recent articles about how Google Scholar is being used in the medical profession.
Second, Infotrieve, a well-known name in the library world, has just announced that public access to their ArticleFinder database is now free.
What is ArticleFinder? It's a bibliographic database that also lets you search, read abstracts and purchase individual journal articles as needed from a single source; this is what's often referred to as document delivery.
Content: ArticleFinder has a lot of it. According to the web site, it's currently home to more than 26 million citations and eight million abstracts from over 54,000 journals in science, technology and medicine (STM). More than 44,000 entries are added each week. This page has a breakdown of articles by discipline.
Search: ArticleFinder offers two interfaces. One is a simple search box that can handle natural language, and the other is an advanced interface that offers fielded searching (date, journal name, author, publisher, etc.). Another option allows you to narrow your search by discipline.
Fast Facts: The ArticleFinder FAQ offers many more facts. One important note is that ArticleFinder does not search the full text of an article but rather the title and abstract. In terms of searching, AF offers both wildcard search options and fuzzy searches. It's also possible to have your citations exported into one of three services and to email results.
This service is more than worth a look.
Final Thoughts: Two items. 1) The Infotrieve Virtual Library service (fee-based) allows a company or library to tie their e-journal holdings to the service. No word on if or when this feature will also become free. It would be great if it did happen. 2) While much of the material in ArticleFinder is on the technical side of the aisle, don't forget that many public libraries offer great access (24x7x365) to thousands of full-text articles from journals and newspapers (licensed for personal use) for free without having to leave your home or office. More about that here.
Posted by Gary Price at 6:16 PM | Permalink
Steve Rubel thinks he's "hacked" Google Book Search, as he covers in his Read Most of O'Reilly's Hacks Books for Free Using Google post. In reality, I think he's just finding that Google Book Search operates exactly the way it is supposed to operate, to show you a percentage of a book that a publisher itself has allowed you to view online.
Steve describes reading books in O'Reilly's "Hack" series, such as Podcasting Hacks. He'll go to the table of contents, pick a hack he wants to read about, then is able to read an entire chapter covering the hack as the chapters are fairly short. If I understand right, he then goes back to the table of contents, finds another chapter, then reads that.
Scary sounding stuff, reading the entire book online like that! Actually, it turns out he can't read the entire book. The percentage he can read isn't so scary when you understand that a publisher is allowing it.
Once Again -- The Difference Between Google Print & Google Library covered this once before, on how publishers work with Google Book Search. Nevertheless, I'll do a short version and apply it to what Steve found.
Google Book Search takes in content in two different ways. There's the Google Library program, where they scan books. If the book is out of copyright, the entire content may be displayed. If it's in copyright, nothing is displayed other than small snippets. Then there's the entirely separate Google Books Partner program. Publishers in that program, like O'Reilly, voluntarily submit their books. When they do this, they can also indicate how much of their books they want to have displayed, from 20 to 100 percent. If they don't want any of it viewable, then only snippets and no actual pages are shown.
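To make the two routes concrete, here is a rough sketch of those display rules as a small Python function. This is only my illustration of the rules as described above, not anything Google has published; the function name `viewable_pages`, its parameters, and the page counts are all made up for the example.

```python
def viewable_pages(program, total_pages, in_copyright=True, publisher_percent=None):
    """Return how many full pages a reader could view (snippets not counted).

    Hypothetical helper that mirrors the rules described in the post:
    - "library": whole book if out of copyright, snippets only otherwise
    - "partner": the publisher picks 20 to 100 percent, or snippets only
    """
    if program == "library":
        # Scanned books: all or nothing, depending on copyright status.
        return 0 if in_copyright else total_pages
    if program == "partner":
        if publisher_percent is None:
            # Publisher chose not to show any pages: snippets only.
            return 0
        if not 20 <= publisher_percent <= 100:
            raise ValueError("publisher_percent must be between 20 and 100")
        return total_pages * publisher_percent // 100
    raise ValueError("program must be 'library' or 'partner'")


# A 300-page partner book with a 70 percent setting (invented numbers).
print(viewable_pages("partner", 300, publisher_percent=70))
```

The point of the sketch is that the publisher's setting, not any trick by the reader, decides how many pages of a partner book can be read.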
In Steve's case, O'Reilly is in the partner program. You're told that at the top of the pages you view, where it says:
Provided by O'Reilly through the Google Books Partner Program.
Now remember that 70 percent figure Steve was talking about, that he could read about 70 percent of the hacks in any particular book? Sounds to me like O'Reilly's gone with a 70 percent viewable figure for its books.
You can see another mistaken assumption (or perhaps intentional twisting) of how Google Book Search works over at Google Watch. Scroll to the bottom of this page, which is against the library scanning program.
You'll see a graphic with the faces of Google cofounders Larry Page and Sergey Brin saying, "Hey boys and girls, write all your term papers using Google's snippets. No need to visit the library to find that copyrighted book."
The example used below the smiling faces of Larry and Sergey is a search on Steve Badrich, with this snippet shown:
Campus Wars: The Peace Movement at American State Universities in the Vietnam Era by Kenneth J Heineman - History - 1994 - 160 pages Page 134 - ... Steve Badrich, decided in March 1966 to enlist in the marines rather than spend two more anxious years at the university while his draft board made ... [ More results from this book ]
Below that is a screenshot of an actual page from the book, such as you'll see here at Google Book Search.
Conclusion? I think many will read that as an example of how Google Book Search is taking copyrighted books out of libraries and putting them online in a viewable format. But go up to the top of the page, and you'll see this:
Campus Wars: The Peace Movement at American State Universities in the Vietnam Era by Kenneth J Heineman - Provided by NYU Press through the Google Books Partner Program
In reality, this book wasn't scanned through the library program. It was put into Google Book Search by the publisher itself, NYU Press. And the reason those college "boys and girls" can view the page online is down to the publisher itself allowing this.
Gary looked earlier in Can Full Book Preview Prevention Be Hacked? at another mistaken assumption this year that someone had found a Google Book Search hole when in reality, it was the publisher allowing viewing.
Gary's post also covers the only report (source material no longer online) I've ever seen about someone saying they found a way around protections entirely. This was before Google had the required log-in system and used a wholly cookie-based one. Since that time, no honest-to-goodness hacking has come to light that I've seen.
I'm not saying it's impossible. It wouldn't surprise me if it happens. But that's not what Steve's done here. There's no "hole" that he's "hacked," as far as I can tell.
Postscript from Gary: I agree with all of Danny's comments. Two quick points. First, a recent post about Firefox add-on CustomizedGoogle says that they have a method that allows the printing of CustomizedGoogle pages. This would also make for some issues if the tool grew in popularity and people started printing thousands of pages. Second, if you're interested in reading and searching all of the O'Reilly books as well as tech books from many other major publishers including MS Press, Sams, Prentice-Hall and many more, I suggest taking a look at a service named Safari Tech Books, which just happens to be co-owned by O'Reilly. Yes, O'Reilly content everywhere. This service allows full text searching, fielded searching, printing, e-mailing, and more. As I've said many times in 2005, many libraries like the San Francisco Public Library offer access to this service for FREE! That's right, free!!! No hacking needed. (-: If your library doesn't offer it, then you can subscribe to Safari. Prices vary but access to up to 10 books (full text, no limit) is about $20.00. This page has more info. Interested in a free trial of the full service? Go for it! It's completely free for two weeks. Register here. Btw, Safari offers other tools. For example, notification of new titles via RSS. A feed generator makes all sorts of feeds (titles by publisher and category) possible. So, Safari doesn't have the titles you want? Then take a look at Books24x7. Again, this service is usually licensed to companies, libraries, etc. but individual subs are also available. Again, full text, searchable access (no limits on how much you can view or print). This collection offers more than technology books. Here's a list of a few recently added titles. You can also request a free trial.
Access to more than 20,000 NEW books (no limit on how much you can view online) is available from ebrary. Online access is free, just pay to print or copy pages. Usually about $.25/page. Great stuff. More in this SearchDay article.
Posted by Danny Sullivan at 6:09 AM | Permalink
A few weekends ago Wall Street Journal reporter, Kevin Delaney, gave me a call asking for a few ideas, thoughts, and suggestions about useful specialized databases (aka verticals) that would be of interest to WSJ readers.
Today, the article was published and it's titled, "Beyond Google." You'll find it linked here. However, at least for the moment, Kevin's story is only available to WSJ subscribers.
A couple of quick comments and notes:
1) Thanks Kevin for asking for my suggestions and for the quote. You should know that for each database suggested and included in the final article, 40-50 more could have been included and received a well-deserved mention. I had to limit my picks for obvious reasons. Of course, Kevin spoke to others and also included their suggestions.
2) The "Beyond Google" headline is great. The word Google has a way of drawing people's attention and the title of the headline is often the title of presentations I give. Why? A presentation titled, "Learn about Specialty Databases" does not pack in the crowds. Tossing the word Google into the title, does.
Specialty tools do not replace general purpose large web engines like Google, Ask Jeeves, Yahoo, Gigablast, Exalead, and others. A web researcher should have a good working knowledge of both general databases and specialty tools. Plus, in terms of some of my presentations, the word "Google" gets the crowd in the door and then I have time to not only talk about Google (many don't have any idea of what it can offer) but also have time to talk about the great useful stuff being developed by AJ, Yahoo, and elsewhere. So in reality it's a two-pronged presentation. As I posted on Friday, it's clear that many people who use these and other tools have little to no idea of how these services work and what they offer.
+ General web engines (The full landscape, how to take full advantage of some of their services, creating better queries). These days it can also include time letting the audience know about verticals that these companies also provide like Yahoo Audio Search.
+ Specialized databases (verticals) the power and often time saving capabilities they offer. The challenge for many is just knowing about them.
3) If you read the blog on a regular basis, you'll likely notice that Kevin used several suggestions that I've written about on our site. Cool!
4) I was especially pleased to see the WSJ article mention the wonderful RedLightGreen bibliographic database and NetLibrary, which offers the full text of thousands of books and is available for free from many libraries. Remember, as I wrote in this guest column for BetaNews, public, university, and many other types of libraries offer FREE, 24x7x365, access from any web computer (no need to go to the library) to a full range of specialized databases that often offer content not found in web engines (full text journals, newspapers, magazines, reference books, etc.) OR packaged in such a way to add extra value to the data. Plus, these databases tend to offer search capabilities not found from general web engines. Every library offers different services and databases. The easiest way to learn what your library offers is to either look at their web site or make a quick call.
Postscript: I'm happy to report that at least for the moment, it's the most popular story on the WSJ site today. Yes, I think the public is beginning to understand the value of specialized tools.
Posted by Gary Price at 1:27 PM | Permalink
HarperCollins To Digitize And Control Its Book Content from the Wall Street Journal looks at HarperCollins saying it will digitize its active backlist of 20,000 titles and up to 3,500 books per year. Part of the idea is that by doing this itself, the publisher can give content to the search engines to index but keep the files themselves. That leads me to think HarperCollins doesn't understand how book indexing works. From the story:
Search companies such as Google will then be allowed to create an index of each book's content so that when consumers do a search, they'll be pointed to a page view. However, that view will be hosted by a server in the HarperCollins digital warehouse. "The difference is that the digital files will be on our servers," said Brian Murray, group president of HarperCollins Publishers. "The search companies will be allowed to come, crawl our Web site, and create an index that they can take away, but not the image of the page."
This would prevent such Internet companies from selling a digital copy of that book unless HarperCollins decided to partner with them as a retailer. "We'll own the file, and we'll control the terms of any sale," he added.
OK, in order to make a searchable index of a book, a search engine is essentially making a copy of the book, though it doesn't mean that it reprints that copy. Indexing Versus Caching & How Google Print Doesn't Reprint from me earlier explains this in more depth.
So yep, the search engines won't have images of a book to display -- assuming they go along with this -- but they will have a copy of all the words in the books. And that's pretty much all Google is doing with the Google Library scanning project -- making an index of books, a card catalog, exactly as HarperCollins wants to replicate.
Interestingly, HarperCollins -- though not a party to that suit over Google Library -- says it supports it "economically and philosophically." Well, philosophically, it doesn't seem to understand it's doing pretty much what Google's doing already.
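For anyone who wants to see why an index amounts to a copy of the words, here is a toy sketch of an inverted index in Python. It is a simplified illustration under my own assumptions, not how Google or any other engine actually builds its index; the page text and the `build_index` and `search` names are invented for the example.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page numbers on which it appears."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(page_number)
    return index


def search(index, query):
    """Return the sorted page numbers that contain every word in the query."""
    words = re.findall(r"[a-z0-9']+", query.lower())
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


# Made-up page text, for illustration only.
pages = {
    1: "The peace movement grew on campus",
    2: "Students left the campus to enlist",
}
index = build_index(pages)
print(search(index, "peace campus"))
```

Notice that the index holds every word of every page along with where it appears, even though no page image is stored or displayed. That is the sense in which indexing copies a book without reprinting it.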
Here's the especially tricky bit. Google and gang, if they are "allowed to come, crawl our web site," as HarperCollins puts it, are then going to have access to the same content the general public gets. In other words, whatever you put out for crawlers, anyone gets. So is HarperCollins going to put the full text of books online? Because then forget the part about selling digital copies (not that Google and gang are doing that now). The digital copies will be out for anyone to access.
Alternatively, the various search engines do have programs where site owners can submit content, such as Google's here. But you can't just send them some nondescript "index." They want PDF, though the program doesn't require that actual pages have to be shown, despite coming in as PDFs.
Aside from book search, there are programs such as Google Scholar or Yahoo Search Subscriptions that can effectively let content owners cloak material -- the general public sees abstracts while the search engine indexes the good stuff. But neither of these, to my knowledge, will work for book search.
Posted by Danny Sullivan at 12:18 PM | Permalink
Tomorrow, the Google Library Project will be one year old. It's been quite a year of news and controversy. Here's a link to our first SearchDay article about the Google Library Project from December 14, 2004. In this article, I made sure to mention other digitization projects like Project Gutenberg that have been around since 1971.
What's crucial to remember is that Google Print (recently renamed Google Book Search) itself was around before Google's announcement to digitize the full or partial holdings of five large university libraries.
Let's review.
Google Print for books (materials direct from publishers) was opened "widely to publishers" on October 6, 2004. However, the existence of Google Print goes back even before that to December 17, 2003, when Google began offering book searches. The original Google Book search indexed, "only a small excerpt from each book, typically taken from the inside cover, jacket reviews, author biographies or the book's introduction."
To this day, Google Book Search (the material direct from publishers) and the Google Library Project are frequently confused. SEW BLOG has tried to make the differences clear since day one. Recently, Danny did a great job of explaining the important differences in his post: Once Again -- The Difference Between Google Print & Google Library. This post also contains links to many other articles about the project, digitization, and opinion about copyright issues.
The remainder of this post will offer a few key posts about the project (yes, I could have included more), a timeline of sorts, from the past year, along with links to some other Google Print/Book Search/Library Project related documents.
+ Questions & Answers Recap On Google Library This Info Today article by Barbara Quint is loaded with details about the project.
+ France, Google & The Need for Digitization Project Cooperation
+ Copyright Questions On Google Digitization Project
+ Some Publishers Not Happy With Google's Library Digitization Program
+ Google Library Digitization Agreement With University Of Michigan Now Available
+ The Digitization Of The Library
+ More On Publisher Concerns On Google Library Project
+ Google Gives Publishers Opt-Out From Library Scanning Project; One Group Still Not Happy
+ More Publishing Trade Groups Weigh In On Changes to Google's Library Scanning Project
+ Legal Experts Say Google Library Digitization Project Likely OK; Will It Revolve Around Snippets?
+ Breaking Down The Google Print 5 Libraries An article from Digital Libraries.
+ Google's Library Scanning Project Heads to Court
+ A New Alternative to Google Print Say hello to the Open Content Alliance!
+ Google Print Press Review & Just A Bit About Search Inside the Book This post includes a link to Eric Schmidt's op/ed column in the Wall St. Journal.
+ Association of American Publishers Sues Google over Library Digitization Plan
+ Great Google Print Controversy Bibliography Includes link to, "The Google Print Controversy: A Bibliography" by Charles W. Bailey, Jr. Impressive!!!
+ Microsoft Announces MSN Book Search; Joins Open Content Alliance
+ Google Gears Up to Resume Book Scanning
+ Google Print Now Publishing Out-Of-Copyright Works Gained Through Library Scanning Program
Yes, it has been quite a year and I could have listed more reports. I'm sure year two will have as many, if not more, news stories, court hearings and events as the Google Library Project's first 365 days.
Postscript: I've recently posted about two other services, currently available, that offer the full text (no limit on how much you can read) called ebrary and NetLibrary. Looking for public domain full text books? Visit, "Public Domain Books: More than 25,000 Full Text Books in a Single Database."
Want to comment or discuss? Visit our SEW Forums thread, Google's Library Project One Year Old Already.
Posted by Gary Price at 3:50 PM | Permalink
We're talking a lot about digitized books these days here on the blog. Project Gutenberg has been around since 1971 digitizing and copying public domain books. Via GB, this Wall Street Journal Q&A interview (free) with its founder Michael Hart on book digitization, Google Book Search, and more.
Project Gutenberg is one effort that we've mentioned from the very first day of Google's Library Project. That's why I was a bit disappointed to read Hart saying Gutenberg got no attention when Google's Library Project was first launched. In the first SEW article about Google Library, a SearchDay article from December 14, 2004, I mention the project, link to its homepage, mention the number of books available (at that point) and also mention Mr. Hart by name as its founder.
This is NOT the first time Mr. Hart has spoken about Google Library and book digitization. In October, we published a brief essay by Michael Hart here on the SEW Blog.
By the way, you're right, the Google Library Project will celebrate its first birthday later this week.
Finally, most of the books in Project Gutenberg and thousands of other full text materials can be found in The Open Books Page that I wrote about last week. It's a must see (the site that is (-:)
Posted by Gary Price at 4:10 PM | Permalink
In an exclusive Information World Review interview: BL opens up to Microsoft and reveals revenue aims, Alistair Baker, Microsoft's UK managing director, talks about both parties [The BL and Microsoft] investigating new business opportunities in the recently announced arrangement that will have MS scanning 25 million pages (about 100,000 books) of only out-of-copyright material that will become available via MS Book Search, a member of the Open Content Alliance.
From the interview: "This is the start of a very long journey; in five years time the things people search for will be very different to today," said Baker. "The aim of this relationship is long term."
Baker said there are real commercial possibilities based on the content held within libraries, and especially the British Library, that are not currently understood. "There is a lot of interest in historical artefacts, but to go to the British Library you have to have a real desire to see something. By making those artefacts available online we will broaden the level of interest in them." Baker believes that recent largescale adoption of broadband internet connectivity will increase the interest in the BL holdings for research as well as enjoyment. "Bill Gates sees this as a massive opportunity."
Hmm. The recent announcement is about books, but from what Baker has to say, perhaps MS and the BL are thinking about digitizing other types of objects?
Posted by Gary Price at 9:34 PM | Permalink
Even More Online Book Search Sites
Gary has been blogging about several online services, such as Google Book Search and ebrary, that allow you to search, and in some cases read and print thousands of online books, often for little or no charge. He continues in today's SearchDay article, More Online Books Resources, with a look at NetLibrary and the modestly named but amazingly comprehensive Online Books Page.
Posted by Chris Sherman at 8:49 AM | Permalink
Talk about online books seems to be a very frequent topic of conversation in the search world today. In the past few weeks we've posted about the latest with Google Book Search/Google Library Project, the Open Content Alliance, and new stuff coming from Amazon and others.
I've also provided overviews of both ebrary and NetLibrary that I hope you take a look at. But why another book post?
I once again want to mention an incredible browsable and searchable database, The Online Books Page, of books in the public domain (including Project Gutenberg materials) that's been edited by John Mark Ockerbloom since 1993. The OBP currently lists more than 25,000 titles.
Listings can be searched or browsed by authors, titles, and subjects.
The amount of material added on a daily basis is quite amazing. When does John sleep? An RSS feed of new listings is also available.
Finally, John mentions that he is adding some public domain books from Google Book Search. He has also added the "first demo batch" of books from the Open Library, sponsored by the Open Content Alliance.
When I'm asked to name some of the most impressive tools on the open web, the OBP is consistently near the top of the list.
Posted by Gary Price at 10:26 PM | Permalink
The Information World Review (IWR is a VNU publication) article: Google digitisation faces Euro legal challenge, reports on Google's book digitisation project (the Google Library Project to be precise) facing some legal obstacles in Europe.
Here it is in a nutshell, direct from the article: Google has acknowledged that it cannot digitise copyright material from European libraries, according to the Association of Learned and Professional Society Publishers (ALPSP).
The article goes on to say that in a meeting last month Google agreed that: ...it was "absolutely the case that it is not allowed to [digitise in-copyright material from libraries] in Europe."
At the moment, The Bodleian Library at Oxford University is the only one of the "Google Five" libraries located in Europe. This post has more about the holdings of all five libraries including in-copyright and public domain holdings.
ALPSP chief executive Sally Morris said that she is planning to create a system that will make it easy for Google, the Open Content Alliance, or any other organization wanting to digitise material.
She told the Bookseller: "The fact Google recognise they can't do this without permission in Europe gives us a threshold to work out a way for them to get permission. In America, they have the law on their side. Here, they accept they don't." Her suggestions, put to Google at the meeting, include a Canadian model whereby, if it proves impossible to locate a copyright owner, a licence is granted so the material can be used legally.
Morris also told IWR that she is waiting to hear back from Google on these issues. She said that Google was interested.
Btw, Danny chatted with the ALPSP's Sally Morris in this blog post.
Posted by Gary Price at 8:22 PM | Permalink
Before we offer a look at another online book database, NetLibrary, a brief review.
Last week I wrote about ebrary, a company offering thousands of full text books (no limit on how much you can read) that licenses new books (in-copyright) to many companies, libraries, and other organizations. They also offer services (for a very low cost) for individual users that I cover in my article. For example, you can search/view about 20,000 books and pay about $.25 a page to print or copy. Btw, the article also includes mentions of similar services like Books24x7 and Safari Tech Books.
OK, that was last week.
This week, a look at another service called NetLibrary (a division of OCLC) that has been around for several years and just last week passed a milestone: they've now digitized more than 100,000 titles (mostly new content). Additionally, NetLibrary offers ebooks, audiobooks, and even some ejournals online. The books are available full text and in some cases can be printed/annotated. Of course, all of the content is fully searchable. According to an announcement last week, more than 20,000 new titles have been loaded in 2005 alone.
So, where does the material come from? NetLibrary works with more than 400 publishers. Here's a list of those publishing partners.
Ok, this sounds cool but where and how can you access this material? Unlike some of the services I mentioned last week, NetLibrary doesn't offer a program for the individual searcher but rather provides its services through more than 13,000 libraries around the world. For example, the San Francisco Public Library offers a collection of NetLibrary materials that you "virtually" check out from the collection.
As I've said many times before, NetLibrary and many other databases are accessible for free from home, dorm, office, anywhere with a web connection. All you need is a library card from a library that offers NetLibrary services. Here's an article with more about what's accessible. From ebooks, to audio books, to full text journals, and more. Every library offers a different collection of NetLibrary materials just like every physical library offers its own collection.
This Fall, NetLibrary 4.0 will debut several tools that you can learn more about here and see a screen shot of the interface. It will offer something that we're likely to see more and more of in the future from all databases, automatic summarization. More demos here.
Finally, here are a couple of NetLibrary screen caps. First, a search results page. Second, a page from an actual book via NetLibrary.
Posted by Gary Price at 10:25 PM | Permalink
A Man's Vision: World Library Online isn't about the US Library Of Congress & Google-backed World Digital Library program announced today but rather the Yahoo & Microsoft-backed Open Content Alliance. It covers the Internet Archive's Brewster Kahle's dream to create a "kinder, gentler" digital library that might not irk publishers in the way Google Library has.
One attendee of the launch party for the OCA talks about how it was "the un-Google meeting." Funny, but not. The OCA should try (and apparently is trying) to get Google on board, and attitudes of attendees like that aren't going to help. For its part, Google should strongly consider joining the OCA.
Everyone would benefit from more cooperation, rather than probable duplication of efforts and incompatible standards. The story has Google saying they are in talks with the OCA but that there is nothing to announce. Given that Google's backing the WDL now, I have a feeling we're just going to see more rivalry.
Want to comment or discuss? Visit our Search Engine Watch Forum thread, World Digital Library Project.
Posted by Danny Sullivan at 8:14 AM | Permalink
Library of Congress Announces World Digital Library Project; Google Donates $3 Million To Effort
Google's involved in another digital library program, this time one being backed by the US Library of Congress. The World Digital Library aims to collect the world's "rare and unique cultural materials" in a digital format to make them accessible to anyone. The program is being kicked off with $3 million in funding from Google. Today's SearchDay article, World Digital Library Project Announced, Backed By Library Of Congress & Google, has more details on the plan.
Want to comment or discuss? Visit our Search Engine Watch Forum thread, World Digital Library Project.
Posted by Danny Sullivan at 12:00 AM | Permalink
For all the press Google Print has received, it's a lightweight service compared to other, more established book search services that let you both search and read the full text of contemporary books, legally. Gary takes a look at one of the best-established and most comprehensive book search services in today's SearchDay article, A (Non-controversial) Alternative to Google Print.
Posted by Chris Sherman at 8:52 AM | Permalink
It's all in a name, especially when that name confuses people. In a post on the Official Google Blog, Jen Grant, a Product Marketing Manager at Google, says that Google Print has a new name. The service is now called Google Book Search. Makes sense to me. URLs for http://print.google.com now redirect to the new books.google URL.
Why the change? Grant writes: Well, one factor was all the comments we got about how excited people were that Google Print would help them print out their documents, or web pages they visit -- which of course it won't.
Sure, the name Google Print might not have been the best choice in the first place, but I also think Grant's comments once again point out that many of the services and tools that those of us who watch the search industry on a daily (if not more frequent) basis understand are often unknown or, in this case, misunderstood by the masses. Even with the large amount of attention Google Print has received, it's possible. If it can happen to Google, it can happen to any company.
Calling the service Google Print gave Google the option to easily include non-book material in the database. For example, articles from magazines. Now, they'll have to brand (something Google does very well) another product if/when they decide to offer content from magazines and other print sources.
Grant continues that the name change also reflects the evolution of the service.
Now that we're starting to achieve that, we think a more descriptive name will help clarify what our users can do with it: namely, search the full text of books to find ones that interest them and learn where to buy or borrow them.
That's true, you are able to "search" the full text but unless the book is in the public domain, you'll only be able to read a selected amount (as determined by the publisher) online. In the case of in-copyright books scanned as part of the Google Print/Book program for Libraries, you'll only see snippets that contain your search terms. In both cases, you'll not be able to print the material. Danny does a great job of summarizing what SEW Blog has been saying since the beginning about the differences between Google Print/Book for Publishers and Google Print/Book for Libraries in his post: Once Again -- The Difference Between Google Print & Google Library. Google has also tossed out the idea of online book rentals.
Google Book Search is not the only online books search service available today or coming in the future. Here are links to a few others:
+ Search and Read Full Text Books Online via ebrary In this case you can read the full text of 20,000 books online, pay to print and copy. Links to other services including NetLibrary, Books 24x7 and Safari included in the post.
+ Microsoft Announces MSN Book Search; Joins Open Content Alliance
+ Amazon.com's "Search Inside the Book"
+ More Sources For Ebooks & Electronic Text
Posted by Gary Price at 11:27 AM | Permalink
Since book search is all the rage these days I want to spend a few minutes talking in-depth about a service I've mentioned in the past but didn't really describe in detail. The service is called ebrary, and it has been around since 1999. ebrary offers numerous services including one that lets you search and read over 20,000 in-copyright books for free. You pay only to print and copy text.
Services like NetLibrary, Books24x7, and Safari Tech Books spend a great deal of time marketing their services for licensing by libraries and the enterprise market. ebrary is no different, and their fully featured search technology is very cool and powerful.
What makes these and other services different than Amazon's Search Inside the Book and Google Print is that they allow the user to read, annotate, print (in some cases) the full text of the book. In most cases these are new, in-copyright books, that can be full text searched.
Shop ebrary ebrary also offers a little known service that allows those without access to a library subscription to access and read (online) over 20,000 books (from major publishers), sheet music titles and reports.
Full Text online access from Shop ebrary content is free, you pay only for what you print and/or copy.
It's that easy.
In many respects it sounds very similar to what Amazon.com has planned with their new Amazon Pages and Amazon Upgrade programs that will launch next year and perhaps what Danny just blogged.
So, let's take a closer look and run a search at Shop ebrary.
+ Create an ebrary Shop account
+ You'll need a credit card and have to put in a minimum of $5 if you decide to print or copy content.
+ You'll also have to download and install the ebrary
+ Next, log in and start searching the 20,000+ books
+ A simple search box exists, as well as an advanced interface that allows you to search specific fields like title, subject, and language.
+ Plenty of search syntax is also available, including a proximity operator. You can even set up a virtual bookshelf.
I ran a search for [Jupiter and Saturn].
The results page shows title, a cover image (if possible) and basic bibliographic info. You can sort results by relevance, title, contributor and date.
Now, select a title. The book I selected is published by Cambridge University Press. You'll also see that it's FREE to view the full text and $0.25 per page to copy or print. Prices vary by title.
That's about it. Your search terms are highlighted and you can highlight any text you choose. You can also add notes and bookmark specific sections of the book.
Shop ebrary is a very useful full text book search database that's also an example of things to come. Of course, the full power and collection of titles is available to library subscribers. Nevertheless, Shop ebrary is a great place to get your feet wet while reading lots of interesting material.
To learn more about ebrary and its leadership, take a look at this interview with Christopher Warnock, the founder and CEO of the company. Here's another chat with Warnock.
Postscript: Books24x7 and Safari offer subscriptions to individuals.
Posted by Gary Price at 11:01 AM | Permalink
Google Exploring Book Rental Plan
Via News.com, Google Checks Out Interest In Online Book 'Rental' Plan at the Wall Street Journal covers Google having approached at least one publisher about an idea to "rent" books. You won't be able to download or print books, initially -- which suggests this is a way to entice publishers taking part in Google Print to make more of their content there readable online. Google confirms at least that it is exploring "new access models."
Posted by Danny Sullivan at 10:46 AM | Permalink
RLG, a highly respected organization in the library world with more than 150 research libraries as members, has announced a partnership with LookSmart.
Today's announcement brings access to RLG's Trove.net database of "209,000 rare and unusual images" to searchers via LookSmart's FindArticles, a LookSmart site.
Trove.net contains works from 300 BCE to the present and includes a variety of images from leading cultural heritage collections. The works represented range from papyrus fragments from ancient Greece, to Chaucer's "The Canterbury Tales," to vintage advertising labels, to 20th-century architectural photos. These images can be licensed for use by individuals and businesses. RLG's Trove.net contains works from major international collections – national libraries and renowned universities – as well as images from many other museums, libraries, and archives.
Here's an example search for the term "baseball." At the bottom of the results page you should spot an image result. Click, and you'll see a larger thumbnail with the chance to click again and see the full image. Those with institutional memberships to RLG Cultural Materials can click through to the image, while those who don't can license the image for various types of use.
Btw, if the initials RLG sound somewhat familiar, they should. SearchDay published an article I wrote last week about RLG's wonderful and free bibliographic database named RedLightGreen.
Posted by Gary Price at 6:10 PM | Permalink
Book digitization has been going on at the University of Toronto Libraries for some time. In fact, I wrote about scanning at UT and included a link to a video of the scanning robot almost a year ago.
In a Wall Street Journal article published today titled: Building an Online Library, One Volume at a Time, you'll meet Liz Ridolfo, a scanner at UT who is digitizing books. You'll learn about her daily work and get a quick look at exactly how it works.
Ms. Ridolfo is part of a massive undertaking to digitize the world's books. She is one of about a dozen scanners employed by the Internet Archive, a San Francisco nonprofit group that is spearheading the Open Content Alliance, a consortium of business and educational groups that includes Microsoft Corp., Yahoo Inc., Hewlett-Packard Co., Adobe Systems Inc. and several university libraries. The group wants to build an online library of millions of old books and hopes to make a big batch accessible through Web searches as early as next year. For all its technical sophistication, the group needs the manual work of people like Ms. Ridolfo to make digitization a reality.
NOTE: Access to the full text of the WSJ is free this week to non-subscribers.
Posted by Gary Price at 6:59 PM | Permalink
Macmillan's Book Digitisation Program: BookStore
Trying to keep all of the book digitization projects out there in some order is becoming a full-time job. The Information World Review article, Macmillan takes on Google Print, features comments from Macmillan Chief Executive Richard Charkin about his company's content digitisation plans, called BookStore.
From the article: Like Google Print, BookStore will be a searchable repository of digital book content, with e-commerce technology for purchasing titles. Charkin said BookStore will appeal to publishers that want to take advantage of releasing their content online, but don't want to surrender control of their copyright or invest in the technology required. "We need to be able to do deals with people that we can measure, not to hold onto material, but to know who is using it and how," Charkin said. "Publishers have to get their act together with the entry of Yahoo and Microsoft into the arena alongside Amazon and Google." With three major parties digitising books for the web – Google, Macmillan and the Yahoo/Microsoft-led Open Content Alliance – Charkin, who is also president of the Publishers Association, has called for all sides of the industry to collaborate.
Posted by Gary Price at 2:49 PM | Permalink
Google Print--Is it Time to Change Copyright Laws?
The Salon.com article, Throwing Google at the book, takes another look at the Google Print program and asks if it's time to change copyright laws.
Posted by Gary Price at 12:42 PM | Permalink
As the resident librarian around here, I wanted to toss out a bit about the electronic library catalogs (aka card catalogs) of 2005. Many people still picture paper card catalogs, or think of catalogs as tools for simply finding books, videos, etc., before heading to the library to check out the item (or to be pointed to the right magazine or quality web site). In fact, the library catalogs of 2005 offer MANY more services than most people expect to find. What follows is a brief, and I do mean brief, overview. Btw, these days card catalogs are referred to as OPACs (Online Public Access Catalogs).
The challenge in describing and demonstrating all of this is that every library offers different services and technology. However, here's a taste of what I'm talking about.
+ Book Reviews, Cover Images, Etc.: Here's a search from the online catalog at the San Francisco Public Library. The basic page contains what you've come to expect from a card catalog. However, you'll also notice a thumbnail image of the cover. You'll also find a brief summary of the book. Neat! Now, click the "I" icon (right side of page) and you'll find links to reviews from Publishers Weekly and Library Journal. Naturally, which reviews and what "value added" info are available depends on the individual entry.
Here's another example. In this case you get reviews plus a portion of the opening chapter of the book.
+ Buzz: The Henderson County Public Library in Kentucky mines their catalog and produces all sorts of lists. For example, a list of the most requested items, most popular movies, and a list of new materials.
+ RSS: Of course, library catalogs are also utilizing RSS to inform patrons about new books and useful materials. The University of Alberta offers a long list of feeds by subject area and library.
+ Hyperlinked Subject Headings: Here's a search for "search engines" from the Enoch Pratt Public Library in Baltimore. You'll see a list of titles and images of book covers. The first book listed is JB's, "The Search." Now, click the "Details" button. You'll find a summary and the table of contents. Finally, click the "catalog record" tab and look for the subject terms, which are all hyperlinked. Simply click one or more of them to find other books that have been assigned these subject terms by a human indexer. Click on "Internet Searching" and you'll see what I mean.
+ Interfaces: I think you'll see that a variety of interfaces are available depending on the searcher's skills and needs. A library system in Illinois makes their catalog available in a traditional method or by using a series of images.
+ Online Renewal and Reserving Materials: Pretty straightforward. Enter the bar code or title and you're done.
As I hope you've now seen, today's library catalog might not be what you've come to expect. Of course, I'm only showing a few examples from a few libraries. It's not too difficult to imagine full text and public domain material being added to each record in the future.
Finally, don't confuse a library catalog with the many full text (newspapers, magazines, reference books) and FREE databases that libraries (of all types) make available from home or office. All you need is a library card. Each library offers different tools. More on that here. In fact, one database many libraries offer is called NetLibrary which allows the user to "virtually" check out the full text of thousands of new and old books that you can annotate, print (in some cases), and share.
Postscript: Last week, Chris published an article I wrote for SearchDay about a library catalog that contains more than 120 million items and then allows you to link to local libraries to see if the material is available. Heck, it will even format the entry into one of several bibliographic formats.
Posted by Gary Price at 8:26 PM | Permalink
Amazon's Jeff Bezos on Book Scanning and Amazon's New Digital Book Programs
In a new interview, Amazon.com CEO Jeff Bezos talks about Search Inside the Book and their new Amazon Pages and Amazon Upgrade programs.
Last week, we posted details about these programs (Amazon Pages and Amazon Upgrade), which are based on their impressive "Search Inside the Book" program that I am, well, just about obsessed over. In many cases it offers info about titles that's unavailable elsewhere. Btw, you can keyword search some Search Inside the Book material using the book search option via A9.
Search Inside the Book (online before Google Print) equates most closely with the Google Print for Publishers program, where publishers send in-copyright material to Amazon.com for inclusion, searching, and in many cases analysis. The full text is searchable but the searcher can only view a limited portion online. This is not the same thing as the Google Library scanning program that has drawn lots of attention and that will only show "snippets" of scanned in-copyright library materials. Public domain materials are a completely different matter. We've been attempting to explain all of this since day one and Danny takes another stab at it today.
Here are a couple of quotes from the article: Bezos on Search Inside the Book: One of every two books we sold in the United States is now in the Search Inside the Book program. That has been done the whole time from the beginning until now--and it will continue to be done with the permission and cooperation of the publishers.
On Amazon.com Adding Open Content Alliance Material to Their Database? I think we'll have to wait and see on that, but you know, in the public domain, I don't see why not.
Posted by Gary Price at 3:37 PM | Permalink
MSN & British Library Partner On Book Search Project
We wrote earlier that MSN was getting into the book search game. Last Friday, another piece of the puzzle came out. MSN announced that it would digitize about 100,000 books through a partnership with the British Library.
Hmm. Isn't that what got Google into trouble? Yes, because Google is scanning both in and out of copyright works. So MSN will stick with only out-of-copyright public domain works, correct? Not necessarily from reading the MSN announcement:
We will predominantly focus on digitizing out of copyright material in this partnership.
Predominantly isn't the same as exclusively -- sounds like some copyright material might be getting included. But the AP reports:
Microsoft and the British Library stressed that they will be choosing books only from the older end of the library's vast collection of 13 million titles, as these have long fallen out of copyright.
The press release over at the British Library web site also says that only out-of-copyright books will be included.
Posted by Danny Sullivan at 11:14 AM | Permalink
Further to today's post about Amazon planning to sell electronic books next year, here are some alternative places where you can get them right now:
+ Digital Book Index "111,000 title records from more than 1800 commercial and non-commercial publishers, universities, and various private sites. About 72,000 of these books, texts, and documents are available free, while many others are available at very modest cost. Registration, also free, is required to access the database. You will need to give an email address but can make the choice to opt out of any future mailings. The database can be searched by author and title. You can also browse by author, title, subject, and publisher.
+ eBooklocator.com Search for commercially available ebooks from more than 400 publishers. Listings come from Overdrive's Content Reserve Digital Content Marketplace. Search by format, title, author, subject, keyword, or ISBN/DOI.
Also, many fee-based services that have been online for years provide searchable access to thousands of in-copyright books. Often, these services are free through public and university libraries. All you need is a library card and you can access the content outside the library from any computer. In many cases you can print, annotate, share, etc. this material. These services include:
+ Safari from O'Reilly Access also available for individual purchase.
+ Books 24x7 Access also available for individual purchase.
+ ebrary Individual access to more than 20,000 full text books available. Costs just $5.00 to open an account.
FYI, if you're looking for a one-stop shop for public domain books, I can't say enough good things about The Online Books Page from the University of Pennsylvania. It organizes and lists thousands of books (including Gutenberg content and now some Google Print content) from many digitization projects. It's amazing how much is added each day. Here's the "new listings" page that also offers an RSS feed.
Posted by Danny Sullivan at 1:48 AM | Permalink
Forthcoming "Amazon Pages" & "Amazon Upgrade" Programs To Sell Books In Electronic Format
Amazon.com, proprietor of the wonderful Search Inside the Book (SITB) program, has announced a new Amazon Pages program that builds on SITB to allow consumers to download specific pages, chapters, or the entire book for offline reading.
Elinor Mills reports more in her News.com story, Amazon, Random House throw book at Google:
Amazon's new "Amazon Pages" program will let people purchase online access to anywhere from a few pages of a book to an entire work. The e-commerce company also announced a program called "Amazon Upgrade" that will let customers pay extra to be able to access books electronically that they've had shipped to them in printed form. In what could be the first step toward major publishers offering their works online, Random House said it will negotiate separate agreements with online booksellers, search engines, entertainment portals and others to offer the contents of its books to consumers for online viewing on a pay-per-page basis.
Elinor's article also reports that Amazon.com will begin the program sometime in 2006, though no specific date was given. Digital books available for purchase are not a new idea, but after this announcement I would expect to see much more of it from many publishers, online book vendors, and databases. Whether people want to read large portions of books in a non-traditional format remains an open question. New technology could change this, but many people want to hold the book in their hands. For reference use, though, online access can be very useful.
The Amazon press release also mentions that SITB is now available in the UK, Germany, France, Canada and Japan. By the way, I've posted about some of my favorite SITB features, here.
Posted by Gary Price at 1:47 AM | Permalink
Google Print Now Publishing Out-Of-Copyright Works Gained Through Library Scanning Program
Google Print is now publishing the full text of public domain/out-of-copyright print works it has acquired through the Google Print library scanning project. The official Google Blog provides more information and examples of finding some of this material in this post.
The move comes a couple of days after it was announced that Google was resuming its library scanning project, which includes works that are in and out of copyright. Works that are in copyright are not reprinted online without explicit publisher approval.
Google book scanning still on hold from News.com has more follow up on the resumption of scanning, as well as how well-known quotes from some classic books fail to bring up these books in Google Print's top results.
Remember, some of this "public domain" material might not be available outside the US since Google is using different dates to determine copyright around the globe. It's pre-1922 for the U.S.
The same blog post also points out (in a postscript) that others have been digitizing public domain materials for years. Heck, Project Gutenberg has been around since 1971. This post includes a few comments from its founder, Michael Hart.
If you're looking for a one-stop shop for public domain books, I can't say enough good things about The Online Books Page from the University of Pennsylvania. It organizes and lists thousands of books (including Gutenberg content and now some Google Print content) from many digitization projects. It's amazing how much is added each day. Here's the "new listings" page that also offers an RSS feed.
Happy reading!
Postscript: The British Library and MSN said they will work together to digitise around 100,000 out-of-copyright books for MSN Book Search. More in the news release and this News.com story. About a week ago MSN joined the Open Content Alliance.
Postscript 2: Note to Google. I was reviewing a few public domain titles. How about a link to make the page larger? SITB offers this feature. Also, what about being able to access a page by page number after reviewing a table-of-contents or index?
Postscript 3: Let's not forget about other book digitization projects from Europe and a group of German publishers.
Posted by Danny Sullivan at 1:47 AM | Permalink
Many libraries throughout the world offer online access to their catalogs. The oddly-named RedLightGreen taps into thousands of these catalogs, allowing you to find books on any imaginable subject, and then do very interesting things with your search results. Gary Price offers a rundown of this alternative to Google Scholar in today's SearchDay article, Searching for Library Books with RedLightGreen.
Posted by Chris Sherman at 10:12 AM | Permalink
This Reuters article out of Germany: Publishers to build own online book network, comes as no real surprise. Why? Back in June, we blogged about a trade group of German publishers considering their own book digitization program. It looks like what was discussed several months ago is now a go.
The German association of book publishers is planning to build a network by next year that will allow the full texts of their books to be searched online by search engines but will not hand the texts over to these companies...In the longer term, the German association wants to build its own search engine to offer services which could rival those offered by Google, Yahoo or Lycos, and even offer readers the chance to borrow books online. "We don't want Google to hold the texts in its servers; we want the publishers to keep them," Matthias Ulmer, who is leading the project, told Reuters in an interview at this week's Frankfurt Book Fair.
That's not to say that some German publishers aren't going to join the Google Print program. In fact, last week Google launched a Google Print "only" interface for Germany. They also introduced Google Print for Germany a few months ago. This Deutsche Welle article has info about Langenscheidt, a German dictionary publisher, which will begin by adding 160 dictionaries to Google Print.
Btw, all of this is not to be confused with a European program to digitize materials that we blogged about here.
Finally, let's not forget that Google is not the only organization making digitized books available on the web. NetLibrary and ebrary have been doing it for years and in many cases allow users to read the full text, in some cases print the text, and annotate what they read. I mentioned both of these services (available from home via many libraries) in my original post about the Google Library Program.
Posted by Gary Price at 10:24 PM | Permalink
No doubt one of the hottest topics these days is full text books online. I'm not talking about Google Print or Amazon.com's Search Inside the Book type of content, but the myriad of content out there for FREE (often, but not always, public domain stuff) that can be both interesting reading and useful research content. One source I've mentioned time after time on the SEW Blog, SearchDay, and ResourceShelf (it's that deserving) is the Online Books Page (a directory, really) from John Mark Ockerbloom at the University of Pennsylvania, which has been online since 1993. This well-organized collection is updated almost daily with newly digitized books. It collects materials from disparate sources.
I'm mentioning the OBP once again because its "What's New" list (material that's just been added to the directory) now offers an RSS feed. It's absolutely amazing just how much material gets added to the collection on an almost daily basis. This is one feed many of you will want to add.
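For anyone who would rather check a "what's new" feed with a few lines of script than with a feed reader, here is a minimal sketch in Python. It assumes a plain RSS 2.0 feed; the sample feed text, the book titles, the example.org links and the function name list_new_items are all made up for illustration and are not taken from the Online Books Page itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed: titles and example.org links are invented
# purely for illustration, not real Online Books Page listings.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example new listings</title>
    <item>
      <title>First Example Book</title>
      <link>http://example.org/books/1</link>
    </item>
    <item>
      <title>Second Example Book</title>
      <link>http://example.org/books/2</link>
    </item>
  </channel>
</rss>"""


def list_new_items(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        # findtext returns the default when the child element is missing
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items


if __name__ == "__main__":
    for title, link in list_new_items(SAMPLE_FEED):
        print(title, "-", link)
```

To use it against a real feed, you would fetch the feed text yourself and pass it to list_new_items in place of the sample string.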
Posted by Gary Price at 6:59 PM | Permalink
The European Commission has officially announced its plans to develop and build a digital library. The News.com article: Europe aims to rival Google with digital library reports:
According to an EC announcement on Friday, the aim of the project is to digitize and preserve records of Europe's heritage--including books, film fragments, photographs, manuscripts, speeches and music--and make it available online to all European citizens. To make this happen, the European Union is proposing high-level cooperation between the member states and has set a deadline of Jan. 20, 2006, for first comments on the plans.
Ah, cooperation. It's a good idea, especially in the library digitization world. Of course, talk of cooperation and actual cooperation are not the same thing.
You've got to wonder if Brewster Kahle, David Mandelbrot and other members of the Open Content Alliance team are working to convince this large digital library initiative to become a member of the alliance. It would seem to make sense, especially for searchers.
You can learn more about the i2010 Digital Libraries program on its home page. The official press release announcing the program is also available.
Postscript: A brief rant. I completely realize (aka not naive) that Google is Google and whatever they do gets most of the press attention. The word Google gets people's attention. However, digitization programs have been going on for years, long before Google was even around. Heck, Project Gutenberg was around before Larry and Sergey were even born. Why does the press seem to believe that every other project must "rival" Google Library? (I know it sells papers and gets clicks; I'm not naive.) Yes, Google Library is a massive and important undertaking, but turning it into a contest or war seems to make little sense. Other digitization programs (some large and some very small) working to digitize important materials are also crucial. Let the content, the quality of the digitization, and the ease of access be what really matters.
Posted by Gary Price at 7:25 PM | Permalink
Project Gutenberg Founder on Digitization, ebooks and the OCA Launch
In my story today about the new Open Content Alliance and in my story last December about Google Library, I make mention of Project Gutenberg. This book digitization project, which creates "ebooks," has been going strong for almost 35 years (that's like a millennium in online info time). Here are some new stats and links about PG courtesy of Michael Hart.
Needless to say, Project Gutenberg's founder, Michael Hart, is a pioneer, if not THE pioneer, when it comes to ebooks. He has lots to say on the topic. Below is a commentary about the Open Content Alliance that he shared with me via email and allowed us to reprint. Questions or comments can be sent to Michael at: Hart@pobox.com.
Yet another consortium of multi-billion dollar institutions has thrown its hat into the eBook/eLibrary ring today, just 9 months before the 35th Anniversary of Project Gutenberg's placement on the Internet of the first eLibrary element, on July 4th, 1971.
Last December 14th Google used a multi-million dollar blitz of television, radio and print media to announce the Google Print revolution: "Today is the day the world changes," but so far it has been difficult to get even a handful of books from their project, some 10 months later.
I am wondering if the news media will give the same kind of coverage to a second such announcement, which will also put up an alliance of an Internet search engine giant with some multi-billion dollar libraries. I will be watching all the news programs tonight in eager anticipation, as I was doing last December, but I fear that "once burned/twice cautious" might take some of the wind out of their sails/sales.
However, this effort has one huge advantage: "The Internet Archive," run by my friend Brewster Kahle. Brewster is one person who has a proven ability to put an enormous resource on the Internet for the whole wide world to use.
This difference is such that I am willing to bet that Yahoo! gets off to a better start in the next 10 months than did a rather completely false start by Google.
Of course, the real test will be to see how long it takes a project such as this to reach a million eBooks, since there are already well over 100,000 eBooks already available free for the taking on various Internet sites, perhaps 50,000 of them from the various Project Gutenberg sites.
Here's a hope that a few years from now anyone can have the advantage of a million book home library, and in even a few years more to ten million books sitting on one inch of your own bookshelf next to your computer.
Michael S. Hart Founder Project Gutenberg
Posted by Gary Price at 4:48 PM | Permalink
A New Alternative to Google Print
The Open Content Alliance is launching today, with plans to make thousands of books, multimedia files and other materials freely searchable and accessible online. Unlike Google Print, however, anyone adding content to the Open Content Alliance must have permission from copyright holders.
Gary has more on the new initiative, including comments from one of Google Print's harshest critics, in A New Digital Library Alliance Makes its Debut.
Posted by Chris Sherman at 9:36 AM | Permalink
Courts Unlikely To Stop Google Book Copying from InternetWeek has legal experts saying that copyright law over indexing books appears to be on Google's side. Of less concern is whether Google actually gets permission to do copying. More weight is applied to the economic impact on the copyright holder and the amount of material used in proportion to the whole. But don't they use all the book? Yes, they scan all the book but they show little without explicit permission (for more, see our past Another Google Book Scanning Debate & Another Publisher Group Objects post). The InternetWeek article is a great look at some of the issues involved. My favorite part:
"If copyright law worked the way Google would like to see it working, then everyone in the world would be able to use the material unless the copyright holder explicitly told them not to, and even then it would be OK," says Allan Adler, the vice president for legal and government affairs for the Association of American Publishers. "That would be a very strange copyright system."
As I've said before, that's exactly how things currently work with web indexing. The Association Of American Publishers doesn't appear to have minded Google indexing nearly 800 pages from the site over the years without permission, all of which have copyright protection. But books apparently are different.
To be fair, books are different in the sense that most web sites don't earn money by selling their content. They typically earn by carrying ads. Book sellers do have legitimate fears that online book searching might lead to fewer sales -- and that appears to be a factor that will be key in any dispute. It would have to be proven.
But say I'm looking for a particular fact. I search for a book using Google Print. I find that there's a book that appears to match, but since the publisher hasn't given Google what I'd call "display" permission as opposed to "indexing" permission, I can't see the answer. Harm? Hard to show. Benefit? Easier to show. I didn't know this book might have an answer I needed. Now I do, and I might go get it.
One lawyer in the article makes exactly this argument in the latter part of the story, dealing with past case law that will likely be applied.
Here's the exception. What if I can see the answer? Look here. That illustrates how without explicit display permission, Google will show only a few lines or "snippets" of information. But if the answer I want is in the snippet, then it is easier to show harm. I no longer may need to buy the book. Imagine a book about computer game tricks and tips. If I can see the tip in the snippet, I may solve my problem and save my money.
One solution might be to completely eliminate snippet display for books without the copyright holder's permission. Some web sites can argue the same, that having snippets might mean people don't come to them, of course. But Google already provides a way for site owners to turn off snippets. That's an opt-out thing. Perhaps with Google Print, showing even snippets will need to be an opt-in situation.
Posted by Danny Sullivan at 9:26 AM | Permalink
Back in June, I blogged about Google finally offering an interface to search only Google Print material. Today, Google announced that they're continuing to offer new Google Print search tools with the release of country-specific interfaces to the Google Print database.
Interfaces for Google Print are now available for 14 "English-language" countries: UK, Australia, Canada, India, New Zealand, Ireland, South Africa, Pakistan, American Samoa, Trinidad and Tobago, Kenya, Jamaica, Mauritius, and Uganda.
So, what's the difference between these new interfaces and simply going to the main interface? These new country-specific interfaces will, in many cases, include links to "local" online booksellers in addition to links to Amazon, B&N, etc. Local pricing is also provided. The underlying results remain the same independent of which interface to Google Print you use. Additionally, direct links to materials from the Google Print database will now be viewable in a one-box at the top of web results pages on each "local" domain site.
Posted by Gary Price at 5:27 PM | Permalink
Although the Google Library program has grabbed most of the headlines since it was first announced last December, I've tried to point out some of the other programs (many have been around for years) working to digitize the full text of books and other materials. Today, the MSNBC article, Turning books into bits, offers a look at what some of these projects are up to.
Here are some other SEW Blog posts:
Unfortunately, the article doesn't include any mention of some of the impressive digital book services provided by ebrary and NetLibrary that are also accessible online.
You'll also read about OCLC's Open Worldcat program that provides bibliographic records for millions of items and makes them accessible to Yahoo and Google's crawlers. Useful? Absolutely! However, like I said last week, simply making material crawlable doesn't mean it will be easily accessible. In other words, there are big differences between crawlability and accessibility and visibility. Every great result can't be in the first five or six links on a results page. I think this is one of the many reasons why we're seeing a growing interest in vertical search tools.
If you're looking for a wonderful vertical that provides easy access to over 120 million bibliographic records and then allows you to click and find out if your library has a copy (it will even format your bibliography), make sure to check out RedLightGreen. Here's an overview of the database that I posted a few months ago.
Final Comments
+ Like I've said so many times, remember that the world of the library and librarian now exists beyond the four walls of the library building. My guest column for BetaNews offers a look at some of what's available. Btw, interest in librarianship these days is booming. The library school I attended has its highest enrollment ever.
+ Carol Brey-Casiano, the current president of the American Library Association, is correct with what she has to say in the article. Librarians are trained researchers and increasingly becoming key teachers of online research skills. She points out that 50 percent of the questions her own public library in El Paso receives "are about Internet research: how to narrow their search, whether a resource is reliable." Critical information skills might be more important these days than ever before.
Those of us who "watch" and "live" search often forget that many people make little to no use of what an engine can provide in terms of creating a better search. As engines grow larger with more content and also offer new services, it's going to become even more important that basic and even advanced search and retrieval skills are something that the general public understands. At this point, simply knowing that services other than the basic web search are available would be a start.
Posted by Gary Price at 10:17 AM | Permalink
Just in, news that the agreement between Google and the University Of Michigan for the Google library digitization program has now been posted online. Until now, no details of the agreements Google has with libraries have been published.
Michigan Digitization Project is the university hosted page about activities there. The agreement is now listed on that page. You can find it directly here (PDF format). I haven't yet read through the agreement, but we'll do a summary shortly.
FYI, this looks to be the result of a request that Google Watch's Daniel Brandt made. He comments about the request in this thread at our SEW Forums. I've also started a new thread for discussion of the agreement. Please comment there: Google Library Agreement With University Of Michigan Published.
Postscript: Google's Library Digitization Project: Reports from Michigan and Oxford is a presentation from the University Of Michigan and Oxford University on progress with the program to date.
Postscript 2: See also Google Library: Peril for Publishers? from InternetNews that touches on some copyright issues.
Posted by Danny Sullivan at 10:08 PM | Permalink
AccessMyLibrary.com Puts Library-Only Content On The Web
AccessMyLibrary.com is a new site from Thomson Gale that makes content the publisher provides to libraries accessible through the web. Searchers can go to the site directly, and if they have a US library card, access information behind password walls. Google and Yahoo are also apparently indexing abstract pages, so that relevant content may appear within regular search results.
Library Materials Given to Search Engines from the AP takes a closer look at the service, which was announced today.
Libraries offering free remote access to databases (think of them as vertical search engines) containing full text material (articles, reference info, etc.) from thousands of publishers is not new. As many of you know, I've been posting about this for a long time here on the SEW Blog and on my ResourceShelf site. In 2003, I wrote an article for SearchDay on this topic. Last week, I wrote a guest column for BetaNews that provides an overview of what you can find.
Needless to say, today's announcement is very exciting news. Hopefully, more people will become hip to the fact that many library services, not only databases, are accessible without having to go to the library building.
So, how will all of this work?
Remember, material is just starting to enter the Yahoo and Google databases. As Liedtke notes,
The search engines began scanning the Thomson Gale data Thursday, but it could be awhile before the material starts to emerge in search results.
Until then, these tips:
A final caveat for now. Thomson Gale produces some great databases full of wonderful content from top publishers but they're not the only database provider out there.
For example, someone with a San Francisco Public Library Card can also access materials (full text books) and full text articles from many other database vendors including ProQuest, EBSCO, and even the Oxford English Dictionary.
Caveat aside, I think this quote from Clara Bohrer, president of the Public Library Association says it best,
"It's a real positive step," Bohrer said. "Most libraries just haven't been able to get the word out about all the wonderful resources that they have online. Hopefully, people will start finding more information through these searches and say, 'Gee, maybe I better go check out my local library's Web site and to see what else I can find there.' "
Posted by Gary Price at 7:11 PM | Permalink
Yahoo Search Subscriptions Brings Premium Content Into Web Search
Yahoo has released a new Yahoo Search Subscriptions (beta) service that unites regular web search results found from crawling the open web with listings from free and fee-based database services and publishers such as Factiva, LexisNexis, and Consumer Reports.
These databases have content typically "invisible" to web crawlers. The move should help many people who assume the open web has all the research material they need discover additional content they'd otherwise miss.
To view the full text of premium content, searchers will either have to have a subscription to the fee-based database providing it or take advantage of pay-per-article options, when offered.
Content Partners
What new material is being added? It runs the gamut from news to some "scholarly" content. At launch, sources include:
In the coming weeks, additional content will include:
Yahoo said it plans to add many other content providers in the future, as well. The service is initially available in the U.S. and the UK.
Using The Service
To search subscription content, searchers need to visit the Yahoo Search Subscriptions page and check each subscription source they'd like included in their search. Once selected, you can then either search just against those sources (the Search Subscriptions button) or search the entire web (the Search the Web button) and have subscription content also displayed.
The settings you choose work on a one-time basis and only for that page. Select sources, do a search, then go back to that page and you need to pick your sources again.
To avoid this, use the Yahoo Search preferences page to permanently select subscription sources. Once saved, any search you do will always check these -- whether you search from the subscription page or just the Yahoo home page. You can also remove one or all of these sources by returning to the preferences page and deselecting them.
After you search, subscription listings will be shown above web results, as highlighted below:
Be aware that Yahoo also said subscription content may be mixed into web results, despite the aforementioned segregation.
Anyone can see listings from any of the subscription sources. However, you won't be able to click through and read the full text of articles without having a subscription or paying a per-view fee.
Yahoo Subscriptions Versus Google Scholar
I'm sure Yahoo's new service will draw comparisons with Google Scholar. However, at this time, most of the Yahoo material (with the exceptions of ACM and IEEE) appears to be current events, news, and business oriented rather than scholarly or peer-reviewed.
It's also interesting to see Yahoo work with not only publishers but also with content aggregators like Factiva and LexisNexis. That said, I'm sure Google Scholar will be offering access to more of this type of content in the future and very likely has deals with some of the same aggregators and publishers that Yahoo does. Likewise, I wouldn't at all be surprised to see more peer-reviewed/scholarly material in Yahoo. Yes, competition is a good thing for the searcher.
One thing I'll want to watch closely is how Yahoo handles content that might be accessible for free via Yahoo News and for a fee via one of the premium services. Also, since Factiva and LexisNexis have massive archives of content, in some cases back more than 20 or 30 years, it will be important to spend some time determining what is and is not available. For example, will I be able to access a 1993 Washington Post article via LexisNexis?
Remembering Northern Light
By the way, this is not the first time Yahoo has offered a "gateway" of sorts to fee-based content. In 2002, they announced a deal with Northern Light to provide access to their "premium collection."
Northern Light itself started out doing exactly what Yahoo Subscription Search offers, a combination of web search and premium database search. But the company didn't earn enough to keep growing. It closed its web search service in 2002.
Looking Forward & Wish List
Let's keep our fingers crossed and hope Yahoo decides to take all of this a step further and work with database providers so that the premium content that many people have access to FOR FREE via their local public, university, or corporate library (aka institutional subscriptions) becomes more easily accessible for these people.
If you're unclear about what I'm talking about, many public libraries offer free access (for personal use) to fee-based databases from your home or office, such as I covered here recently. Google Scholar has already made some strong inroads in this area. Remember, these days the world of the library and librarian extends beyond the four walls of the library building.
Three more things Yahoo should work towards.
Postscript 2 (from Gary): In the post I mentioned that libraries offer free access to lots of databases from many providers. Just in, news that one of the companies that Yahoo will be providing some content from, Thomson Gale, has just announced an early beta that will make accessing TG content discovered via Yahoo (and Google) and available for free from libraries, even easier. BTW, once more content becomes available via Yahoo Subscriptions, I plan to post a follow-up.
Want to discuss? Visit our forum thread, Yahoo Subscription Search Service Opens.
Posted by Gary Price at 12:00 AM | Permalink
I'm always mentioning the specialized databases often containing the full text of articles, reference books, etc. that libraries make available for free.
These tools (think of them as powerful vertical engines) can be accessed from your home or office, 24x7x365. If you're interested in learning more, I've written a guest column for BetaNews that provides an overview of some of what's out there. Yes, in the web age, the world of the library and the skills of the librarian extend well beyond the four walls of the traditional library building.
Posted by Gary Price at 11:55 AM | Permalink
Since Google announced their book digitization plans, we've read about a major book digitization project across Europe being led by France and recently learned about some U.S. publishers having "issues" (at least as of now) with Google's library digitization program.
Today, the International Herald Tribune reports in the article: German publishers' Google challenge, that another large book digitization project is in the works.
...five-member task force of the German book trade association Börsenverein are organizing their own digital indexing project, Volltextsuche Online...The German project includes some publishing industry heavyweights like Verlagsgruppe Georg von Holtzbrinck, a Stuttgart-based media group. But it still faces a test of membership reaction at a general assembly of the association on June 17 in Berlin.
According to the article, the German project is not about digitizing out-of-copyright materials but rather about providing access to titles that still have a valid copyright.
Remember, Amazon's "Search Inside the Book" program, which went live months before Google Print, is also working to provide searchable access to limited amounts (as determined by the publisher) of in-copyright books.
In this recent blog post I do my best to explain the differences between Google Print (limited access to in-copyright material direct from publishers) and the Google Library (full text of public domain material, limited content from in-copyright material).
It's also worth pointing out that Google is using different baseline dates for copyrighted vs. public domain material inside and outside the U.S.
From the FAQ: If you're in the US, we've taken a very conservative stance and only books pre-1923 will be considered public domain. If you're not in the US, only books pre-1900s can be considered public domain because of differing copyright laws internationally.
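For those who like to see a rule spelled out, here's a minimal sketch of the two cutoffs described in that FAQ excerpt. It is only an illustration, not anything Google has published: the function name and parameters are hypothetical, and reading "pre-1900s" as "published before 1900" is my assumption.

```python
def treated_as_public_domain(pub_year: int, reader_in_us: bool) -> bool:
    """Apply the cutoffs quoted from the FAQ (illustrative only).

    Assumptions: "pre-1923" means a publication year before 1923, and
    "pre-1900s" means a publication year before 1900.
    """
    cutoff = 1923 if reader_in_us else 1900
    return pub_year < cutoff


# The same 1910 book lands on different sides of the line
# depending on where the reader is.
print(treated_as_public_domain(1910, reader_in_us=True))
print(treated_as_public_domain(1910, reader_in_us=False))
```

The point of the example is simply that one book can be fully viewable for a U.S. reader and limited for a reader elsewhere.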
In two posts (here and here), I point out that MANY other book digitization programs are also moving forward, both from disparate organizations (The Internet Archive, Project Gutenberg, numerous libraries, etc.) and from individual publishers (National Academies of Science, thousands of full text books, impressive search options).
Finally, looking to find directories of some of the full-text books online, many that are free? Take a look at these directories.
Postscript: Two fee-based databases that I haven't mentioned in the past offer the full text (searchable, no limit on how much you can read, download for offline reading) of thousands of technology books from O'Reilly, McGraw-Hill, SAMS, and other publishers:
+ Safari Books Online A free 14 day trial is available. Also, many libraries offer this service for free without having to visit the library.
+ Books24x7 Individual subscriptions available as well as remote access via many libraries.
Posted by Gary Price at 1:33 PM | Permalink
Michelle Slatalla, the "online shopper" from the New York Times, has written an article about several databases that can help you find books available for purchase.
I'm glad to see RLG's RedLightGreen is mentioned. About three months ago, I posted this overview of RedLightGreen, a database with bibliographic information for more than 120 million books.
Posted by Gary Price at 11:57 AM | Permalink
We figured that this was going to happen. Google has just released a Google Print "only" interface at http://print.google.com.
Although an advanced interface is unavailable, you can limit to words in the title of the book by using the intitle: syntax. The allintitle: syntax doesn't seem to work.
It would be useful if options to limit by author, publisher, and language were available. Also, the ability to click on an author's name to view all of the titles in Google Print could save the searcher time.
I also tried searching the Google Print "only" database with an ISBN but it didn't appear to work.
Remember, with most Google Print and material from Amazon's "Search Inside the Book," you can only view a selected number of pages over a 30 day period as determined by the publisher. You're also unable to annotate or print (yes, screen caps are possible) most of this material.
Google should consider offering a limit to search only material that's in the public domain, where the full text can be read online with no restrictions. They might also want to consider linking to public domain books on the open web. Of course, this might not make some publishers happy. Google Print is as much an online research tool as it is a venue to sell content. In some cases, a book might have copyrighted editions available but also have the text in the public domain. Roy Tennant pointed this out on ResourceShelf earlier this week.
I asked a Google spokesperson how many books Google Print currently contains and the rate at which new books are added. The company spokesperson would not reveal these numbers.
Btw, stopwords are also used with the Google Print database. So, if you want to find books about The Who, make sure to use quotes.
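To make the two tips in this post concrete (quoting a phrase so stopwords like "The" aren't dropped, and limiting to title words with intitle:), here's a small sketch that just assembles a query string. The helper name is hypothetical and it doesn't talk to Google at all; it only shows what you'd type into the search box.

```python
def build_print_query(phrase, title_words=()):
    """Build a search string: a quoted phrase plus optional intitle: terms.

    Hypothetical helper for illustration; it only formats text.
    """
    # Quoting keeps stopwords such as "The" in the phrase.
    parts = ['"' + phrase + '"']
    # One intitle: operator per word we want in the book's title.
    parts.extend("intitle:" + word for word in title_words)
    return " ".join(parts)


print(build_print_query("The Who"))
print(build_print_query("The Who", ["biography"]))
```

The second call combines both tips: the phrase stays intact and the results are limited to books with "biography" in the title.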
Even though Amazon.com doesn't offer a "Search Inside the Book only" full text search interface (they should), they do offer plenty of search options to find specific titles and authors. You can also search by subject using the subject headings assigned to each title. In other words, Amazon offers more robust searching of bibliographic info. After finding the title, you can search the full text of a book. "Search Inside the Book" also allows the reader to enlarge a page on their computer screen. Google Print doesn't.
Earlier this week I posted an overview of Google Print/Google Library and attempted to explain the differences. I also discussed other sources for full text books online including NetLibrary (available from many libraries) and the Online Books Page at the University of Pennsylvania.
Finally, here's an unscientific comparison test I did trying to see what Google Print and Amazon's "Search Inside the Book" offer.
Posted by Gary Price at 6:46 PM | Permalink
Google and Amazon both offer "search inside the book" features that allow you to explore the contents of printed books. Unfortunately, they both have limitations that prevent you from annotating, copying or printing content. And there's no explicit way to limit your searches just to printed book content with either service.
That didn't stop our intrepid news search editor from finding ways around these limitations. In today's SearchDay article, Going Under Cover with Book Search Tools, Gary shows you how to get the most out of Google Print and Amazon's Search Inside the Book programs, and provides a tour of a number of other full-text book services that don't have the limits of the biggies.
Posted by Chris Sherman at 7:53 AM | Permalink
Europe rallies against Google library from AFP notes that 19 major European libraries are backing a plan to put European books online, in reaction to Google's project to digitize books. The move, said the president of France's national library, is to prevent "the risk of a crushing American domination in the definition of how future generations conceive the world."
Darn that Moscow-born Sergey Brin, cofounder of Google and European native! What was he thinking, partnering with libraries in the United States that have no European books.
Apparently, Oxford University in the United Kingdom has no European books either. It's so odd, because I live in the UK -- and generally people consider it part of Europe. How did Oxford manage to get by for all those centuries without European works in the library?
Sarcasm aside, I can understand the fears. And to be fair, the concern is about which books will be selected, not that there won't be European books at all. But the fearmongering to rush into a project, in reaction to something Google's only just started and which hasn't yet even been shown to shovel McGoogleBooks down the throats of everyone? It's extreme.
The most disturbing thing to me about Google's project is that there have already been other projects that it isn't coordinating with. Now we may have another project backed by Europe, posing even more duplication. Get it together, folks -- perhaps a little detente is needed so that the entire world is better served.
Need some more acrimony? France Detects a Cultural Threat in Google earlier this month from the New York Times has more on the concerns and the France-backed move for a European project, well worth a read.
And spotted via The Unofficial Google Weblog, the head of UK publisher Bloomsbury warned those in his industry against the Google Print program. The concern this time was not cultural heritage but that of pocketbooks. Google Print will make book swapping like music sharing.
Posted by Danny Sullivan at 1:11 PM | Permalink
Those of you who blog might be interested in a new tool that HighBeam Research released today.
The HighBeam Research Blog Enhancer allows any blogger to quickly and easily add citations or provide access to the full text of articles from the HighBeam database to their blog. HighBeam currently provides full text material from more than 3000 sources.
The ability to link to citations or offer the full text of articles is determined by your membership status with HighBeam. A basic membership (free) allows the blogger to post citations to articles along with notes about the article the blogger might have. Someone reading your blog will need to have a paid HighBeam membership to read the full text.
A fee-based HighBeam "full" membership ($9.95/month) allows the blogger to provide full text access to articles found in the HighBeam database. More info about the new service here.
Those of you doing "personal" research (and not wanting to share content on the web) might recall (mentioned on the blog in the past) that many public libraries and most university libraries offer free web-based remote access (no need to visit the library, available 24x7x365) to full text databases with magazine articles and reference info. All you need is a library card for the library. Heck, in Michigan you don't even need a library card; you can log in with your driver's license number. One database that I have access to here in DC offers full text articles from 6000 sources. Another provides the full text and full image of every New York Times article ever published back to 1851. I even have access to a searchable version of the Oxford English Dictionary. That's what I have access to. Here's an example of what someone with a San Francisco Public Library card has access to. Wow!!! More about all of this in an article I wrote for SearchDay.
Posted by Gary Price at 3:28 PM | Permalink
For several weeks I've noticed that using the trigger word "books" or "book" in a Google search always included a OneBox with results from the Google Print program. Today, P.L. points out that Google has now made this search option official with the addition of it to the Google Web Search Features page. Here are a few examples:
+ Books Dallas Cowboys
+ Books George Bush
+ Books Google
I was unable (in my test, your results may vary) to find a Google OneBox that offered more than three book titles. As you also know, each publisher decides how much text from each book a user can view during a search session.
For the fun of it, I conducted a few random searches to see if some Google Print material was also available from Amazon's Search Inside the Book. I also found a few items from the Amazon.com database and then did a search for them at Google.
Google Search: Books iPod
+ The Ipod Companion Also SITB at Amazon.com
+ Ipod and Itunes Hacks No Amazon
+ How to Do Everything With Your Ipod Also SITB at Amazon.com
Google Search: Books Chicago
+ Chicago's Mansions Also SITB at Amazon.com
+ History of the Development of Building Also SITB at Amazon.com
+ Chicago Blues - by Mike Rowe Also SITB at Amazon.com
Google Search: Books Pope John Paul II
+ Pope John Paul II Also SITB at Amazon.com
+ Witness to Hope Also SITB at Amazon.com
+ John Paul II Also SITB at Amazon.com
Google Search: Books Science Fair Projects
+ The Complete Idiot's Guide to Science Fair Also SITB at Amazon.com
+ Guides to Collection Development for Children No Amazon.com
+ Resources for Teaching Middle School Science "Look Inside the Book" only
Amazon.com (all titles available via Search Inside the Book)
+ The Ultimate Montana Atlas and Travel Encyclopedia, 2nd Ed. Not found via Google Print
+ The Rolling Stone Encyclopedia of Rock & Roll (Revised and Updated for the 21st Century) Not found via Google Print
+ The Extreme Searcher's Internet Handbook : A Guide for the Serious Searcher Not found via Google Print
Posted by Gary Price at 11:45 AM | Permalink
Search verticals are hot, so LookSmart launched five new vertical sites today aimed at different user groups. Each offers access to FindArticles material along with an interface to search the LookSmart web index.
+ Teenja (For Teens)
+ GradeWinner (For Tweens)
+ 24Hourscholar (For the College Student)
+ ParentSurf (For Parents)
+ GoBelle.com (For Moms on the Go)
I guess if you're a mom (who's on the go) and are also a college student, you could use three of these sites. (-:
Each vertical lists a human editor who selects articles from FindArticles. However, I was unable to find what criteria these editors use to determine what is and isn't material worthy of inclusion in each vertical.
I wondered if results from the LookSmart web index would be targeted for each user group. I ran the same web search at 24hrscholar and Teenja and got the same results. Finally, I found adult material at Teenja and GradeWinner but was unable to find any type of filter to remove (or try to remove) adult content. This could make use of these sites in the K-12 community an issue.
More about the new LookSmart verticals in this news release. Thanks to S.C. for the news tip.
I've said many times before that it's really sad when students, and the general public for that matter, don't know that thousands of public libraries offer full text and free access to thousands of sources (newspapers, magazines, full text reference books) that are accessible with a library card WITHOUT having to visit the library.
Here's an idea of what I'm talking about from a library in my area. Wow! Again, every library offers different services but overall, this is high quality material including many tools aimed at students. Of course, all universities also provide access to many full text databases and reference tools accessible from dorm, home, office, etc.
For more on free library databases, take a look at this SearchDay article.
Posted by Gary Price at 12:30 PM | Permalink
Harvard-Google Project Faces Copyright Woes from the Harvard Crimson looks at copyright issues being raised about Google's plans to digitize thousands of books in the Harvard University Library. It offers points of view from both sides, though I have to say this quote from a past library director raised my eyebrows: "Copyright laws are written for companies like Time Warner and Disney instead of research libraries like Harvard. [These laws are] not aimed at us." Actually, I thought copyright laws were written to protect the rights of publishers from anyone, be it Time Warner, Harvard or a public corporation like Google.
Posted by Danny Sullivan at 3:34 PM | Permalink
FindArticles has always been a great resource for locating periodical content that might not be available on the web in general. Now the LookSmart-owned service has been redesigned. I actually don't like the new look-and-feel, which seems busier than before. But new options appear to be the ability to see results grouped by topic, to limit results to only content free from pay-per-view charges (more on that premium content here), additional sorting options (date, relevancy, publication) and the ability to search within specific categories, such as sports or news.
Posted by Danny Sullivan at 12:40 PM | Permalink
A month ago, we posted about the head of France's national library (Bibliothèque nationale de France) being concerned with Google's ambitious program to digitize library materials from several large libraries "favouring Anglo-Saxon ideas and the English language." Whatever. To think that large university libraries that Google is working with don't have thousands of resources (books) that discuss a variety of topics from a variety of viewpoints is silly, IMHO.
Today we're reading about President Jacques Chirac of France calling for a European project to digitize library books. Here's a translation of Chirac's statement.
That said, it's not that Chirac's idea is a bad one. The more materials that are digitized and made easily findable (searchability and findability are separate issues) the better. However, it would be great if large digitization projects could cooperate to save time, money, and avoid duplication.
Finally, this article makes me a bit upset. Why? It and many others like it make it seem like Google is the only organization digitizing books in one form or another. Note to journalists, they're not!
What about the impressive initiative from The Internet Archive? What about Project Gutenberg? What about smaller but nonetheless important digitization projects from libraries located around the world? What about organizations like ebrary and NetLibrary? Heck, what about publishers like The National Academies Press that provide free full text access to all of their publications? Btw, for a few of my favorite directories to find some of the many books that have been digitized, take a look at this post.
UPDATE: While we're on the topic of Google Print and libraries, Tara points us to this Harvard Crimson article: Harvard-Google Project Faces Copyright Woes. I can say that this article isn't all that surprising. Also, this article by library legend Roy Tennant might be of interest. Roy talks about how many librarians at Stanford have very few details about how the Google Print program will work.
Posted by Gary Price at 11:16 AM | Permalink
No, it's not a typo and I'm not going to be talking about stoplights.
When blogging the Google Scholar news I realized that since I started working with Danny and Chris on the SEW Blog, I never mentioned another large (and I think very useful) database to find and access library books called RedLightGreen. It's easy to use and includes plenty of features. We've been blogging about it on ResourceShelf since October 2003 when this impressive tool was first launched.
What Is It?
RedLightGreen is being developed by RLG, a library organization based in California. This database is what librarians call a "union catalog" and contains bibliographic information for more than 120 million books. Is that it? Hardly! RedLightGreen offers much more. Btw, RedLightGreen is free to access and use.
Plenty of Features
Here's a search for "Internet History".
An advanced search interface with a few more search options is also available.
Results pages not only contain a list of "hits" but also plenty of help to narrow and focus your results. Look in the column on the left side of the page. Here you'll find:
+ Options to focus your search on a specific subject. Where do these subject links come from? They're from the subject headings that have been assigned to the books in your results list. They've been assigned to the books by human catalogers from a controlled vocabulary called Library of Congress Subject Headings. Taking advantage of subject headings/descriptors can often allow you to find just the right material(s) very quickly.
+ You'll also see a clickable list of all of the authors in the results list.
+ Finally, if you want to limit your search to material in a certain language, it's only a click away.
Entry Pages O.K., I have a list of books, now what? Let's look at a page for a specific title. Two things to note here:
+ The green box in the upper right side of the page labeled, "Get it at your library." If you click this link, you can quickly check if a specific library holds the item with just one click. In fact, RLG just added direct links to THOUSANDS MORE local library catalogs around the world. If you register (free and fast), the library you select (it's still easy to check others) will always be linked and listed.
+ Another green box on the right side of the page provides links to not only create a list of saved items but also, get this, format the list into one of four bibliography formats. It's easy to then send the bibliography via email or print it. Cool!
+ Results pages also contain a link to find the item (if available) in the Amazon.com database along with links from Google.
+ Easy Access Last week, RedLightGreen announced that a plugin for the Firefox toolbar was available.
We've only scratched the surface of what's available and I hope to write more about the project soon. RedLightGreen is more than worthy of your attention. You can read more about it here and here.
Posted by Gary Price at 12:31 PM | Permalink
Google book plan sparks French war of words from Reuters notes that the head of France's national library (Bibliothèque nationale de France) is concerned with Google's plan to digitize library books. Why?
Jean-Noel Jeanneney, who heads France's national library and is a noted historian, says Google's choice of works is likely to favour Anglo-Saxon ideas and the English language. He wants the European Union to balance this with its own programme and its own Internet search engines.
"It is not a question of despising Anglo-Saxon views ... It is just that in the simple act of making a choice, you impose a certain view of things," Jeanneney told Reuters in a telephone interview on Friday.
As we pointed out in a post earlier this week, some details about Google's plans to digitize library books are still not fully known. However, the millions of volumes from the outstanding library collections that Google plans to digitize will include many viewpoints in many languages. Btw, it's also worth mentioning that many other book digitization programs are working to make material available. I'm sure more projects from France and elsewhere would also be welcomed.
Posted by Gary Price at 12:24 PM | Permalink
A couple of articles and notes about Google's massive digitization project with libraries and Google Scholar to post here. Both come from the library and librarian community. Btw, in case you don't already know, your SEW News Editor is also a librarian.
Roy Tennant (a living legend in my profession) writes about how Google has not been very forthcoming with details about their digitization project with the participating libraries. He also discusses the large amount of copyright research that Google faces in the Library Journal article, Google Out of Print.
A recent blog entry by Elizabeth Edwards, a Stanford University Libraries staff member, is particularly enlightening. According to Edwards, who was briefed on the Stanford-Google plan along with other staffers at a January meeting, "the company has not yet been forthcoming as to how the process of digitization will be implemented in detail; however, Google's process is characterized as 'industrial-strength digitization.'" Characterized by whom and with what evidence is unknown or unstated. Edwards further states that "Google is being 'coy' about standards and specs; minimums have been given but little to no fixed specs." It is difficult to judge the potential effectiveness of a project that provides no details.
From day one I've taken a wait and see attitude about this project. It sounds great in print (no pun intended) but execution (digitizing the material) and then making it findable and usable, especially when the company told me they had no plans to offer a Google Print interface, is another story. I could go on but I'll save that for another time.
I wasn't surprised about Google offering minimal details about how this was all going to work during the initial press announcement/hype period. However, I'll admit to being surprised that Google to this point hasn't offered much info to the Stanford Library. Bottom Line: I guess we're ALL still waiting and watching.
On the Google Scholar front, CrossRef (a project to make citation linking easier) and the 35 publishers participating in the CrossRef/Google Pilot (a review of the service here) met to discuss the pilot and Google Scholar. A post in the CrossRef newsletter gives us some details about what Google and CrossRef talked about in the January meeting.
+ Google agreed with the principle that if there are multiple versions of an article shown in the Google Scholar search results, the first link will be to the publisher's authoritative copy.
+ Google would like to use the DOI as the primary means to link to an article.
Finally, the newsletter describes Google Scholar as, "..a very broad search of all the web and includes any material that 'looks scholarly.'" It's a good description but the "looks scholarly" part made me smile. Why? I can't figure out how:
+ press releases
+ Someone's resume (last item) and here (last item)
+ Government contract bid announcements
and other types of material "look scholarly" using even the broadest definition.
Many thanks to DD for the news tip.
Posted by Gary Price at 11:11 AM | Permalink
Although Google's recent announcement that it will digitize the contents of several large libraries got most of the press coverage, I'm glad to see that other book digitization projects (many that have been around for years) are now also getting some press attention. We mentioned several of them in our first story about the Google Print library project and in this post a couple of days later.
A new article from the Hackensack Record offers more details about The Universal Library (from Carnegie Mellon University, The Internet Archive and others) and Project Gutenberg.
"Our objective is to ultimately take the works of man... digitize it and make it free to everybody," said Michael Shamos, a computer science professor at Carnegie Mellon University in Pittsburgh, which created the Universal Library.
Google Print, for example, will provide only a few lines of works published since 1923, because they are still under copyright. But the Universal Library displays the entire text of some books still under copyright.
The Universal Library will try to meet with Google to suggest cooperation. For starters, he said, Google can save itself the trouble of scanning the 100,000 books that are already part of the Universal Library's collection.
"To the extent that our books are free, then it would seem to be a waste of Google's time to redigitize those," [Michael] Shamos, [a computer science professor at Carnegie Mellon University] said. "They ought to go digitize other ones that we haven't gotten to -- or that we might never get to -- because they have a lot more funding than we do."
If you're interested in making use of full text books currently available online (most free) the blog post: Searching for Digital Books, offers links to several excellent databases.
Posted by Gary Price at 8:33 AM | Permalink
Word from the AP that the complete archive (both text and images) of the Boston Globe will be digitized and made full text searchable by ProQuest, a well-known database publisher. The Globe will be the seventh paper that ProQuest has digitized as part of their Historical Newspaper program. The other papers are:
+ The New York Times: 1851-2001
+ The Wall Street Journal: 1889-1987
+ The Washington Post: 1877-1988
+ The Christian Science Monitor: 1908-1991
+ Los Angeles Times: 1881-1984
+ Chicago Tribune: 1849-forward
The Times of London archive has been digitized by Gale back to 1785.
The NY Times provides free web access to their database here. Articles cost $2.95. The searchability of the database is limited, especially compared to what you can do when using the ProQuest interface.
Many libraries (public, university, etc.) offer free 24x7 web access (including the complete articles and ads) to these and many other full text databases from home, office, or wherever you can get web access. In fact, yesterday I was just searching the NY Times and Wall Street Journal historical databases for early mentions of info retrieval companies. These databases are highly addictive.
More about what libraries offer for free and without having to visit the library building in this SearchDay article.
Posted by Gary Price at 12:06 PM | Permalink
Here's a deep/invisible web database that might be of interest if you're a collector of National Geographic magazines (and other NGS publications) but don't have a tool to quickly find out the contents of each issue.
The National Geographic Publications Index provides indexing (including subject descriptors) for every story, map, etc. contained in most NG publications back to 1887.
It's also possible and very easy to limit your search to a specific publication. Just follow the instructions on this page.
Access to the National Geographic Publications Index is free.
Posted by Gary Price at 12:41 PM | Permalink
In our article about Google's library digitization project last week and this SEW Blog post, we noted another large library digitization project from the Internet Archive. Today, the Information World Review article: Internet Archive to build alternative to Google has more details.
Ten large libraries are taking part in the project, including the Carnegie Mellon University Library, The Library of Congress, and the University of Toronto Library.
In a statement, the Internet Archive describes the Text Archive as an Open Access archive that will "ensure permanent and public access to our published heritage". Over a million books have been committed to the Text Archive by the member institutes, with 50,000 available in the first quarter of 2005.
The Text Archive home page is located here. You can even view a movie of their book scanning machine at the University of Toronto here.
Posted by Gary Price at 9:33 AM | Permalink
Today, Time magazine released their complete archive (full text of all articles back to 1923) on the web.
Access to full text articles is free for Time subscribers. The archive is located at: http://www.time.com/time/magazine/archives
Search syntax: there is an implied "and" between terms, phrase searching works with "" marks, and NO "OR" searching is available.
An advanced interface is available and offers the following limits:
+ All stories or only cover stories
+ Date or date range
+ Sections of the magazine
+ Article length
The archive also allows users to browse/search a database of Time's covers. You don't need to be a Time subscriber to view this material.
Posted by Gary Price at 5:57 PM | Permalink
In the SearchDay article about Google working with libraries, I mentioned Project Gutenberg. This service has been offering eBooks since 1971. Its founder, Michael Hart, just sent over some updated numbers about what Project Gutenberg offers.
+ 15,000 eBooks on the "original" site
+ 25,000 additional eBooks on the new PG site
He also points out that Project Gutenbergs in Europe, Canada and Portugal are working with material in over 50 languages.
Another project that I mentioned in my SearchDay article is the Million Book Project from The Internet Archive.
What is it? ...the goal of The Million Book Project is to digitize a million books by 2005. The task will be accomplished by scanning the books and indexing their full text with OCR technology. The undertaking will create a free-to-read, searchable digital library the approximate size of the combined libraries at Carnegie Mellon University, and one much bigger than the holdings of any high school library.
Brewster Kahle recently shared some news about how the MBP project is progressing.
+ "The Million Books Project will be posting 10's of thousands of books later this month on the Archive and elsewhere. These were scanned in India by the Indian government."
+ This fall we kicked off a volume book scanning "in-library" project at the University of Toronto.
+ Watch a Movie of the Internet Archive's Scanning Robot
Posted by Gary Price at 1:15 PM | Permalink
Via the Yahoo Search Blog, news that a special version of the Yahoo Toolbar now allows you to search across two million library holdings through a cooperative project with OCLC, the Online Computer Library Center.
Searching library records is a favorite topic of Gary's, and he's covered this in past posts such as More Full Text Books and The Virtual Reference. You can expect him to revisit this new Yahoo initiative when he's back from the Internet Librarian conference, where the partnership was also unveiled.
Postscript: Gary's sent me a few links
Posted by Danny Sullivan at 6:51 AM | Permalink
New Google Service May Strain Old Ties in Bookselling from the New York Times today has some nice quotes on how publishers are reacting to the expansion of the Google Print service.
Here's something I haven't yet seen (if you have, tell me). How are authors going to react? After all, publishers are now set to earn revenue off AdSense ads that appear in these books. But the books are the works of their respective authors. Shouldn't they get a share? Do contracts cover this? Will they have to going forward?
Posted by Danny Sullivan at 11:58 AM | Permalink | Comments (0)
In my article on Google Print yesterday, I noted in the version for SEW members that electronic copies of books aren't being accepted at this time. Tara Calishain highlights that this means plenty of digital copies in libraries won't yet be included: Google Print, Google Print, Argh Argh Argh.
As for Project Ocean, a rumored plan by Google to digitize the Stanford Library, Tara says that Google itself still has no comment on such speculation.
Barbara Quint also touches on the issue of picking up on electronic copies in her write-up: Google Print Expands Access to Books with Digitization Offer to All Publishers.
Meanwhile, Gary checked in with Amazon yesterday. They say the Search Inside The Book service has 100,000 titles at the moment, but they won't say how many they add each month. For more on Search Inside The Book, see the end of my Google Print Opens Widely To Publishers article.
In case you missed it, Gary also blogged some tips on searching the full text of books outside of Google and Amazon: More Full Text Books, plus how libraries can get you access indirectly to such material: The Virtual Reference Service.
Finally, want to see how Google Print results now appear in OneBox display? Visit Dirson's Example Of The New Google Print.
Posted by Danny Sullivan at 7:54 AM | Permalink | Comments (0)
Google Print is moving into a new chapter with the launch of a new program to gather up content from publishers. I've posted a rundown on the changes in today's SearchDay: Google Print Opens Widely To Publishers. Google's also posted new information on the Google Print site itself.
Posted by Danny Sullivan at 9:07 AM | Permalink | Comments (0)
Google Print opened at the end of last year, allowing you to find content from books and magazines that have been scanned and added to its database. Chris wrote about the service last year, and you can learn more here: Google Introduces Book Searches.
The problem is, what if you want to search for just this type of material independent of web content? Visiting Google Print doesn't help -- you just get a "This space intentionally left blank" message. The Google Print FAQ is no help. And the page that lists all of Google's various specialty search services? No luck.
Enter Tara Calishain. She's just posted a form that Google itself ought to offer, letting you search for just magazine articles, book references or both: Isolating Google's Printed Material in a Google Search Form.
Posted by Danny Sullivan at 7:18 AM | Permalink | Comments (0)
LookSmart will now offer more than 1 million premium content articles through its FindArticles.com web site. The content is provided by HighBeam.com. Reading these articles will still cost money, just as is the case at the HighBeam site itself. However, searching and reading abstracts is free. A bit more on this from EContent: HighBeam Research Partners With LookSmart.
Posted by Danny Sullivan at 4:13 PM | Permalink | Comments (0)