Forget Google Print Copyright Infringement; Search Engines Already Infringe

Gary blogged earlier about the Association of American University Presses having concerns that Google
Print’s digital library program may amount to wide-scale copyright infringement. But that complaint, if ultimately upheld in a court case, would go far beyond print
digitization. It might also reach the widespread copying of content that search engines already do to provide the core search services we take for granted.

Let me zero in on a key part of the complaint:

Google’s claim that it is fair use to make copies of every copyrighted work in even one major library, let alone three of them, is completely unprecedented in scale; it is
tantamount to saying that Google can make copies of every copyrighted work ever published, period.

It is not unprecedented at all. It is exactly what search engines have been doing over the past ten years, since they started crawling the web. They are making copies of
copyrighted works all the time, billions and billions of them.

When a search engine indexes a web page, it makes a copy of that page. Furthermore, all publications (at least in the US) are protected by copyright, regardless of whether
that copyright is formally registered. Registration just provides further legal protection and redress in case of infringement. The fact that a work isn’t formally registered
doesn’t mean it’s a free-for-all for anyone to use.

When search engines index content, they do not formally request permission to do such copying. They just do it. Don’t want to be copied? Then you have to put up a
robots.txt file or use the meta robots tag to opt out.
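
To make the opt-out concrete, here is a minimal sketch of how a well-behaved crawler might consult a robots.txt file before copying a page. It uses Python's standard `urllib.robotparser` module. The robots.txt contents, the crawler name `ExampleBot`, and the `example.com` URLs are all hypothetical, made up for illustration, and this is not a description of how any particular search engine implements the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might post to opt out.
# "ExampleBot" is an invented crawler name, not a real search engine.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # ExampleBot has been shut out of the whole site.
    print(may_crawl(ROBOTS_TXT, "ExampleBot", "http://example.com/page.html"))
    # Every other crawler is only kept out of /private/.
    print(may_crawl(ROBOTS_TXT, "OtherBot", "http://example.com/page.html"))
    print(may_crawl(ROBOTS_TXT, "OtherBot", "http://example.com/private/x.html"))
```

Note where the burden falls: a page with no robots.txt at all is treated as open to copying, which is exactly the opt-out model at issue here.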

If you don’t opt out, is that tantamount to granting permission? We don’t know. The Bidder’s Edge case didn’t really
answer it. The issue there was found to be one of trespass rather than copyright.

The case involving image indexing between Les Kelly and Arriba Soft cuts closer to this. When I spoke with Kelly
about his case years ago, he didn’t feel he should be required to opt out, though he did try to. A
court later found that there were fair use elements involved with showing thumbnails of these images.

The association’s letter highlights this case in its argument against what Google is doing:

The single case you have cited to support Google’s fair use claim, Kelly v Arriba Soft, has a pattern of facts substantially different from those in Google Print for
Libraries. Among many other important differences, Arriba Soft was making copies of images that had already been digitized and posted on the web by their copyright owners.
Google is presuming the authority to digitize many works whose copyright owners have not taken that step, and given the ease with which digital files can be duplicated and
further transmitted, may have good reason for deciding not to do so.

Additionally, the full resolution copies Arriba Soft made in order to create the low-resolution thumbnails were deleted from Arriba Soft’s server after the thumbnails were
made. Google claims the right to retain the digital copies it makes — the full resolution copies, if you will — even in those cases when a publisher asks them not to display
any text from particular works.

It’s a bad argument. They are suggesting that the act of publishing on the web, which by its nature requires digitization, may somehow imply that copyright issues are
less valid.

They aren’t. If it’s a copyright violation to copy a print book in order to index it and show summaries of what’s contained, then it is going to be a copyright violation
to copy a web page, index it and show portions of what it contains.

In fact, Google, Yahoo and MSN go even further than this by providing cached copies of pages, another possible copyright violation
explored in this article from 2003. All do provide an opt-out of caching, of course. But again, it requires
the author to explicitly take away permission, rather than the search engine first asking for it.
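
As an illustration of what that page-level opt-out looks like from the crawler's side, here is a small sketch that reads the directives in a page's meta robots tag using Python's standard `html.parser` module. It assumes the commonly used `noarchive` value as the signal for "do not show a cached copy"; the sample HTML and the helper names are hypothetical, and the sketch is not drawn from any search engine's actual code.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser already lowercases tag and attribute names.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of meta robots directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def may_cache(html: str) -> bool:
    """Assumption: 'noarchive' (or 'none') means the author opted out of caching."""
    directives = robots_directives(html)
    return "noarchive" not in directives and "none" not in directives


if __name__ == "__main__":
    # Hypothetical page whose author allows indexing but not cached copies.
    page = '<html><head><meta name="robots" content="index, NoArchive"></head></html>'
    print(robots_directives(page))
    print(may_cache(page))
```

A page that says nothing is cached by default, so once again silence is read as consent.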

When I’ve written on such issues in the past, my own view has been that ultimately, a court will likely rule that the value of web search combined with opting out does fall on
the fair use side. In other words, they aren’t going to require that permission be sought before indexing happens. You don’t want to be in? It’s easy to opt out.

The Google Print project could change that, however. Should publishers win a ruling that opt-out is not allowed, online publishers might insist that they are entitled to
the same rights.

Postscript: Scholarly journals’ premier status diluted by Web from the Wall St. Journal looks at how scholarly journals are under threat from demands that they be open to everyone.

