HarperCollins To Digitize Own Books; Unclear How That Will Feed Into Book Search Engines

Date published: 15 December 2005
Author: Danny Sullivan
Categories: Industry

“HarperCollins To Digitize And Control Its Book Content,”
from the Wall Street Journal, looks at HarperCollins saying it will digitize its
active backlist of 20,000 titles and up to 3,500 books per year. Part of the idea
is that by doing this itself, the publisher can give content to the search
engines to index but keep the files themselves. That leads me to think
HarperCollins doesn’t understand how book indexing works. From the story:

Search companies such as Google will then be allowed to create an index of
each book’s content so that when consumers do a search, they’ll be pointed to
a page view. However, that view will be hosted by a server in the
HarperCollins digital warehouse. “The difference is that the digital files
will be on our servers,” said Brian Murray, group president of HarperCollins
Publishers. “The search companies will be allowed to come, crawl our Web site,
and create an index that they can take away, but not the image of the page.”

This would prevent such Internet companies from selling a digital copy of
that book unless HarperCollins decided to partner with them as a retailer.
“We’ll own the file, and we’ll control the terms of any sale,” he added.

OK, in order to make a searchable index of a book, a search engine is
essentially making a copy of the book, though it doesn’t mean that it reprints
that copy. My earlier piece, “Indexing Versus Caching & How Google Print
Doesn’t Reprint,” explains this in more depth.
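
To make the "an index is essentially a copy" point concrete, here is a minimal sketch in Python. It assumes a simple positional index (each word mapped to the places it occurs), which is one common design and not necessarily what Google or any other search engine actually builds. The function names and the sample text are hypothetical.

```python
from collections import defaultdict


def build_positional_index(text):
    """Map each lowercased word to the list of positions where it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(text.lower().split()):
        index[word].append(position)
    return dict(index)


def search(index, word):
    """Return the positions at which a word occurs (empty list if absent)."""
    return index.get(word.lower(), [])


def reconstruct(index):
    """Rebuild the word sequence from the index alone."""
    words_by_position = {}
    for word, positions in index.items():
        for position in positions:
            words_by_position[position] = word
    return " ".join(words_by_position[i] for i in range(len(words_by_position)))


# Hypothetical sample text standing in for a page of a book.
page = "the index holds every word and the place of every word"
index = build_positional_index(page)
```

In this sketch the index answers searches without storing any page image, yet `reconstruct(index)` returns the same words in the same order as `page`. Holding the index means holding the words.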

So yep, the search engines won’t have images of a book to display, assuming
they go along with this, but they will have a copy of all the words in the
books. And that’s pretty much all Google is doing with the Google Library
scanning project: making an index of books, a card catalog, exactly as
HarperCollins wants to replicate.

Interestingly, HarperCollins, though not a party to that suit over Google
Library, says it supports it “economically and philosophically.” Well,
philosophically, it doesn’t seem to understand it’s doing pretty much what
Google’s doing already.

Here’s the especially tricky bit. Google and gang, if they are “allowed to
come, crawl our web site,” as HarperCollins puts it, are then going to have
access to the same content the general public gets. In other words, whatever you
put out for crawlers, anyone gets. So is HarperCollins going to put the full
text of books online? Because then forget the part about selling digital copies
(not that Google and gang are doing that now). The digital copies will be out
for anyone to access.
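
A rough sketch of why "whatever you put out for crawlers, anyone gets" holds. It is written in Python and assumes a hypothetical server that either serves every requester the same text or tries to tell crawlers apart by the User-Agent header. The page store, function names, and preview text are all made up for illustration; nothing here describes how HarperCollins plans to build its system.

```python
# Hypothetical page store: one URL path mapped to the full text a crawler needs.
PAGES = {"/book/page-1": "full text of page one"}

# Substring that Google's crawler includes in its User-Agent header.
CRAWLER_TOKEN = "Googlebot"


def serve_open(path, user_agent):
    """Serve the same text to every requester, crawler or not."""
    return PAGES.get(path, "")


def serve_by_user_agent(path, user_agent):
    """Serve full text only when the User-Agent claims to be a crawler."""
    if CRAWLER_TOKEN in user_agent:
        return PAGES.get(path, "")
    return "preview only"


# An ordinary browser, and a reader who simply copies the crawler's header.
browser = "Mozilla/5.0"
spoofed = "Mozilla/5.0 (compatible; Googlebot/2.1)"
```

With `serve_open`, the browser gets the full text directly. With `serve_by_user_agent`, the browser gets only the preview, but the `spoofed` header gets the full text, because the User-Agent is supplied by the requester and proves nothing on its own.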

Alternatively, the various search engines do have programs where site owners
can submit content, such as Google’s. But you can’t just send them some
nondescript “index.” They want PDF, though the program doesn’t require that
actual pages be shown, despite the content coming in as PDFs.

Aside from book search, there are programs such as Google Scholar or Yahoo
Search Subscriptions that can effectively let content owners cloak material:
the general public sees abstracts while the search engine indexes the good
stuff. But neither of these, to my knowledge, will work for book search.
