Show Me the Content: Web Search, Verticals, and Metasearch

"Putting the Screws to Google," by Jon Fine of BusinessWeek, offers a look at how "old media could take back its share of search's ad bounty." So, in a sense, it's putting the screws not only to Google but also to Yahoo, Ask, and other general-purpose web engines. Of course, the word Google in a headline gets people to look.

It's an interesting read. How would these "old media" players do it? Fine offers the example of Walt Disney, News Corp., NBC Universal, and The New York Times joining together to form a "Content Consortium" that offers a search engine containing content that "no outside search engines can access."

Of course, Google is well aware of the proprietary content issues that Fine raises. If you look at the "Risks Related to Our Business and Industry" section of many of Google's SEC filings (including its IPO filing), you'll read:

Proprietary document formats may limit the effectiveness of our search technology by preventing our technology from accessing the content of documents in such formats which could limit the effectiveness of our products and services. A large amount of information on the Internet is provided in proprietary document formats such as Microsoft Word. The providers of the software application used to create these documents could engineer the document format to prevent or interfere with our ability to access the document contents with our search technology. This would mean that the document contents would not be included in our search results even if the contents were directly relevant to a search. These types of activities could assist our competitors or diminish the value of our search results. The software providers may also seek to require us to pay them royalties in exchange for giving us the ability to search documents in their format. If the software provider also competes with us in the search business, they may give their search technology a preferential ability to search documents in their proprietary format. Any of these results could harm our brand and our operating results.

From the BusinessWeek article:

"For the life of me, I can't imagine why they haven't done it," says Tom Curley, CEO of Associated Press. Here's one reason: Doing it would require spinal implants for intimidated media barons. But the notion that some pushback is pending is not far-fetched. Curley says he is talking with potential partners about setting up subject-specific Web packages -- say, for travel or basketball -- that will include content from multiple media. Once partners are on board and packages are finalized, search engines will be invited to bid for that traffic.

So the AP might be getting into the vertical search business. Interesting.

For a long time I've said that verticals will continue to grow in popularity and importance. Metasearch tools, which are getting better all the time, will allow database and content publishers to offer material (free or fee) to end users, who will select databases at the time of their search based on their information need. Of course, database selection tools that incorporate personalization, social networks, etc. will also be available to help users make these decisions.

The metasearch tool could be sponsored, carry contextually based advertising, or both.

Fee-based content could be made available for free if, for example, the user agreed to view a certain number of ads over a given period of time. Marketers could also sponsor access to databases with fee-based content. For example, Kayak or Expedia might sponsor access to a database containing digitized travel books and videos.

Smaller but focused databases can potentially offer more precise results (higher precision, lower recall). Don't forget that for many web searchers, the Invisible or Deep Web is everything beyond the first six or seven results. Advanced searchers might also benefit from a unified interface instead of numerous interfaces and syntaxes. Training sure would be easier.
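To make the precision/recall trade-off concrete, here is a small worked sketch. The document IDs and both result lists are invented for illustration; they are not measurements from any real engine. Precision is the share of returned results that are relevant, and recall is the share of all relevant documents that were returned.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result list.

    precision = relevant results returned / all results returned
    recall    = relevant results returned / all relevant documents
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical data: four documents are actually relevant to the query.
relevant_docs = {"d1", "d2", "d3", "d4"}

# A general-purpose engine returns many results, several off-topic (x*).
general_results = ["d1", "x1", "x2", "x3", "d2", "x4", "d3", "x5"]

# A small focused database returns few results, all on-topic.
vertical_results = ["d1", "d2"]

general_scores = precision_recall(general_results, relevant_docs)
vertical_scores = precision_recall(vertical_results, relevant_docs)

print("general :", general_scores)
print("vertical:", vertical_scores)
```

In this made-up case the focused database wins on precision and loses on recall, which is the pattern described above. Real collections may of course behave differently.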

In many respects, what I'm talking about (in concept, not content) has been around for years with services like Dialog and LexisNexis. For example, Dialog offers access to over 1,000 databases, many of which come from various database producers. I often describe it as a supermarket of databases with a common syntax. Users select various databases depending on their information need.

Another example. I've written numerous times about the many full-text databases (available for free, without going to the library, for personal use). Well, the San Francisco Public Library offers searchable access to many of these databases using a single interface. They call it a cross-database search. Instead of having to go to 20 databases and then search each one, you can pick and choose databases depending on what you're looking for. Articles? Reference answers? Images? Directory info? Business? Local?
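The pick-and-choose idea can be sketched in a few lines. This is only an illustration of the concept, not how the San Francisco Public Library or any vendor implements it: the database names, categories, titles, and the `cross_database_search` function are all hypothetical, and each "database" is just an in-memory list searched by substring match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    source: str  # name of the database the hit came from
    title: str


class Database:
    """A toy stand-in for one searchable database (hypothetical)."""

    def __init__(self, name, category, titles):
        self.name = name
        self.category = category
        self.titles = titles

    def search(self, query):
        # Case-insensitive substring match; real systems are far richer.
        q = query.lower()
        return [Record(self.name, t) for t in self.titles if q in t.lower()]


def cross_database_search(query, databases, categories):
    """Send one query to every database in the chosen categories."""
    results = []
    for db in databases:
        if db.category in categories:
            results.extend(db.search(query))
    return results


# Invented sample data for illustration only.
DATABASES = [
    Database("Hypothetical News Archive", "articles",
             ["Ferry service expands", "City budget vote"]),
    Database("Hypothetical Encyclopedia", "reference",
             ["Ferry", "Cable car"]),
    Database("Hypothetical Business Directory", "business",
             ["Bay Ferry Co.", "Mission Bakery"]),
]

# The user wants articles and reference answers, but not business listings.
hits = cross_database_search("ferry", DATABASES, {"articles", "reference"})
for record in hits:
    print(f"{record.source}: {record.title}")
```

The user types one query in one syntax and chooses the categories; the tool fans the query out and labels each hit with its source. A production system would also need result merging, ranking, and de-duplication, which this sketch leaves out.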

The SF Public Library is hardly the only organization offering this type of service. Cross-database searching (aka federated searching or metasearching) is a hot topic these days. In fact, NISO, the National Information Standards Organization, has a large initiative to develop metasearch standards.

Postscript: Cold North Wind is another company involved in large newspaper digitization projects. Their PaperofRecord.com site is their public database where you can actually see what they have digitized to this point.