Must We Unlock the Deep Web?

Search engines crawl, rank, and return knowledge in the form of scraps of information millions of times every day. Search engines are good at what they do, but they aren’t especially great at it.

Local information, research documents, mountains of medical data, and other useful information lie buried beyond the reach of Googlebot and other crawlers. The world anxiously awaits a way to access this information.

Or does it?

Much in the same way I’m tired of empty political promises and outright lies, I grow weary of false search engine promises. The deep Web? Is any of this information useful? Did the Cuil launch teach us anything?


The “deep Web,” or “invisible Web,” allegedly contains countless untold gems of information. Searching the secret Web, if you please, is more than just a way for crafty young college students to plagiarize research papers.

I recently had the opportunity to try out DeepDyve, a new deep Web search engine. After uploading my personal information and confirming my e-mail address, I was extended the privilege of free searching. DeepDyve allows and encourages enormous keyword strings and complex queries. Given the amount of data that DeepDyve is collecting, I was sure my searches would be successful.

I conducted three searches off the top of my head. First, I looked for information on how patients with familial hemochromatosis might be able to donate blood. Since the promo pages on DeepDyve had cited oodles of health information, I thought I’d give it a whirl.

“No Results”

After several unsuccessful searches for medical information and scientific data, each composed of very detailed sentences, I gave up and tried a suggested example search for “measles outbreak” that automatically populated the massive search box with the following:

“The outbreak, mostly in schoolchildren, made it clear that the authorities had been wrong in assuming that more than 90 percent of children had had measles shots, the report said. Gibraltar is a British territory, and resistance to the measles-mumps-rubella vaccine has been high in Britain since a 1998 report in The Lancet speculated that it could cause autism. That report has been widely discredited, and numerous later studies showed no link between vaccines and autism. Nonetheless, as a consequence of dropping vaccination rates, Britain has had several local measles outbreaks. A measles outbreak in Gibraltar has infected almost 1 percent of the territory’s 28,000 people in just three months, according to a report by its public health director.”

Naturally, after I had stared at the “loading” box for about 30 seconds (or one lifetime in Internet search time), my dialogue box once again appeared with “no results.”

Mortgaging the Deep Web

We’re not really sure just how much information exists in the deep Web. An article that read like unchecked advertorial appeared in Wired profiling DeepDyve’s launch, citing an 8-year-old study from UC Berkeley as evidence of the need for such a search tool.

In a nutshell, the old study from Berkeley placed the deep Web at more than 10 times the size of the visible Web. Time has passed and, though much has changed, there are still vast amounts of information remaining undiscovered.

Estimating how much useful information is out there amounts to stacking a series of guesses to form one highly questionable conclusion. As I recall, the guessing game got us all into a little bit of financial trouble recently, but we shouldn’t let that prevent us from repeating the cycle of disaster.

Value Proposition

Information that can’t be crawled, is somehow restricted, or isn’t amenable to link-based ranking methodologies may not be easily found, but there are (often very good) reasons for keeping such information private. One big reason is that (heaven forbid) a company may want to charge a fee to access said content.

The other, possibly larger, issue is the value of such information. I understand the value proposition, or the idea around the value proposition. Millions of pages, vastly underutilized, can be yours for a nominal fee. And that, my friends, is where I take exception to the Google-killer selling proposition.

Selling a subscription-based search engine while claiming to be more efficient than Google has been a recipe for disaster for more than one wannabe search site. Again, why learn from others’ mistakes?

At the end of the day, DeepDyve and others entering the space are more akin to subscription-based content players. Comparing them to a traditional search engine is not only an enormous mistake in valuing the company; it also places an unrealistic expectation on the usefulness of such a tool. I’ll try my searches again when more content providers have “opted in” to DeepDyve.
