Whether you call it the "invisible web," the "shallow web" or the "deep web," the names all refer to the same thing: content that crawler-based search engines ordinarily cannot access. Several roadblocks can stop spiders in their tracks, and one of the worst is password protection. Place your pages in an area that only registered users can access, and search engines will never find them, nor will the people searching at those search engines.
It's a shame, because registered areas often hold good content that the general public might like to discover. Enter eLuminator, a product from MediaDNA. eLuminator duplicates protected content in a way that makes it accessible to search engines without lifting any of the restrictions human visitors must satisfy before they are allowed to view it.
The system makes use of doorway pages and cloaking, highly charged terms that raise many people's hackles and that MediaDNA will probably dislike seeing applied to its service. However, the automated system uses these techniques in a way that makes it a shining example of why they cannot automatically be considered search engine spam. Far from being spam, eLuminator genuinely helps both search engines and their users, and the approach was convincing enough that Inktomi selected MediaDNA in September as a partner for its new paid inclusion program.
Let's say you have a stock tips web site with 1,000 pages hidden behind a registration system. eLuminator will read those pages and automatically make a search engine friendly version of each one. This is the doorway page concept: you create a page designed to please search engines rather than humans. That, however, is where the comparison ends. Doorways are often highly manufactured pages, designed to rank well for a particular term and perhaps even a particular search engine. eLuminator instead works with existing pages, distilling the essence of what each is about and then producing a version the search engines can read.
For example, eLuminator might read your stock analysis of IBM. It might then use the headline as the page title, use the opening paragraph for the page's meta description tag and compile a list of the unique words on the page for the meta keywords tag. The opening paragraph and the keywords would also appear in the body of the page itself. The result is an abstract of the original page, but it is not "human readable," as MediaDNA characterizes it. In other words, the page has no sentence structure and none of the context that makes it valuable to humans, though spiders have enough information to understand what it is about.
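The distillation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MediaDNA's implementation: the function names `unique_words` and `build_spider_page` are hypothetical, the protected article is assumed to arrive as a headline plus a plain-text body with paragraphs separated by blank lines, and lowercasing the words and dropping a small stop-word list are our own guesses at what "unique words" means in practice.

```python
import html
import re

# Hypothetical stop-word list (an assumption); the article does not say
# whether eLuminator filters out common words.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for", "is", "it", "its"}


def unique_words(text: str) -> list[str]:
    """Return each distinct word once, lowercased, in order of first appearance."""
    seen: dict[str, None] = {}  # dicts preserve insertion order
    for word in re.findall(r"[A-Za-z0-9']+", text.lower()):
        if word not in STOP_WORDS and word not in seen:
            seen[word] = None
    return list(seen)


def build_spider_page(headline: str, body: str) -> str:
    """Distill a protected article into a spider-oriented abstract page.

    headline          -> <title>
    opening paragraph -> meta description and page body
    unique words      -> meta keywords and page body
    """
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    opening = paragraphs[0] if paragraphs else ""
    keywords = unique_words(headline + " " + body)
    esc = html.escape  # escapes &, <, > and quotes so attributes stay well-formed
    return (
        "<html><head>\n"
        f"<title>{esc(headline)}</title>\n"
        f'<meta name="description" content="{esc(opening)}">\n'
        f'<meta name="keywords" content="{esc(", ".join(keywords))}">\n'
        "</head><body>\n"
        f"<p>{esc(opening)}</p>\n"
        f"<p>{esc(' '.join(keywords))}</p>\n"
        "</body></html>"
    )
```

Calling `build_spider_page("IBM Stock Analysis", "IBM shares look undervalued.\n\nWe expect IBM earnings to beat estimates.")` yields a page titled with the headline, described by the first paragraph, and keyed on every distinct word in order of first appearance ("ibm, stock, analysis, shares, ..."). The keyword list has no sentence structure, which is exactly why such a page is useful to spiders but not to people.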
The pages are then placed on a "ghost site," which usually has a domain name similar to the site owner's, for branding purposes. For example, if our original site were called "greatstocktips.com," our eLuminator pages might go onto "greatstocktips-eLum.com." The pages are then submitted to the major crawler-based search engines.
Should one of the pages appear in response to a search, cloaking is used so that the human visitor sees a page designed for them. They'd see a summary of what the page is about and how to access it -- whether that be purchasing the article or simply registering with the site.
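The cloaking step boils down to one decision per request: is this visitor a spider or a person? The sketch below assumes spiders are recognized by substrings in the HTTP User-Agent header; the article does not say how eLuminator actually detects them, and real cloaking systems often check IP addresses as well. The signature list (Inktomi's Slurp, Google's Googlebot, AltaVista's Scooter), the function names and the registration URL used in the example are illustrative only.

```python
import html

# Illustrative crawler User-Agent substrings (an assumption, not eLuminator's list).
CRAWLER_SIGNATURES = ("slurp", "googlebot", "scooter")


def is_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a known crawler signature."""
    ua = user_agent.lower()
    return any(sig in ua for sig in CRAWLER_SIGNATURES)


def human_landing_page(summary: str, access_url: str) -> str:
    """Page for people: a summary plus a pointer to registration or purchase."""
    return (
        f"<html><body><p>{html.escape(summary)}</p>"
        f'<p><a href="{html.escape(access_url)}">'
        "Register or purchase to read the full article</a></p>"
        "</body></html>"
    )


def choose_page(user_agent: str, spider_page: str, summary: str, access_url: str) -> str:
    """Cloaking decision: spiders get the keyword abstract, people get the landing page."""
    if is_crawler(user_agent):
        return spider_page
    return human_landing_page(summary, access_url)
```

For example, `choose_page("Mozilla/4.0 (compatible; MSIE 5.5)", spider_page, "Our analysis of IBM", "http://greatstocktips.com/register")` returns the landing page, while the same call with a Slurp User-Agent returns the abstract unchanged. Keeping the decision in a pure function makes it easy to drop into whatever request handler serves the ghost site.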
eLuminator works no special magic to make the pages rank well. These are not intensively optimized pages, and they might even perform better if their sentence structure were left intact. Nevertheless, a site that goes from little or no representation in the search engines to thousands or even hundreds of thousands of listed pages is more likely to have some of those pages rank well naturally and to see a traffic increase, perhaps a substantial one. MediaDNA cited one client whose 4,500 listed documents generated 365,000 clickthroughs from Inktomi-powered services alone in a single month.
"Because eLuminator allows full-text searching of valuable content, we're finding that the traffic we're driving to clients' sites is better targeted than traditional doorway page traffic," said Larry Vernec, vice president of marketing at MediaDNA.
To date, Inktomi is eLuminator's only official search engine partner, which means your pages are guaranteed to be listed in the Inktomi index. eLuminator also submits your pages to other crawler-based search engines, but without partnerships there is no guarantee they will be included, though you should expect some to be picked up. MediaDNA is working to establish partnerships with other crawlers.
MediaDNA charges a US $5,000 set-up fee for the eLuminator service, after which you'll pay anywhere from 5 to 40 cents for each person who clicks through to your pages. Clickthroughs are purchased in advance in blocks, and larger blocks earn a lower per-click rate. To date, eLuminator has about a dozen clients, including ZDNet, Hoover's Media Technologies, Penton Media, McGraw Hill, and channel partner Qpass.
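To put the pricing in perspective, here is a rough bound on first-month cost using only the figures quoted above. It is a back-of-the-envelope sketch: it ignores the volume discounts on larger clickthrough blocks, the function name is hypothetical, and plugging in the 365,000 clickthroughs MediaDNA cited is purely illustrative, since the article does not say what rate that client paid.

```python
SETUP_FEE_USD = 5_000                 # quoted set-up fee
MIN_RATE_CENTS, MAX_RATE_CENTS = 5, 40  # quoted per-clickthrough range


def cost_range_usd(clicks: int) -> tuple[float, float]:
    """Lowest and highest total cost (set-up fee plus clickthroughs) in US dollars.

    Ignores block-size discounts; simply brackets the cost with the quoted rates.
    """
    low = SETUP_FEE_USD + clicks * MIN_RATE_CENTS / 100
    high = SETUP_FEE_USD + clicks * MAX_RATE_CENTS / 100
    return low, high
```

For 365,000 clickthroughs, `cost_range_usd(365_000)` returns `(23250.0, 151000.0)`: $5,000 in set-up plus between $18,250 and $146,000 in clickthrough charges.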
Who should use this? Budget permitting, anyone with password-protected content they would like made accessible to the search engines. eLuminator is also useful for sites that pose problems for spiders, such as those that use dynamic URLs or frames, and for content in non-HTML or non-text formats, such as PDF files. However, good text-based content is essential: eLuminator can only succeed if it has such content to work with.
Invisible Web Gets Deeper
The Search Engine Report, Aug. 2, 2000
Invisible Web? Deep Web? Shallow Web? This article explains the concepts in more depth.