What Search Engines See Isn't Always What You Get

You've just run a search, and there, perhaps in the top five results, you see one link that absolutely sparkles. Perfect! Just what you're looking for. You click on the link, eager to read your new treasure, and...

Bam! The page has absolutely nothing to do with your query! In fact, it's blatant spam, and not only that, code on the page has "mousejacked" your browser, disabling your navigation keys, effectively trapping you on the page. How could this happen?

Welcome to the world of "cloaking" -- a technique used by some webmasters to deliver one page to a search engine for indexing, while serving an entirely different page to everyone else. In short, cloaking is the classic "bait and switch" technique applied to the web.

Cloaking is well known to search engine optimization specialists, but few non-specialists seem aware of it. That's too bad, because understanding cloaking can often help explain those seemingly inexplicable search results that might otherwise persuade you that a search engine has gone insane.

How does cloaking work? Ironically, by taking advantage of the protocol search engines use to be "polite" when requesting pages from web servers.

Search engines use programs called crawlers to find and fetch web pages for indexing. By convention, crawlers announce themselves to the web server: each request carries the crawler's name (sent in the HTTP User-Agent header) and arrives from the crawler's IP (Internet Protocol) address. This is so that if the server is busy, it can refuse a crawler's requests in favor of serving pages to its human visitors.
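
As a rough illustration of how a crawler announces itself by name, the sketch below builds (but does not send) a page request in Python. The crawler name "ExampleBot/1.0" and the URL are made-up placeholders, not those of any real search engine. The name travels in the HTTP User-Agent header, while the IP address is simply the network address the request comes from.

```python
from urllib.request import Request

# Hypothetical crawler name, used only for illustration.
CRAWLER_NAME = "ExampleBot/1.0"


def build_crawler_request(url: str) -> Request:
    """Build a page request that identifies the crawler by name."""
    # The User-Agent header is where a client states who it is.
    return Request(url, headers={"User-Agent": CRAWLER_NAME})


# The request is only constructed here; nothing is sent over the network.
request = build_crawler_request("http://www.example.com/")
# urllib stores header names capitalized as "User-agent".
announced_name = request.get_header("User-agent")
print(announced_name)
```

Because the name is just a text field supplied by the requester, a server can read it freely, and anyone can set it to whatever they like.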

Since crawlers openly identify themselves when they request pages, it's a relatively easy task for a webmaster to write a script that looks out for either the name or IP address of crawlers. When a crawler requests a page, the server can literally return any page, and assert that it's the page the crawler asked for.

The technical names for this are "IP delivery," when the script keys on the crawler's IP address, and "agent delivery," when it keys on the crawler's name. It's very easy for a webmaster to pull this switcheroo, and it can be difficult for a crawler to detect when it's happened.
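
To make the mechanism concrete, here is a minimal sketch of the kind of check such a script might perform, written in Python. Everything in it is an assumption for illustration: the crawler names, the address range (a block reserved for documentation), and the file names are invented, and real cloaking scripts will differ.

```python
import ipaddress

# Hypothetical crawler names and address range, for illustration only.
CRAWLER_AGENT_TOKENS = ("examplebot", "samplecrawler")
CRAWLER_NETWORKS = (ipaddress.ip_network("192.0.2.0/24"),)


def is_crawler(user_agent: str, remote_ip: str) -> bool:
    """Guess whether a request comes from a known crawler."""
    # "Agent delivery": match on the name the requester announces.
    agent = user_agent.lower()
    if any(token in agent for token in CRAWLER_AGENT_TOKENS):
        return True
    # "IP delivery": match on the address the request came from.
    address = ipaddress.ip_address(remote_ip)
    return any(address in network for network in CRAWLER_NETWORKS)


def choose_page(user_agent: str, remote_ip: str) -> str:
    """Pick which file to serve for the same requested URL."""
    if is_crawler(user_agent, remote_ip):
        return "optimized.html"  # what the search engine indexes
    return "visitor.html"  # what everyone else sees


print(choose_page("ExampleBot/1.0", "203.0.113.5"))
print(choose_page("Mozilla/4.0", "203.0.113.5"))
```

Note that the name check is easy to fool, since a visitor can announce any name, while the address check is harder to get around. That difference is one reason a crawler requesting pages from its usual addresses may never see the switch.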

In defense of cloaking, some webmasters will point out that the technique is quite valid for protecting the meta data and other "optimization" techniques that cause a page to rank well for particular search terms. This is true. By serving an optimized page only to the search engine, and then serving a similar, unoptimized page to requests from web browsers, it's impossible for anyone to view the source code and steal the meta data or other elements that caused the cloaked page to rank well.

Cloaking also bypasses the problem faced by some sites that are regulated by law and are required to present a page full of legalese to first-time visitors. These pages may be necessary, but they typically don't have anything to do with the actual content of a web site.

By delivering a cloaked "content-rich" page to the search engine, the site still has a fighting chance of achieving high relevance in search results. Webmasters rationalize this approach by noting that users clicking through on the search result will still be presented with the necessary legalese before they can proceed to the content they thought they were going to get in the first place.

Nonetheless, the major search engines take a dim view of cloaking, for obvious reasons. If they catch a site using IP or agent delivery, most search engines will ban it permanently from the index. So it's a risky tactic for webmasters to use, even if they feel it's justified.

The next time you get a complete mismatch between what you see in search results and what you get when you actually click through to the page, there's a good chance you've encountered cloaking in action. If the mismatch is really obnoxious, consider reporting what you've found to the search engine, so they can investigate and take steps to decloak the offending site by kicking it out of the index.

Pagejacking Complaint Involves High-Profile Sites
How search engine optimization specialist Greg Boser used AltaVista's Babelfish translation service to pretend that he was AltaVista's search engine crawler and view cloaked pages that were allegedly stolen from one of his clients.

What Is A Bridge or Doorway Page?
Cloaking is often used for "bridge" or "doorway" pages. This article describes how webmasters use all three techniques in attempts to boost search engine rankings.

Page Cloaking
Explains issues and some basic technical details of enabling a page "cloaking" system, which often goes hand-in-hand with doorway page efforts. Only available to Search Engine Watch members -- More information on becoming a member is available at http://searchenginewatch.com/about/subscribe.html

About the author

Chris Sherman is a frequent contributor to several information industry journals. He's written several books, including The McGraw-Hill CD ROM Handbook and The Invisible Web: Uncovering Information Sources Search Engines Can't See, co-authored with Gary Price. Chris has written about search and search engines since 1994, when he developed online searching tutorials for several clients. From 1998 to 2001, he was About.com's Web Search Guide.