NOTE: SEE BELOW FOR RELATED ARTICLES ON THE SUBJECT OF CLOAKING FROM JULY 2000 THROUGH AUGUST 2004. FOR CLOAKING ARTICLES FROM SEPTEMBER 2004 ONWARD, SEE THE SEO: CLOAKING CATEGORY OF SEARCH TOPICS IN SEARCH ENGINE WATCH.
Advanced search engine marketers often ask whether there is a way to hide, or “cloak,” their HTML code so that others cannot duplicate any success they may have with search engines. They may also want to keep human visitors from seeing an ugly doorway page that has been optimized for search engine spiders.
This page summarizes the two major ways cloaking is typically implemented: by delivering pages via agent name or via IP address. Resources and articles on the subject of cloaking are also listed at the end of this article.
Cloaking pages may initially sound like a great idea, but the practice is fraught with a number of dangers.
Page cloaking can be a waste of time for some people. To start, you’ll need to implement the delivery system. You’ll then need to stay current with the latest agent names or IP addresses. Once the system is running, you may find yourself investing more time in creating doorway pages or tweaking code than in other forms of online publicity.
Spamming is a bigger worry. The desire to cloak HTML code often goes hand-in-hand with aggressive search engine optimization efforts. Any time you are aggressive with the search engines, you take a risk that your actions will seem like spamming.
To avoid problems, please be sure to read the More About Doorway Pages article. It has common sense guidelines designed to help you avoid trouble with search engines. It’s also highly recommended that you read the Ending The Debate Over Cloaking article from February 2003, for another look at how search engines continue to grapple with the issues of doorway pages and cloaking.
Agent name delivery can be relatively easy to do. Many web servers can be configured to deliver different pages depending on the type of browser a visitor is using. A search engine spider is simply another browser. Thus, as long as you know the agent names of each spider, you can configure your server to deliver pages accordingly.
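As a rough illustration of the idea, here is a minimal Python sketch of choosing a page by agent name. The token list and the file names spider.html and visitor.html are hypothetical placeholders; a real list would have to be compiled and maintained by the site owner, and real servers are usually configured rather than scripted this way.

```python
# Illustrative substrings of spider agent names (assumption: not a
# complete or current list; "examplespider" is made up).
SPIDER_AGENT_TOKENS = ("googlebot", "slurp", "examplespider")


def is_spider_agent(user_agent):
    """Return True if the User-Agent header contains a known spider token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in SPIDER_AGENT_TOKENS)


def choose_page(user_agent):
    """Pick which (hypothetical) file to serve, based only on the agent name."""
    return "spider.html" if is_spider_agent(user_agent) else "visitor.html"
```

Because the decision rests entirely on a header that the visitor supplies, anyone who sends a matching agent name receives the spider version of the page.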
Where do you get agent names? The SpiderSpotting page is an older article from Search Engine Watch that explains how you can develop your own list from scratch. This information is primarily intended to help you know if you’ve been visited by a spider, but it can also be employed for agent name delivery purposes.
Browser Detection is a January 6, 1999 article from WebMonkey that also provides a good introduction to detecting agent names, along with links to server-side solutions at the end. Finally, the Software Solutions section below provides some other resources.
The drawback to agent name delivery is that someone can trick your server into believing they are a search engine spider, simply by sending a spider’s agent name. This means that your code is not 100 percent safe from prying eyes. However, agent name delivery is a simple and easy way for many people to get closer to the goal of code security.
IP delivery is much harder than agent name delivery. It will likely involve some custom programming to set up your server to deliver pages based on an IP address. Moreover, IP addresses can often change. If you decide to use IP delivery, expect to monitor search engines on a regular basis to catch any changes.
Search Engine Watch doesn’t track IP addresses. However, if you know the basics of SpiderSpotting, you can create your own list from scratch. This is especially easy if you disable DNS resolution on your server, since your logs will then record raw IP addresses rather than host names.
The big advantage to IP delivery is that it provides 100 percent code security. Someone cannot fake an IP address in the way they can imitate a search engine spider’s agent name.
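For comparison, a minimal Python sketch of the IP-based decision might look like the following. The address ranges are reserved documentation networks used purely as stand-ins, and the file names are hypothetical; actual spider addresses would have to come from your own logs and be kept up to date.

```python
import ipaddress

# Hypothetical spider address ranges (reserved documentation networks,
# used here only as placeholders for a list you would build yourself).
SPIDER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.16/28"),
]


def is_spider_ip(remote_addr):
    """Return True if the connecting address falls inside a listed range."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a parseable IP address: treat as an ordinary visitor.
        return False
    return any(addr in net for net in SPIDER_NETWORKS)


def choose_page_by_ip(remote_addr):
    """Pick which (hypothetical) file to serve, based on the connecting IP."""
    return "spider.html" if is_spider_ip(remote_addr) else "visitor.html"
```

The check uses the address of the connection itself rather than anything the visitor states in a header, which is why the list of ranges, not the code, is the part that needs ongoing maintenance.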
If you decide you want an agent name or IP delivery solution for your own site, there are some software packages that may help. I’ve not tested any of these, but you may find them worth exploring. They are listed generally in the order in which I first heard of them.
Open Directory’s Cloaking Category
That’s right — cloaking is so mainstream that there’s even a category for it. Check out packages here.
SpiderHunter.com Comparison Chart
Compares a variety of different cloaking packages.
IP Delivery Food Script
One of the first commercial cloaking packages ever made available.
Another long-time company offering cloaking solutions.
Make It Online
Offers a variety of cloaking packages.
Position-It.com’s Search Engine Cloak
IP delivery system available for multiple platforms.
WebmasterWorld’s Cloaking Forum
Discussion of various cloaking-related topics is ongoing here.
Some companies also provide cloaking as part of the search engine marketing services they offer. See the Outsourcing Search Engine Marketing page for help in locating companies in general.
Wondering if someone is cloaking? Your first clue will be if the page title and description listed in search engine results are dramatically different from what appears in the page’s HTML code. This isn’t foolproof, however. Some crawlers may use title and description information from human-created directory listings, for example.
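This first check can be roughed out in Python. The sketch below pulls the title out of a page’s HTML and compares it with the title shown in a search listing. The function names are hypothetical, and the exact, case-insensitive comparison is a deliberately crude assumption: a mismatch is only a clue, for the reasons given above.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html_text):
    """Return the page title with runs of whitespace collapsed."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return " ".join(parser.title.split())


def titles_match(listed_title, html_text):
    """Compare a search listing's title with the page's own title."""
    listed = " ".join(listed_title.split()).lower()
    return listed == page_title(html_text).lower()
```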
WebBug is a tool that allows you to see the HTTP header information sent out from a web server and lets you control the user agent name you send. That means you could pretend to be an Inktomi or Google spider, for example. If anyone is feeding information based only on agent names, you’d then see exactly what they were sending. However, if they are using IP-based cloaking, this program won’t help you see the “real” page sent to a particular spider.
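The same kind of check can be sketched with Python’s standard library. This is not how WebBug itself works, just a minimal illustration of sending a chosen agent name and looking at what comes back; the function names are hypothetical.

```python
import urllib.request


def build_request(url, agent_name):
    """Build a request that announces the given agent name."""
    return urllib.request.Request(url, headers={"User-Agent": agent_name})


def fetch_as(url, agent_name, timeout=10):
    """Fetch a URL under a chosen agent name.

    Returns (status code, response headers as a dict, body bytes).
    """
    req = build_request(url, agent_name)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, dict(resp.getheaders()), resp.read()


def looks_agent_cloaked(body_as_browser, body_as_spider):
    """Crude test: did the two agent names receive different bodies?"""
    return body_as_browser != body_as_spider
```

You might call fetch_as twice on the same URL, once with a browser’s agent name and once with a spider’s, and pass the two bodies to looks_agent_cloaked. Pages with dynamic content can differ between any two requests, so a difference is a hint rather than proof, and a site using IP-based cloaking will return the visitor page both times.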
PageSneaker is an online resource that allows you to pull up a page and see its contents divided into different elements, such as body copy and header information. That’s helpful for pages that do fast refreshes, a sort of “poor-man’s cloaking.” You can also run a keyword density analysis on different areas of the page. If checking the home page of a site, be sure to use a slash at the end of the domain, or the program won’t work properly. In other words, you’d input http://www.site.com/ rather than http://www.site.com.
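Both ideas are easy to sketch in Python. The helper below adds the trailing slash to a bare domain, and the density function uses one simple definition (occurrences of a single word divided by total words). That definition is an assumption for illustration; PageSneaker’s own calculation may differ.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def with_root_slash(url):
    """Add the trailing slash when a URL is just a domain with no path."""
    parts = urlsplit(url)
    if parts.netloc and parts.path == "":
        parts = parts._replace(path="/")
    return urlunsplit(parts)


def keyword_density(text, keyword):
    """Share of words in the text that equal the keyword (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)
```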
How Do I Spot Cloaked Sites?
Search Engine Watch Forums, Aug. 31, 2004
If someone is cloaking content, how do you know? This thread explores the topic.
Cloaking By NPR OK At Google
SearchDay, May 28, 2004
A technique used by National Public Radio to get its audio content indexed by Google seems acceptable to the search engine despite apparently violating its own guidelines about cloaking.
Google Confirms Automated Page Removal Bug
SearchDay, May 14, 2004
Microsoft, Adobe and some other web sites had pages removed from Google without their consent, due to a bug with Google’s page removal tool. And WhenU gets pulled for cloaking.
Can you ethically cloak your Web content?
ElectricNews.net, Feb. 12, 2003
The ethics of cloaking get debated during a meeting of the Irish Internet Association.
Ending The Debate Over Cloaking
The Search Engine Report, Feb. 4, 2003
A look at why people have traditionally cloaked, how XML feeds these days provide a form of approved cloaking and why the bigger issue to focus on isn’t whether cloaking is allowed but instead whether paid content gets more liberal rules about acceptability.
‘Real’ Plea: Make Love, Not Porn
Wired, Oct. 5, 2001
The porn industry is widely known to produce some of the worst search engine spamming. Ironically, there’s now an anti-porn group that’s also turning to search engine spam to get out their message. They have apparently generated thousands of cloaked pages that promise porn content to viewers but instead deliver an anti-porn message.
Search Engine Cloaking: The Controversy Continues
SearchDay, July 18, 2001
SearchDay readers sound off on the ethics, practicality and effectiveness of search engine cloaking. Also links to original article with feedback defending cloaking.
To Cloak or Not to Cloak
ClickZ, July 21, 2000
A look at the pros and cons of page cloaking, with the ultimate conclusion that you shouldn’t do it, because the search engines say they disapprove.