Ending The Debate Over Cloaking

The Search Engine Report, Feb. 4, 2003

Few issues have divided the search engine marketing community more than cloaking. There is a segment that firmly believes they should have the right to cloak their content from users, while another group strongly feels this is a deceptive tactic. Get the two talking about it and tempers flare.

Complicating the issue is the fact that while some search engines have guidelines against cloaking, arguments can also be made that these same search engines still allow it or even themselves practice it. In addition, just trying to agree on "what is cloaking" can lead to frustration on both sides.

Search engine marketer Alan Perkins hoped to clarify matters by publishing his Cloaking Is Always A Bad Idea article last month. Instead, the article has renewed the debate over cloaking, but perhaps in a helpful manner.

Below, we take a look at why people have traditionally cloaked, how XML feeds these days provide a form of approved cloaking and why the bigger issue to focus on isn't whether cloaking is allowed but instead whether paid content gets more liberal rules about acceptability.

Do Things For Humans, Not Search Engines

Perkins is a firm believer in doing things for humans, rather than search engines. His view is that people should make good content, then only tweak it to the degree that search engines have commonly encouraged, such as writing good title tags or making pages easily found via site maps.

"Suppose search engines did not exist. Would the technique still be used in the same way?" he asks, in his The Classification Of Search Engine Spam white paper. It's generally good advice that I agree with. I have always strongly recommended that people create excellent content, then do the small, simple things that can improve rankings. However, Perkins's guidelines can be too restrictive in requiring that people make search engine-specific changes only if they have "written permission" (presumably through published guidelines).

For example, Google recently updated its webmaster guidelines and offers a lot of great, practical advice there. This doesn't mean that everything is covered, however. For instance, let's say you had a very long page about buying a used car. One section might deal with places that sell used cars, while the second covers negotiating a deal and a third section discusses how to inspect a car properly before buying. For search engine purposes, it would be wise to break that page up into three different pages, so that each of the new pages is more firmly focused on one of these subtopics.

Such an action wouldn't be done primarily for humans. Actually, humans might prefer one big page. However, it's such a subtle change that I doubt any search engine, including Google, would fault you for doing it. I think the "spirit of the basic principles" that Google describes in its guidelines is still being followed. You made good content not specifically for Google but instead for humans. While breaking up the page is something you specifically may have done to help Google index and rank the content, it remains primarily good content that you are offering to users.

Not Everyone Agrees With The Search Engines

The "Would I do this if search engines didn't exist?" question that Google suggests you ask yourself in its guidelines is excellent advice and clearly follows on what Perkins preaches. It's also the same advice that generally any search engine would give you when discussing getting listed outside their paid inclusion programs. But how does all this get back to the debate on cloaking?

Well, not everyone agrees with Perkins. Indeed, not everyone agrees with Google or the other search engines, when they issue guidelines. There's always someone who feels they have a situation that justifies doing something specifically for the search engines, rather than humans.

Sometimes, the justification is the technical limitations of search engines, such as:

  • You can't read my Flash content, so I'm building a page just for you to index.
  • I have a dynamic web site that you refuse to index, so I'm creating a static page that describes each of my products.

Other times, it's simply that the marketer believes it's a jungle out there, so they'll do whatever they feel is best to compete. "I know people are building pages specifically to please your algorithm and getting away with it, so I'm going to do the same!" It's also common that they know they may get caught, perhaps banned, but decide that's a risk they'll take.

Doing The Doorway Page Dance

Those in the "do whatever it takes" camp tend to be practitioners of what are called "doorway pages." They also have been known by other names, and you can read my older article, What Are Doorway Pages, to learn some of their other names and more about them. However, the idea behind doorway pages is simple. You make a page targeting a particular search term, tweaking the title tag, the meta tags and the body copy in a way that you hope pleases the search engine's algorithm.

The result is often a very ugly page that you'd never want a human being to see. For instance, I recently needed to find some information about the movie "Thomas and The Magic Railroad" for my son, and I needed to go to someplace other than the official fan site. So at Google, I searched for "thomas and the magic railroad fan site." The last of the top pages listed was this:

Come here for thomas kinkaid
... kinkaid - the princess bride movie three six mafia mp3 the offspring music
three six mafia photos the oreilly factor show thomas magic railroad three dog night ...

See the description? The text is nonsensical. This page simply has a bunch of words on it. The person who created it simply hopes that some of the words will somehow form a match that pleases Google. It's also not a sophisticated doorway page attempt, but it worked for this extremely long query that I ran at Google.

Doorway pages were a popular tactic in the late 1990s, but they've declined for several reasons. Better use of link analysis is a key factor. Google, along with all the other major crawlers now, needn't depend just on a page's content to know what it is about. They can analyze links to understand both the content and popularity of pages. This means for popular queries, doorway pages have a much harder time succeeding than in the past, outside of paid inclusion programs.

In addition, the emergence of paid placement programs has also had an impact on running doorways. Lots of effort can be put into doorway pages, yet they offer no guarantee of ranking well. When paid placement came along, it provided the guarantee of top rankings. That comes at a price, but the price may be well worth paying, when measured up against time spent on doorways and uncertainties.

Finally, paid inclusion has provided a solution to some of the reasons that doorway pages were initially deployed. For example, those with dynamic content that might ordinarily be missed by some crawlers can now use paid inclusion programs as a way to get indexed without going the doorway route.

Bring On The Cloaking

While doorway pages as traditionally done are in decline, they still exist. The chief problem with them also remains: they aren't content that you want users to see. In the example above, imagine your reaction in coming to the page with a ton of nonsensical content. You'd go away. This is why doorway use is often accompanied by cloaking.

When cloaking, you show the search engine something different than what you show a user. There are many ways to cloak, but those who are serious about it typically do what's called IP cloaking. This means that you know all the internet addresses that the major search engines' spiders use when they access the web, their "internet protocol" addresses. That's the IP in IP cloaking. If you see a request come from one of these known addresses, then you deliver your custom content. Meanwhile, a human user sees something different.
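
The address check described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular cloaking package works. The address ranges are reserved documentation networks standing in for real spider addresses, and the function and file names are hypothetical.

```python
import ipaddress

# Hypothetical spider address ranges. These are reserved documentation
# networks, NOT real search engine addresses.
KNOWN_SPIDER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_known_spider(remote_addr: str) -> bool:
    """Return True if the requesting address falls inside a listed range."""
    address = ipaddress.ip_address(remote_addr)
    return any(address in network for network in KNOWN_SPIDER_NETWORKS)


def choose_page(remote_addr: str) -> str:
    """Pick which version of a page to serve for this request."""
    if is_known_spider(remote_addr):
        # The spider gets the custom content.
        return "spider_version.html"
    # Everyone else, including human visitors, gets the regular page.
    return "human_version.html"
```

Under these assumptions, choose_page("192.0.2.17") picks the spider version, while an address outside the listed ranges gets the human version. The whole approach depends on keeping the address list current, which is part of why it tends to be the tool of those who are serious about cloaking.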

The doorway page in the earlier example used cloaking. When I examined it, the content wasn't nonsensical. Instead, I got a simple page, easily readable, with two links that led me to get product information about Christian artist Thomas Kinkade at other web sites. The person behind this page no doubt earns affiliate fees from clicks off this page. No doubt, the page will also be removed by Google soon. Google has a specific ban against cloaking and may take action against pages doing this.

Cloaking Does Not Equal Spam

Hopefully, I've by now explained two completely different tactics in search engine marketing: doorway pages and cloaking. The two are not the same, though they often go hand-in-hand. Doorway pages are the effort to "crack" or "please" a search engine's algorithm. In contrast, cloaking does nothing to please an algorithm. It's merely a way of delivering targeted content.

This is an important distinction to make, because some people like Perkins want to declare that cloaking is automatically equal to spamming search engines. To me, that's not necessarily the case.

Spam is often cloaked, absolutely. Google certainly considers cloaking to be spam. Both Inktomi and Teoma have guidelines against it, as well. However, as we'll see, I'd argue that they allow cloaking via their paid inclusion programs. Meanwhile, FAST and AltaVista actually have no written guidelines that I can find against cloaking.

Finally, and this is the most important point, by declaring cloaking to automatically be spam, Perkins leaves himself open to the pro-cloaking arguments he most wants to stop, and stop with good reason.

Everyone Cloaks!

Perkins wants to define cloaking in a technical manner: "If you need to know a search engine's IP address or some details from its HTTP request (e.g., its user agent name) in order to deliver content, you are probably cloaking. If you don't need that information, then you are certainly not cloaking."

The problem is, not everyone agrees with Perkins's definition. For example, those defending cloaking like to talk about the fact that in some countries, if Google detects you are outside the US when trying to reach Google.com, it will redirect you to your "local" site.

Personally, I wouldn't define that as cloaking, since I've never seen the case where Google has shown someone something different on the Google.com home page, depending on their country -- and I've gone to Google.com from a variety of different countries. If something happens, it usually is that you try to reach Google.com and instead get redirected to a completely different, non-Google.com web site.

A better example for those who want to say that Google cloaks might be when you do a search there. If I search at Google.com from where I live in the UK, I get ads targeted to those in the UK. That's a completely different experience than what a user in the US would get, yet we'd view the same URL.

An even better "everyone cloaks" argument is that cloaking is even built into some web server software. For instance, let's say you build a web site with three different versions of pages, a text-only version, a version for those using Internet Explorer and one for those using Netscape's browser.

Your web server allows you to target all those people who have IE or Netscape and show them custom versions. Everyone else gets the text-only version. So, a search engine spider coming to the site sees the text-only content, which is different than what the vast majority of your users see. You haven't even actively set up your web server to do this, but it happens -- and could be considered cloaking.
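
To see how this can happen without anyone intending it, here is a rough Python sketch of that kind of browser targeting. It is an assumption about how such a setup might be written, not the behavior of any specific web server. The function name and version labels are hypothetical, and real user agent strings are messier than this simple matching suggests.

```python
def select_version(user_agent: str) -> str:
    """Choose a page version from the browser's user agent string."""
    ua = user_agent or ""
    # Internet Explorer also starts its user agent with "Mozilla/",
    # so the MSIE marker has to be checked first.
    if "MSIE" in ua:
        return "ie"
    if ua.startswith("Mozilla/"):
        return "netscape"
    # Anything unrecognized, including a search engine spider,
    # falls through to the text-only pages.
    return "text-only"
```

A spider announcing itself with a string such as "Googlebot/2.1 (+http://www.googlebot.com/bot.html)" matches neither browser test, so it lands on the text-only version even though nothing in the code mentions search engines at all.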

Coincidentally, at the same time Perkins posted his article, WebmasterWorld.com owner Brett Tabke posted his own thoughts on mainstream cloaking. That forum thread starts out with more examples of cloaking being argued as commonplace.

"Everyone cloaks" arguments can infuriate the anti-cloaking crowd. They find the arguments merely an attempt to confuse people about what "real" cloaking is, and I've literally seen people turn red trying to push back acceptance of these other examples as cloaking.

There's definitely truth in what the anti-cloakers say. Some search engine marketers definitely employ the "everyone cloaks" defense as a means to get clients to sign-on to potentially risky campaigns, which is the most worrying issue to me. But others have real differences of opinions as to what cloaking is. In my view, coming up with a definition that accommodates them, as well as the anti-cloakers, is the only way forward on this issue.

Cloaking Doesn't Kill Search Engines; Spam Kills Search Engines

My solution, I hope, is simple. I suggest that we define cloaking not by technical terms but instead by the end result:

"Cloaking is getting a search engine to record content for a URL that is different than what a searcher will ultimately see, often intentionally."

Unlike Perkins, I don't care how the cloaking is done technically, whether by user agent detection, IP detection, "poor man's cloaking" by placing content within a noframes area, hiding content with layers using cascading style sheets or anything else. If the typical searcher sees something different than the content of the page recorded in the search engine's index, that's cloaking. This also fits in with the guidelines we do have from three of the crawler-based search engines that offer them:

  • GOOGLE: The term "cloaking" is used to describe a website that returns altered webpages to search engines crawling the site. In other words, the webserver is programmed to return different content to Google than it returns to regular users, usually in an attempt to distort search engine rankings.
  • INKTOMI: Pages that give the search engine a different page than the public sees (cloaking).
  • TEOMA: Web pages that show different content than the spidered pages.

Of these, only Google suggests a technical definition of cloaking with its statement about a webserver being "programmed" to deliver custom content. I'm being broader than this, but I also think that fits in well with Google's other guidelines in general that warn against hiding information from users.
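
One way to picture an end-result definition is as a comparison between two pieces of text for the same URL: what the search engine recorded and what a searcher sees. The Python sketch below is only an illustration of that idea. The function names and the 0.5 threshold are arbitrary assumptions, and no search engine is known to test for cloaking this way.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def similarity(recorded: str, seen: str) -> float:
    """Return a 0.0 to 1.0 score for how alike the two texts are."""
    return SequenceMatcher(None, normalize(recorded), normalize(seen)).ratio()


def looks_cloaked(recorded: str, seen: str, threshold: float = 0.5) -> bool:
    """Flag a URL when the recorded and seen content differ substantially.

    The threshold is an arbitrary illustrative value, not a known standard.
    """
    return similarity(recorded, seen) < threshold
```

Note that nothing here asks how the difference was produced. IP detection, user agent detection and hidden text would all be judged the same way, by whether the searcher ends up seeing what was recorded.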

Indeed, "cloaking is hiding," summarizes Jill Whalen, the search engine marketer who originally published Perkins's article in her popular High Rankings Advisor newsletter, and who then diligently followed the debate that broke out at the ihelpyou forums (Why Cloaking Is Always A Bad Idea) and WebmasterWorld.com (Cloaking Gone Mainstream). Both threads provide excellent views on this subject.

Yes, exactly that. Cloaking is hiding. Even hiding text by making it the same color as the background of a web page ("invisible text") is a form of cloaking. Low tech, but cloaking all the same.

Another crucial difference between my definition and that of Perkins is that I do not automatically declare cloaking to be spam. This is an important distinction, if the goal is to help educate people about the potential problems associated with cloaking.

It's also important, because even though Perkins says in his article that all the search engines say "don't cloak," as I've written, AltaVista and FAST don't actually say this at all, in their webmaster guidelines. In addition, both of them as well as Inktomi and Teoma arguably allow cloaking via XML feeds, as I'll conclude.

Even Google, despite its ban, might be considered to allow cloaking when some of the "everyone cloaks" examples are employed. Of course, anyone who thinks such arguments will protect them from Google, if caught cloaking, is more than likely to lose the battle. But, we'll come back to this.

To Win For Free, Focus On Content

It bears repeating. Cloaking often goes hand-in-hand with low-quality doorway pages, which search engines often consider spam. If you are considering cloaking, it is probably because you are creating content that you hope will please a search engine's algorithm, rather than content that should exist primarily to please human visitors. Such efforts can often be time-consuming, may not yield the expected results and may only work for a short time.

I've sometimes used a bicycle metaphor to explain this. Those who create doorway pages are like someone who jumps on a bike, sprints forward and leaves you, the quality content builder, behind. Eventually the sprinter tires. You overtake them, without having to make any additional effort. In addition, sometimes the "sprinter" never even overtakes you in the first place.

OK, so it's also the tortoise-and-hare story told again. However, it remains true. All the search engines reward good content. This is especially because good content attracts those crucial links that everyone wants. Focus on good content, and when it comes to getting listed "for free" in the editorial results of the major crawlers, you are playing the smart, long-term game.

Approved Cloaking & XML Feeds

Things are different when it comes to paid inclusion, which all the major crawlers but Google offer. In particular, all the paid inclusion crawlers have ways for content providers to "feed" them information via XML.

To understand this process, picture a spreadsheet that has all the URLs you want listed, row by row. Information about each URL is listed in the columns -- the title of each URL in the first column, the description of each URL in the next column, the "body copy" of each URL in the next column and so on. It's not really web pages that are read but tabular information about URLs that is pumped into the search engines.
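
The spreadsheet picture translates fairly directly into XML. The Python sketch below turns rows of URL information into a small feed. The element names (feed, entry, url, title, description, body) are hypothetical and chosen only for illustration, since each paid inclusion program defines its own feed format, and the sample row is invented.

```python
import xml.etree.ElementTree as ET

# Column names from the spreadsheet picture; the matching XML element
# names are hypothetical, not any search engine's actual schema.
FIELDS = ("url", "title", "description", "body")


def build_feed(rows):
    """Turn rows of URL information into one XML document string."""
    feed = ET.Element("feed")
    for row in rows:
        entry = ET.SubElement(feed, "entry")
        for field in FIELDS:
            ET.SubElement(entry, field).text = row[field]
    return ET.tostring(feed, encoding="unicode")


# One invented row, standing in for a line of the spreadsheet.
sample_rows = [
    {
        "url": "http://www.example.com/product?id=1",
        "title": "Blue Widget",
        "description": "A sturdy blue widget.",
        "body": "Full product copy for the blue widget goes here.",
    }
]

feed_xml = build_feed(sample_rows)
```

The point to notice is that the title, description and body copy come from the rows, not from fetching the page at each URL, so nothing forces them to match what a visitor to that URL sees.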

To me, XML feeds are a form of approved cloaking. That's not why most people use them, nor should that be the main reason you consider them. XML feeds really were not initially intended to be a new way for marketers to cloak low-content doorway pages but rather a simple way of feeding in dynamic content such as a product database. If you are an online merchant, XML feeds make a lot of sense to consider.

Having made all those disclaimers, there's no doubt that some people are indeed using XML feeds as a way to cloak doorway pages. Moreover, they have the approval of the search engines, since these feeds are reviewed by the search engines for quality. In addition, there's evidence that being in these programs may help such content compete better for rankings than if it were picked up for "free."

In November, I looked at this situation with AltaVista (and if you are really interested, Search Engine Watch members got a much more detailed look). This month, for Search Engine Watch members, I look more closely at the situation with Inktomi. As with AltaVista, there's evidence that XML feeds have been an effective way for some companies to feed and cloak content that might not otherwise have met Inktomi's content guidelines.

Inktomi admits that its XML feeds do technically violate its posted guidelines about cloaking and says it's now looking to amend these. However, that's not really the key concern. Instead, the real issue is that XML feeds and perhaps paid inclusion in general are allowing some people to provide content in a radically different way than has been generally accepted when content is gathered for free.

In particular, promotion that people have done in the past via traditional doorway pages and cloaking -- and have been banned for -- can now be done under the guise of content feeding, with the search engines that offer this. That's why I feel it's almost naive to be arguing about whether cloaking is an acceptable delivery mechanism these days, except in the case of Google.

For the others, the important issue revolves around content standards. If low-content doorway pages are not acceptable editorial content when found by a search engine spider naturally, should they suddenly be OK when read via paid inclusion programs? If there's a debate to be had, this is it.

Avoiding Trouble With Cloaking

As said, to me XML feeds are a form of approved cloaking. I suspect some search engines also may allow some ordinary HTML pages to be cloaked via their non-XML paid inclusion options, as well -- something I hope to clarify in the future.

Also as said, some may argue that my broad definition of cloaking means that Google might knowingly allow it, in some cases. For example, it's possible that the site throwing out text-only pages might get banned by Google for "accidentally" cloaking, then upon review might have that penalty lifted.

Ah, ha! Proof that Google has allowed cloaking. If so, so what? Google clearly reserves the right to do whatever it wants when it comes to cloaking, when it warns that those who cloak "may" get permanently banned, rather than saying they "will" get banned.

Maybe Google has let a site "technically" cloak or perhaps even overtly cloak, for some reason. Banking on that individual decision to defend yourself if you actively cloak against Google is just foolish. Instead, I would say most people who choose to show Google cloaked content do so knowing that they may get caught and tossed out.

Overall, I'll leave you with my definition of cloaking, backed up by some additional guidelines that I think will steer you away from trouble:

"Cloaking is getting a search engine to record content for a URL that is different than what a searcher will ultimately see, often intentionally. It can be done in many technical ways. Several search engines have explicit bans against unapproved cloaking, of which Google is the most notable one. Some people cloak without approval and never have problems. Some even may cloak accidentally. However, if you cloak intentionally without approval -- and if you deliver content to a search engine that is substantially different from what a searcher sees -- then you stand a much larger chance of being penalized by search engines with penalties against unapproved cloaking. If in doubt, ask the search engine if it has a problem with what you intend to do, assuming you can't get a clear answer from written guidelines that are provided. If you are working with a third-party search engine marketer, ask them for proof that what they intend to do is approved. Otherwise, be prepared for any adverse consequences."

I'd like to say all the search engines will promptly respond if asked, but they probably won't, except to those in paid inclusion programs. Still, if you've asked and ended up in trouble, then you can at least show you tried to get clarification. If you aren't an "industrial strength" cloaker, that may help.

As for working with third-party firms, understand what they are doing for you. Ask to know if there are any potential risks and get this spelled out in advance. If you aren't comfortable, walk away.

Someone who's going to engage in unapproved cloaking and who is professional will tell you the risks and not try to make you think that cloaking content is something "everyone does." Instead, they'll explain why they do it, why they think it works and what the possible downsides will be. They'll do this because they often work with clients prepared to take those risks, so they aren't interested in trying to disguise what they are doing.

2003: The Year Of Paid Inclusion

Let me conclude by going back to what I said was the real issue in this debate, that content standards seem to have changed, as most crawlers have become dependent on paid inclusion as a revenue generator.

The standards are for the search engines to change, of course. Nor does having different standards -- perhaps more liberal standards -- for paid content necessarily mean that users or relevancy are harmed. However, it does create confusion and concerns.

For paid inclusion to succeed, we're going to need the providers to be much clearer about exactly what benefits and advantages are provided over unpaid content. That's going to help search engine marketers trying to make purchasing decisions, as well as users evaluating the results they receive.

I also expect that paid inclusion content will ultimately need to be segregated from unpaid content, the more that content guidelines diverge. As I wrote in my previous article about issues with paid inclusion at AltaVista, such segregation may have positive benefits for both search engine marketers and users.

If the search engines fail to do this voluntarily, I think it's likely we'll see a third party such as the US Federal Trade Commission suggest it happen. In 2002, the FTC told those carrying paid placement listings to clean up their acts. In 2003, the agency's aim may shift to issuing new, stricter guidelines about paid inclusion listings.