Search Engines Handle No Index Inconsistently

Matt Cutts has a nice illustrated survey of how various major search engines
deal with the meta noindex tag in "Handling noindex meta tags." He finds
inconsistency, with this being the summary:

  • Google doesn’t show the page in any way
  • Ask doesn’t show the page in any way
  • MSN shows a url reference and Cached link, but no snippet. Clicking the
    cached link doesn’t return anything.
  • Yahoo! shows a url reference and Cached link, but no snippet. Clicking
    on the cached link returns the cached page.
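
For readers who haven't used it, the tag being tested is a meta element in the
page head, such as <meta name="robots" content="noindex">. Below is a minimal
sketch, using Python's standard html.parser, of how a crawler might read those
directives. The class and function names are made up for illustration, and
nothing here reflects how any of the engines above actually implement it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None (e.g. <meta name>), so normalize them.
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for item in attr.get("content", "").split(","):
            item = item.strip().lower()
            if item:
                self.directives.add(item)


def robots_directives(html: str) -> set:
    """Return the set of lowercase robots directives found in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="robots" content="noindex"></head><body>Hi</body></html>'
print("noindex" in robots_directives(page))
```

A page that also wanted to opt out of caching would list both values, for
example content="noindex, noarchive", and the sketch would return both.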

Interestingly, if you use a robots.txt file to ban indexing, Google DOES show
the page in some ways. Matt acknowledges this, but it still raises the
question of why Google operates differently when the intent of both
mechanisms (explained here) is the same. I've commented on his blog about the
issue as follows:

Why would Google want to treat meta noindex and robots.txt differently?
They are both intended to do the same thing: keep pages out of an index. The
only reason we have two options is that some people can't set up robots.txt
files for their sites, which might sit within the domains of others. However
they are technically implemented, it seems like they should be treated the
same way.

My gut tells me most webmasters would prefer that all the search engines
not list any pages that use either a robots.txt or meta noindex command.

From a user perspective, I think showing a link to a site is fine if you can
learn about it another way, such as from a listing in the Open Directory or
from links on the public web to that site.

The Yahoo implementation of meta noindex is odd. Why show a cached page?
But I can see a hole here. They might not actually be indexing the page but
still caching it, since the specific noarchive tag isn't also being used.

Sounds like summit time! Not only would a standard on how meta robots and
robots.txt are handled be handy, but it would also be nice to know if
blocking a page also inherently blocks caching.
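
For comparison, here is the robots.txt side of the question as a rough sketch
using Python's standard urllib.robotparser. The file contents, the example.com
URLs and the "ExampleBot" user-agent are assumptions for illustration, not
anything taken from Matt's tests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; a real file is served from /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_blocked(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the hypothetical robots.txt above disallows fetching url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, url)


print(is_blocked("http://example.com/private/page.html"))
print(is_blocked("http://example.com/public/page.html"))
```

Note that this check says only whether a crawler may fetch the URL. What an
engine then shows for a blocked URL, and whether it keeps a cached copy, is
exactly the part that no standard currently spells out.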

A summit, or consistent standards, is something the first person commenting
on Matt's blog is calling for. If it happens, perhaps it could also be
extended to feeds. "Ask.com & Bloglines Proposes Blog Search Exclusion Tag"
from us earlier this month covers a proposed standard from Ask.com. "The
robots are coming! The robots are coming!" over at SEOmoz gives some brief
examples of why this might be useful.

Matt’s blog already has a good discussion going on this topic, so if you have
thoughts and ideas, add more over there.
