Search Engines Handle Noindex Inconsistently

Matt Cutts has a nice illustrated survey of how the major search engines deal with the meta noindex tag in Handling noindex meta tags. He finds inconsistency, summarized as follows:

  • Google doesn't show the page in any way
  • Ask doesn't show the page in any way
  • MSN shows a URL reference and a Cached link, but no snippet. Clicking the Cached link doesn't return anything.
  • Yahoo! shows a URL reference and a Cached link, but no snippet. Clicking the Cached link returns the cached page.

Interestingly, if you use a robots.txt file to ban indexing, Google DOES show the page in some ways. Matt acknowledges this, but it still raises the question of why Google operates differently when the intent of both mechanisms (explained here) is the same. I've commented on the issue on his blog as follows:

Why would Google want to treat meta noindex and robots.txt differently? They are both intended to do the same thing: keep pages out of an index. The only reason we have two options is that some people can't set up robots.txt files for their sites, which might live within the domains of others. However they are technically implemented, it seems like they should be treated the same way.
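For readers unfamiliar with the two mechanisms, here is roughly what each looks like (the path and user-agent values are illustrative, not from the post):

```
# robots.txt: placed at the site root; asks crawlers not to fetch /private/
User-agent: *
Disallow: /private/
```

```html
<!-- meta noindex: placed in a page's <head>; asks engines not to index it -->
<meta name="robots" content="noindex">
```

Note the practical difference that fuels the inconsistency: robots.txt stops a compliant crawler from fetching the page at all, while the meta tag requires the page to be fetched before the directive can even be read.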

My gut tells me most webmasters would prefer that all the search engines not list any pages that use either a robots.txt or meta noindex command.

From a user perspective, I think the technique of showing a link to a site if you can learn about it another way is fine, such as being listed in the Open Directory or from links on the public web to those sites.

The Yahoo! implementation of meta noindex is odd: why show a cached page? But I can see a hole here. They might not actually be indexing the page but still caching it, since the specific noarchive tag isn't also being used.
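To illustrate, closing that hole would mean pairing the two directives, something like this (a sketch; how each engine actually honors the combination is exactly what's in question above):

```html
<!-- noindex alone: keep the page out of the index (caching behavior varies) -->
<meta name="robots" content="noindex">

<!-- noindex plus noarchive: also ask engines not to show a cached copy -->
<meta name="robots" content="noindex, noarchive">
```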

Sounds like summit time! Not only would a standard for how meta robots and robots.txt should be handled be handy, but it would also be nice to know whether blocking a page also inherently blocks caching.

A summit -- or consistent standards -- is something the first person commenting on Matt's blog is calling for. If it happens, perhaps it could also be extended to feeds. Bloglines Proposes Blog Search Exclusion Tag from us earlier this month covers a proposed standard from Bloglines, while The robots are coming! The robots are coming! over at SEOmoz gives some brief examples of why this might be useful.

Matt's blog already has a good discussion going on this topic, so if you have thoughts and ideas, add more over there.