The “Meet the Crawler” sessions at SES are always high on my personal “not to be missed” list. The session at San Jose was, as ever, interesting and informative. One take-away was that both Google and Yahoo! treat server codes 404 and 410 as though they are the same. They are not the same; otherwise there would be no need for two codes. It was apparent from the discussion during the session that these codes are so often confused and misused that it is less problematic for the search engines to treat them alike.
So, what’s the difference, and what is the correct usage of each? Officially, the 4xx series of codes is used to indicate that there is an error on the client side. This is often the result of a mistyped URL. In these instances, the server finds nothing at the requested address and delivers the familiar 404 error. The 410 code, on the other hand, signals that the resource has been intentionally removed.
It is possible to trigger a 404 message by simply typing in the domain followed by random numbers and letters, such as www.mysite.com/fudsec. This behavior will not generate a 410 response. The 410 code is a code of intention, not just an error message. The server must be told to deliver the 410 code. It is used to signal that a URL did in fact exist on the server at the location requested but has been removed: stricken, eliminated, gone, don’t-ask-for-it-again gone.
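To make the distinction concrete, here is a minimal sketch in Python using only the standard library. The page lists (`LIVE_PAGES`, `REMOVED_PAGES`) and the helper names `status_for` and `serve` are made up for illustration; a real site would more likely configure this in its web server. The point is simply that a 404 falls out by default, while a 410 only happens because someone listed the URL on purpose.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content, assumed for this example only.
LIVE_PAGES = {"/", "/about"}
REMOVED_PAGES = {"/old-promo"}  # a 410 requires someone to list the URL here


def status_for(path: str) -> HTTPStatus:
    """Pick the status code for a requested path."""
    if path in LIVE_PAGES:
        return HTTPStatus.OK
    if path in REMOVED_PAGES:
        return HTTPStatus.GONE       # 410: existed once, removed on purpose
    return HTTPStatus.NOT_FOUND      # 404: nothing known at this address


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status = status_for(self.path)
        body = f"{status.value} {status.phrase}\n".encode("utf-8")
        self.send_response(status.value)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the demo server locally until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling `serve()` and requesting a random path such as `/fudsec` would return a 404, while `/old-promo` would return a 410, and only because it was deliberately added to `REMOVED_PAGES`.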
When Google and Yahoo! encounter a 404, they do not immediately remove the page from the index; rather, they revisit it multiple times before taking the drastic step of dropping it. By treating the 404 and the 410 similarly, the search engines make it more difficult to cause the accidental removal of pages. As the discussion during the Q & A at SES indicated, search marketers should be aware that delivering a 410 code will not result in more rapid removal of these pages from the search indices and will not prevent Yahoo! and Google from re-crawling the URLs.
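The behavior described in the session can be pictured with a small Python sketch. This is not how either engine is actually implemented: the `IndexEntry` class, the `record_crawl` function, and especially the threshold of three misses are assumptions made up for illustration, since no specific number was given. Resetting the count on any other response is likewise a simplification.

```python
from dataclasses import dataclass

# Assumed value for illustration only; no real threshold was disclosed.
MISSES_BEFORE_DROP = 3


@dataclass
class IndexEntry:
    url: str
    consecutive_misses: int = 0
    indexed: bool = True


def record_crawl(entry: IndexEntry, status: int) -> IndexEntry:
    """Update an index entry after one crawl attempt."""
    if status in (404, 410):
        # Both codes are handled identically: count the miss, keep the page
        # for now, and drop it only after repeated failures.
        entry.consecutive_misses += 1
        if entry.consecutive_misses >= MISSES_BEFORE_DROP:
            entry.indexed = False
    else:
        # Simplification: any other response resets the miss count.
        entry.consecutive_misses = 0
    return entry
```

Under these assumptions, a single 410 leaves the page in the index exactly as a single 404 would, which is why sending a 410 does not speed up removal.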