Common Problems with 404 Error Pages

SEO covers many aspects of a Web site, and recommendations can usually be grouped into one of three buckets: technical, content, or linking. It may surprise you to learn that the way a Web site handles 404 error pages can affect search engine rankings, although people debate how much negative weight is placed on sites with inconsistent or custom 404 error pages.

In recent weeks, I’ve seen several mishandled 404s, and one recurring theme is returning “200 OK” codes to search engine crawlers for pages that should be 404s. The sites doing this are fairly significant sites we’ve looked to engage from an SEO perspective.

In a couple of cases, through our monitoring of Webmaster Tools, we’ve even discovered client sites with this issue. The last thing you need in the Google index is an error page. Even worse is having multiple mistyped URLs returned as “OK.”

Another interesting trend I’ve seen is dynamically generated 301 or 302 redirects that send users to either the home page or a custom error page when someone mistypes a URL on a domain. For example, if I typed, I’m led to a nice custom 404 error page that offers a number of choices for finding the right page. The key is to understand exactly which response code the server returned.

To check response codes, I like to use the tool. If I paste the mistyped URL above into the box on this page, I get a detailed list of how the page is seen from a crawler’s perspective. The request generates a 301 redirect to the custom 404 page described above, but, most importantly, the server then returns a “Server Response” of 404 Not Found.
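A checker like this is easy to approximate yourself. Below is a minimal sketch in Python, assuming only the standard library and plain HTTP/HTTPS URLs: it requests a URL without following redirects automatically, records each hop’s status code, and follows `Location` headers itself so the whole chain is visible. The function name `trace_status` and the `max_hops` limit are illustrative choices, not features of any particular tool.

```python
import http.client
import sys
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def trace_status(url, max_hops=10):
    """Return a list of (url, status) pairs, one per hop, following redirects manually."""
    hops = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                    else http.client.HTTPConnection)
        conn = conn_cls(parts.netloc, timeout=10)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        try:
            # http.client never follows redirects, so every status code is visible
            conn.request("GET", path, headers={"User-Agent": "status-check/0.1"})
            resp = conn.getresponse()
            status = resp.status
            location = resp.getheader("Location")
            resp.read()  # drain the body before closing the connection
        finally:
            conn.close()
        hops.append((url, status))
        if status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the current URL
            url = urljoin(url, location)
        else:
            break
    return hops


if __name__ == "__main__" and len(sys.argv) > 1:
    for hop_url, hop_status in trace_status(sys.argv[1]):
        print(hop_status, hop_url)
```

Run against a mistyped URL, a site behaving like the one described above should show a 301 (or 302) hop followed by a final 404; a final 200 is the warning sign.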

Because the user lands on a page with more information, they will likely be satisfied and keep navigating to find what they want. The important point is that, because the server responds with a 404, the URL won’t be indexed by the search engines and will simply “disappear.”

Now if the response code were “200 OK,” which is a mistake developers sometimes make, a new URL (/doesnotexist) could be indexed, possibly with duplicate content. The home page is often used as the redirect destination for dead or nonexistent pages.
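One way to catch the “200 OK” mistake is to ask your own site for a URL that cannot possibly exist and see which status it finally answers with. A minimal sketch, assuming Python’s standard library and a site reachable over HTTP: the hypothetical helper `probe_missing_page` appends a random path to a base URL, lets `urllib` follow any redirects (for example, to the home page), and returns the final status code; `looks_like_soft_404` simply flags a final 200.

```python
import uuid
import urllib.error
import urllib.request


def probe_missing_page(base_url, timeout=10):
    """Request a random path that should not exist and return the final HTTP status."""
    url = base_url.rstrip("/") + "/" + uuid.uuid4().hex
    try:
        # urlopen follows 301/302 redirects, so this is the status of the landing page
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; its code is the status we want
        return err.code


def looks_like_soft_404(base_url):
    """True if a made-up URL ends on a page that claims success (200 OK)."""
    return probe_missing_page(base_url) == 200
```

For example, `looks_like_soft_404("http://www.example.com")` returning `True` means a bogus URL on that site ends up on a page answering 200, whether directly or after a redirect to the home page.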

From a user experience standpoint, it’s debatable whether that’s the right choice or whether the user should be sent to an error page. I’m sure you can find many people on both sides of that fence. From an SEO perspective, you can end up with duplicate versions of your home page in the search engine indexes, under bogus URLs.

Tips on 404 Best Practices

Mark Jackson gave us a nice roundup of various hosting problems earlier this month.

I also asked Craig Geis, our SEO technical lead who also maintains the SEO Tech Blog, to provide some insights on how improperly handled 404s can hurt a site from an SEO perspective. He has looked deeply into this issue and provides some general best practices to follow when deciding how to handle pages that can’t be found.

These aren’t the only ways to deal with this problem. However, from an SEO perspective it would be wise for developers/designers of many large Web sites to heed Geis’s advice:

Search engines can reduce rankings due to server errors and broken pages. Simple errors such as “404 page not found,” in large quantities, can make the search engines believe that a site is incomplete or under construction and, as a result, they may decide that the site isn’t worthy of strong search engine rankings.

When a nonexistent page is requested from the server, the server should respond with a special “HTTP Status” header value of “404 Not Found,” which may also be followed by custom error-page body content. Incorrectly configured Web servers that respond with a status header value of “200” (or any other erroneous value) are exposed to significant risk with respect to search engines’ “duplicate content penalties.” This is because the identical content (in this case, the error page content) would be available under a potentially infinite number of URLs.

Custom 404 pages serve several important purposes. First, they return the correct code to users and to search engine spiders, informing visitors that the page they were seeking wasn’t found. Second, custom 404 pages present visitors with options about what to do next. Without a custom 404 error page, the visitor, human or robot, is left with only two courses of action: to abandon their search or click the back button. Neither of these is a satisfactory response to an error.
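Geis’s point that the friendly error page and the status code are two separate things can be made concrete with a small server. A minimal sketch using Python’s built-in `http.server`, assuming a hypothetical site whose pages live in the `PAGES` dictionary: unknown paths get a helpful HTML page with links onward, but the status line still says 404 Not Found, so crawlers are not told the page exists.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content: path -> HTML body
PAGES = {
    "/": "<h1>Home</h1>",
    "/products": "<h1>Products</h1>",
}

# Custom error page: gives visitors somewhere to go next
NOT_FOUND_BODY = """<html><body>
<h1>Sorry, we couldn't find that page</h1>
<p>Try the <a href="/">home page</a> or browse our <a href="/products">products</a>.</p>
</body></html>"""


class SiteHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        path = self.path.split("?", 1)[0]  # ignore any query string
        if path in PAGES:
            self._send(200, PAGES[path])
        else:
            # Friendly body for people, but the status is still a real 404
            self._send(404, NOT_FOUND_BODY)

    def _send(self, status, html):
        body = html.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), SiteHandler).serve_forever()
```

On a real site this logic usually lives in the Web server or framework configuration rather than in hand-written handler code, but the thing to verify is the same: the custom body and the 404 status travel together in one response.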

Geis has provided specific recommendations to several clients on this issue, so if you have this problem with your sites, don’t feel like you’re alone. Take care of it, because it could be damaging in the long term from both a user experience and an SEO perspective.

Frank Watson Fires Back

Having just spent the last few days in Munich at SEOktoberfest, I’ve learned some serious black hat methodology, including ways people can screw with their competitors. Your site’s responses to mistyped pages or to pages that no longer exist should have direction.

Sending everything to a single catch-all response page may be fair, but you really should be more precise about where you direct people. A 404 should either redirect to appropriate content or lead to an “I Give Up” page.
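One reading of this advice is to redirect only when a retired URL has an obvious replacement, and otherwise serve the “I Give Up” page with a genuine 404 status. A minimal sketch of that routing decision in Python; the `MOVED` and `LIVE_PAGES` tables and every path in them are hypothetical examples, not any real site’s configuration.

```python
# Hypothetical map of retired URLs to their closest replacement content
MOVED = {
    "/old-catalog": "/products",
    "/2008/spring-sale": "/sales",
}

# Hypothetical set of pages that currently exist
LIVE_PAGES = {"/", "/products", "/sales"}


def route(path):
    """Decide how to answer a request: returns (status, redirect location or None)."""
    if path in LIVE_PAGES:
        return 200, None
    if path in MOVED:
        # Known replacement: send visitors and crawlers there permanently
        return 301, MOVED[path]
    # No sensible destination: serve the "I Give Up" page with a real 404
    return 404, None
```

With these tables, `route("/old-catalog")` returns `(301, "/products")` and `route("/doesnotexist")` returns `(404, None)`, so a mistyped URL never answers “200 OK.”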

The major corporations are a different story. They make mistakes with their pages, but they have processes in place to deal with them: not right out of the gate, but soon enough to flag when things need to change. Realistically, though, most sites do little to notice what’s going on.

SEOktoberfest was an awesome event. If you missed it, start commenting on the forum so others can be alerted to a topic. Rock on, Aussies!