Proper SEO and the Robots.txt File

By Mark Jackson, August 12, 2008

When it comes to SEO, most people understand that a Web site must have content, "search engine friendly" site architecture/HTML, and meta data (i.e., title tags, meta description tags, and meta keywords tags).

But lately, I'm seeing a lot of "optimized" Web sites that have totally disregarded the robots.txt file. When optimizing a Web site, don't disregard the power of this little text file.

What is a Robots.txt File?

Simply put, if you go to domain.com/robots.txt, you should see a list of the site's directories that the site owner is asking the search engines to "skip" (or "disallow"). However, if you're not careful when editing a robots.txt file, you could add rules to it that really hurt your business.
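
Because crawlers only look for the file at the root of a host, you can work out where the robots.txt file for any page should live from the page's URL alone. Here's a minimal Python sketch using the standard library's urllib.parse; the function name robots_txt_url is made up for illustration, and domain.com is just a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url.

    Crawlers only request robots.txt from the root of a host, so the
    page's path, query string, and fragment are thrown away.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://domain.com/products/widgets.html?color=blue"))
# http://domain.com/robots.txt
```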

There's tons of information about the robots.txt file available at the Web Robots Pages, including the proper usage of the disallow feature, and blocking "bad bots" from indexing your Web site.

The general rule of thumb is to make sure a robots.txt file exists at the root of your domain (e.g., domain.com/robots.txt). To exclude all robots from indexing part of your Web site, your robots.txt file would look something like this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/

The above syntax would tell all robots not to index the /cgi-bin/, the /tmp/, and the /junk/ directories on your Web site.
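
If you want to double-check rules like these before publishing them, Python's standard library ships a robots.txt parser, urllib.robotparser. The sketch below feeds it the example rules and asks whether a crawler may fetch a few sample paths; the crawler name "AnyBot" and the sample paths are made up. This parser does simple prefix matching and may not interpret every rule exactly the way a particular search engine does, so treat it as a sanity check rather than the final word.

```python
from urllib.robotparser import RobotFileParser

# The example rules from above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "AnyBot" is a hypothetical crawler; the "*" group applies to every robot.
for path in ["/cgi-bin/search.cgi", "/tmp/report.html", "/junk/", "/about.html"]:
    allowed = parser.can_fetch("AnyBot", "http://domain.com" + path)
    print(path, "->", "crawlable" if allowed else "blocked")
# /cgi-bin/search.cgi -> blocked
# /tmp/report.html -> blocked
# /junk/ -> blocked
# /about.html -> crawlable
```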

Real Life Examples of Robots.txt Gone Wrong

I recently reviewed a Web site that had a good amount of content and several high-quality backlinks, yet the site had virtually no presence in the SERPs. What happened? The site's owner had added a "Disallow: /" rule to the robots.txt file, which told the search engine robots not to crawl any part of the Web site.
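
A mistake like this is easy to catch automatically. Here's a minimal sketch, again using Python's urllib.robotparser, that flags a robots.txt whose rules keep a given crawler out of the home page. The function name blocks_entire_site is hypothetical, and checking only "/" is a deliberately crude heuristic, not a full audit.

```python
from urllib.robotparser import RobotFileParser


def blocks_entire_site(robots_txt: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules keep user_agent away from the home page.

    "Disallow: /" matches every path on the host, so if even "/" is
    off-limits, the site is effectively invisible to that crawler.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


print(blocks_entire_site("User-agent: *\nDisallow: /\n"))       # True
print(blocks_entire_site("User-agent: *\nDisallow: /tmp/\n"))   # False
```

Because the check goes through the parser, it catches a site-wide block whether it sits under "User-agent: *" or under a group aimed at one specific crawler.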

In another case, an SEO company edited the robots.txt file to disallow indexing of all parts of a Web site after the site's owner stopped paying the SEO company.

And just yesterday, I reviewed a company's Web site and noticed that several directories from their former site were disallowed in their robots.txt file. Instead of blocking the search engines from the old legacy pages, the company should have set up 301 (permanent) redirects from the old pages to the new ones to pass along their value. Because it didn't, all of that value was lost.
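
For readers who haven't set one up, here's roughly what a 301 looks like from a crawler's point of view. On a real site you'd configure redirects in your Web server or CMS; this is only a toy sketch using Python's built-in http.server, and the LEGACY_REDIRECTS mapping and its URLs are hypothetical. It starts a local server on a free port, requests one old URL, and prints the status code and the new location.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping from the old site's URLs to their new equivalents.
LEGACY_REDIRECTS = {
    "/old-products/": "/products/",
    "/old-products/widgets.html": "/products/widgets/",
}


class LegacyRedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = LEGACY_REDIRECTS.get(self.path)
        if target is None:
            self.send_error(404)
            return
        # 301 Moved Permanently tells clients, crawlers included,
        # that the page now lives at the Location URL for good.
        self.send_response(301)
        self.send_header("Location", target)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # silence the default per-request logging for this demo


# Serve on a free local port (0 lets the OS pick) in a background thread.
server = HTTPServer(("127.0.0.1", 0), LegacyRedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# http.client does not follow redirects, so we see the raw 301 response.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/old-products/widgets.html")
response = conn.getresponse()
status, location = response.status, response.getheader("Location")
print(status, location)  # 301 /products/widgets/

conn.close()
server.shutdown()
server.server_close()
```

The key difference from a robots.txt disallow is that the crawler is still allowed to request the old URL; it simply gets pointed to the new one.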

Robots.txt Dos and Don'ts

There are many good SEO reasons to stop the search engines from indexing certain directories on a Web site while allowing them into others. Let's look at some examples.

Here's what you should do with robots.txt:

Here's what you should not do with robots.txt:

By taking a good look at your Web site's robots.txt file and making sure the syntax is set up correctly, you can avoid search engine ranking problems. And by using Disallow rules to keep the search engines from indexing duplicate content on your Web site, you can potentially overcome duplicate content issues that might hurt your search engine rankings.
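
As an illustration, suppose (hypothetically) that every article at /articles/<name>.html also has a printer-friendly copy under /print/. A single Disallow line asks compliant robots to skip the copies while leaving the originals crawlable, and Python's urllib.robotparser can confirm that the rule does what you intended. The directory and page names are made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: /print/ holds printer-friendly duplicates of /articles/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

original = "http://domain.com/articles/robots-txt-tips.html"
duplicate = "http://domain.com/print/robots-txt-tips.html"

# The original article stays crawlable; only the duplicate copy is blocked.
print(parser.can_fetch("Googlebot", original))   # True
print(parser.can_fetch("Googlebot", duplicate))  # False
```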

One last note: if you aren't sure whether you can do this correctly, please consult an SEO specialist.

