
Why You Should Prevent Certain Pages From Being Indexed

by Mathieu Burgerhout

Every website has pages that matter more and pages that matter less. Unimportant pages are an unavoidable part of your website's structure. They only do harm when you fail to recognize them.

Here's how you can determine which of your pages are unimportant and prevent them from being indexed. The result will be a clean, mean, money-making online conversion machine.

Common Unimportant Pages

Nine out of 10 times, unimportant pages are the source of duplicate content.

Many features aimed at visitors generate a lot of unimportant pages. Some commonly used features that are known for doing so are:

  • Faceted/filtered navigation
  • Print page
  • Tell a friend
  • Internal search

If any of these features are on your website, the chances are high that there are unimportant pages indexed. Be aware of this and check if any of those "feature" pages can be indexed.
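As a rough sketch of that check, you could run a list of your URLs (from a crawl or a sitemap export, for instance) through a handful of patterns and count how many pages each feature produces. The patterns, the sample URLs, and the example.com domain below are all hypothetical; your own site will use different paths and parameters.

```python
import re
from collections import Counter

# Hypothetical URL patterns for the features listed above; adjust to your own site.
FEATURE_PATTERNS = {
    "faceted navigation": re.compile(r"[?&](color|size|sort)="),
    "print page": re.compile(r"/print/|[?&]print=1"),
    "tell a friend": re.compile(r"tell-a-friend"),
    "internal search": re.compile(r"/search\?|[?&]q="),
}

def classify(url):
    """Return the name of the first matching feature, or None for a regular page."""
    for feature, pattern in FEATURE_PATTERNS.items():
        if pattern.search(url):
            return feature
    return None

def count_feature_pages(urls):
    """Count how many URLs each feature generated."""
    return Counter(f for f in map(classify, urls) if f is not None)

# Made-up sample; in practice this would be your full URL list.
sample_urls = [
    "https://www.example.com/product/42",
    "https://www.example.com/tell-a-friend.php?id=42",
    "https://www.example.com/product/42?print=1",
    "https://www.example.com/search?q=shoes",
    "https://www.example.com/shoes?color=red",
]
counts = count_feature_pages(sample_urls)
```

Any feature with a count close to your number of product pages is a likely candidate for exclusion.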

Check with specialized search queries if any of those pages are already indexed. For example, to check if the "tell a friend" option is indexed for our website, we'd execute the following query in Google (locally for better results):

[Screenshot: the "tell a friend" search query]

The "inurl" operator asks the index to show only results whose URLs contain "tell-a-friend." With this query, you'll see all indexed URLs with "tell-a-friend" in them. These are definitely unimportant pages, because they contain only an input field for an e-mail address and a send button.

Unimportant for your visitors? No, not at all. This is a convenient, helpful feature for your visitors.

Unimportant for search engines? Yes, sir!
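For illustration, a query like the one described above can be put together in a few lines. This is only a sketch: the article names the "inurl" operator, while combining it with Google's "site:" operator to limit results to one website is an assumption here, and example.com is a placeholder domain.

```python
from urllib.parse import urlencode

def build_index_check_query(domain, url_fragment):
    """Combine the site: and inurl: operators into one query string."""
    return f"site:{domain} inurl:{url_fragment}"

def build_search_url(query):
    """URL-encode the query so it can be pasted into a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})

# example.com is a placeholder; use your own domain and URL fragment.
query = build_index_check_query("example.com", "tell-a-friend")
search_url = build_search_url(query)
```

Swapping "tell-a-friend" for the URL fragment of your print or internal search pages gives you the same check for those features.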

[Screenshot: search results for the "tell a friend" query]

In our case, Google had discovered more than 7,000 "tell-a-friend" pages. That figure makes sense, because our website has roughly 7,000 product pages.

These extra 7,000 pages are draining weight from the 7,000 money-making product pages. The "tell a friend" option is on every product page!

Every product gives the "tell a friend" option a unique ID, which generates a new URL. So, 7,000 products generate at least the same number of "tell a friend" pages.


How to Exclude

How do we exclude these kinds of pages? There are several ways to keep pages from being indexed, each with its pros and cons. Indexation can be prevented with:

  • JavaScript
  • Meta "robots" tags
  • Robots.txt

The best way is to not make the pages available to search engines at all, which means not linking to them with regular links. This can be done using JavaScript.

Because search engines don't execute (a lot of) JavaScript, this is a good method to make the pages unavailable. It is especially convenient for faceted navigation, where it keeps all those extra option pages from being indexed. Those pages are only variations of the original page anyway (i.e., duplicates).
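To make the idea concrete, here is a small sketch of what a crawler that only reads regular links would discover on a product page. It assumes a crawler that collects href attributes from anchor tags and runs no JavaScript; the page markup, the data-target attribute, and the URLs are made up for this example.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, as a crawler that ignores JavaScript might."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical product page: the product link is a plain anchor,
# while "tell a friend" is a button whose target is only opened by script.
PAGE = """
<a href="/product/42">Blue widget</a>
<button type="button" data-target="/tell-a-friend.php?id=42"
        onclick="window.location = this.dataset.target">Tell a friend</button>
"""

collector = LinkCollector()
collector.feed(PAGE)
discovered = collector.links
```

Only the product URL ends up in the list, so under that assumption the "tell a friend" page is never found through this page, while visitors can still use the button.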

The far easier option is to exclude pages using meta instructions. This option is easy to implement and fairly effective. Add the "robots" meta tag to pages you don't want to have indexed. To prevent a page from being indexed, add the "noindex" instruction to the tag.

The instruction can be extended with "nofollow" or "follow." Add the instruction "follow" to the meta when the pages are already indexed (like the example above). This will prevent the pages from being indexed, but will transfer link juice to all links on the (not indexed) page. A small amount of link juice will be passed. Of course, new pages won't be indexed at all and therefore will pass zero link juice.
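As a sketch, this is what the tag could look like on an already indexed "tell a friend" page, together with a small checker you might use to verify that a template really emits it. The page markup is a made-up example, and the parser only reads a meta tag named "robots"; it does not cover crawler-specific tags or HTTP headers.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for part in attr_map.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())

def robots_directives(html):
    """Return the set of robots directives found in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

# Hypothetical "tell a friend" page carrying the noindex, follow instruction.
TELL_A_FRIEND_PAGE = """
<html><head>
<meta name="robots" content="noindex, follow">
<title>Tell a friend</title>
</head><body></body></html>
"""
directives = robots_directives(TELL_A_FRIEND_PAGE)
```

Running the same check against a product page should return no "noindex" directive; if it does, the tag has leaked into the wrong template.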

Another effective method is to set exclusion rules in robots.txt to exclude pages or even folders. The major search engines follow these rules strictly. If you add a rule that disallows all "tell-a-friend.php" pages, search engines will leave the pages alone.

Although this is effective and strict, Google nowadays shows the disallowed pages as "crawled" in the index. Crawled pages can be recognized by a result only showing you the URL as a title. Crawled pages won't be shown in the search results.

Another disadvantage of excluding pages through robots.txt: link juice is still passed to your unimportant pages because the links are still there.
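A minimal sketch of such a rule, tested with the robots.txt parser from Python's standard library, might look like this. The rule applies to all crawlers and assumes the "tell a friend" pages live at /tell-a-friend.php in the site root; the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block every crawler from the tell-a-friend script.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tell-a-friend.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def is_crawlable(url, user_agent="*"):
    """Return True if the rules above allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)

tell_a_friend_allowed = is_crawlable("https://www.example.com/tell-a-friend.php?id=42")
product_allowed = is_crawlable("https://www.example.com/product/42")
```

Because the rule is matched against the start of the path, it covers every product ID in the query string while leaving the product pages themselves crawlable.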


No nofollow?

Touchy subject, but a "nofollow" isn't effective enough (anymore) to exclude pages from the index, like our "tell a friend" pages. If anyone can prove me wrong, please do. Isn't it weird to basically tell Google that an internal page on your site isn't trustworthy or (possibly) spam?

Conclusion

So, is our website owner stuck with 7,000 indexed "tell a friend" pages? Well, for a while at least. The pages will dissolve over time.

With the right exclusion rules, there will be no new "unimportant" pages indexed and the website's owner can put more weight on their money-making pages and keep the index clean of unimportant pages.

