Every website has more important pages and less important pages. Unimportant pages are an unavoidable part of the hierarchy or structure of your website. It's only harmful when you don't recognize these kinds of pages.
Here's how you can determine which of your pages are unimportant and prevent them from being indexed. The result will be a clean, mean, money-making online conversion machine.
Common Unimportant Pages
Nine out of 10 times, unimportant pages are the source of duplicate content.
Many features aimed at visitors generate a lot of unimportant pages. Some commonly used features that are known for doing this are:
- Faceted/filtered navigation
- Print page
- Tell a friend
- Internal search
If any of these features are on your website, the chances are high that there are unimportant pages indexed. Be aware of this and check if any of those "feature" pages can be indexed.
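As a rough illustration of that check, the sketch below flags URLs from a crawl or sitemap export that look like "feature" pages. The path markers and filter parameter names are hypothetical placeholders, not values from our website; replace them with whatever your own platform generates.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical markers: adjust to the URL patterns your own site produces.
FEATURE_PATH_MARKERS = {
    "tell-a-friend": "tell a friend",
    "print": "print page",
    "search": "internal search",
}
# Hypothetical query parameters used by a faceted/filtered navigation.
FILTER_PARAMS = {"color", "size", "sort"}


def classify_url(url):
    """Return the feature that most likely generated this URL, or None."""
    parsed = urlparse(url)
    path = parsed.path.lower()
    # Crude substring match on the path; expect some false positives.
    for marker, feature in FEATURE_PATH_MARKERS.items():
        if marker in path:
            return feature
    # Any known filter parameter in the query string counts as a facet page.
    if FILTER_PARAMS & set(parse_qs(parsed.query)):
        return "faceted/filtered navigation"
    return None


urls = [
    "https://www.example.com/tell-a-friend.php?id=42",
    "https://www.example.com/shoes?color=red",
    "https://www.example.com/product.php?id=42",
]
for url in urls:
    print(url, "->", classify_url(url))
```

The matching is deliberately simple, so treat the output as a shortlist to review by hand rather than a verdict.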
Use specialized search queries to check whether any of those pages are already indexed. For example, to check whether the "tell a friend" option is indexed for our website, we'd run a query containing the operator inurl:tell-a-friend in Google (locally for better results).
The "inurl" operator asks the index to show only results for URLs containing "tell-a-friend." With this query, you'll be shown all URLs with "tell-a-friend" in them. These are definitely unimportant pages, because each one contains only an input field for an e-mail address and a send button.
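A minimal sketch of how such a query could be put together for any feature. The domain example.com is a placeholder, and combining "inurl" with the "site" operator to limit results to one website is an assumption on our part, not necessarily the exact query used above.

```python
from urllib.parse import urlencode


def build_index_check_query(domain, url_fragment):
    """Combine the site: and inurl: operators into one search query."""
    return f"site:{domain} inurl:{url_fragment}"


def build_search_url(query):
    """Return a Google search URL with the query URL-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# example.com is a placeholder domain.
query = build_index_check_query("example.com", "tell-a-friend")
print(query)
print(build_search_url(query))
```

Swap "tell-a-friend" for the fragment that identifies your print pages or internal search results to run the same check for the other features.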
Unimportant for your visitors? No, not at all. This is a convenient, helpful feature for your visitors.
Unimportant for search engines? Yes, sir!
In our case, Google had discovered more than 7,000 "tell-a-friend" pages. That number makes sense, because our website has roughly 7,000 product pages.
These extra 7,000 pages are draining weight from the 7,000 money-making product pages. The "tell a friend" option is on every product page!
Every product gives the "tell a friend" option a unique ID, which generates a new URL. So, 7,000 products generate at least the same number of "tell a friend" pages.
How to Exclude
How do we exclude these kinds of pages? There are several options to keep pages from being indexed, each with its pros and cons. Indexation can be prevented with:
- Meta "robots" tags
- Robots.txt exclusion rules
- "Nofollow" attributes
The easiest option is to exclude pages using meta instructions. It is simple to implement and fairly effective: add the "robots" meta tag to the pages you don't want indexed and give it the "noindex" instruction.
The instruction can be extended with "nofollow" or "follow." Add "follow" when the pages are already indexed (like the example above). The combination keeps the pages out of the index, but still transfers link juice to all links on the (no longer indexed) page, although only a small amount will be passed. Of course, new pages won't be indexed at all and therefore will pass zero link juice.
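To make the instructions concrete, here is a small sketch that builds the tag and checks whether a page's HTML carries a "noindex" instruction. The function and class names are our own illustration; only the tag format itself (a meta element named "robots" with a comma-separated content value) is standard.

```python
from html.parser import HTMLParser


def robots_meta_tag(index=False, follow=True):
    """Build a robots meta tag; the defaults give "noindex, follow"."""
    directives = [
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ]
    content = ", ".join(directives)
    return f'<meta name="robots" content="{content}">'


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every robots meta tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def is_noindex(html):
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


tag = robots_meta_tag()
print(tag)
print(is_noindex(f"<html><head>{tag}</head><body></body></html>"))
```

Run a check like is_noindex over the templates of your "feature" pages after the change to confirm that the tag really made it into the output.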
Another effective method is to set exclusion rules in robots.txt to exclude pages or even whole folders. The major search engines honor these rules. If you add a rule that disallows all "tell-a-friend.php" pages, search engines will leave those pages alone.
Although this is effective and strict, Google nowadays can still list disallowed pages in its index as URL-only entries. You can recognize them by a result that shows only the URL as its title. Such entries normally won't show up in regular search results.
Another disadvantage of excluding pages through robots.txt: link juice is still passed to your unimportant pages, because the links to them are still there.
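Before publishing a robots.txt rule, it can help to test it. The sketch below uses Python's standard urllib.robotparser module against an in-memory rule set. The rule and the example.com URLs are illustrative assumptions, and Python's parser does simple prefix matching, so individual search engines may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rule set: block every URL whose path starts with
# /tell-a-friend.php, for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tell-a-friend.php
"""


def is_crawl_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder domain.
blocked = "https://www.example.com/tell-a-friend.php?id=42"
allowed = "https://www.example.com/product.php?id=42"
print(blocked, "->", is_crawl_allowed(ROBOTS_TXT, blocked))
print(allowed, "->", is_crawl_allowed(ROBOTS_TXT, allowed))
```

The first URL should come back as blocked and the product page as allowed, which is exactly the split we want between unimportant and money-making pages.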
Touchy subject, but a "nofollow" isn't effective enough (anymore) to exclude pages from the index, like our "tell a friend" pages. If anyone can prove me wrong, please do. Isn't it weird to basically tell Google that an internal page on your site isn't trustworthy or (possibly) spam?
So, is our website owner stuck with 7,000 indexed "tell a friend" pages? Well, for a while at least. The pages will dissolve over time.
With the right exclusion rules, there will be no new "unimportant" pages indexed and the website's owner can put more weight on their money-making pages and keep the index clean of unimportant pages.