The canonical tag (rel="canonical") is an essential tool in the search engine optimization (SEO) toolbox. It is often a better solution than a 301 redirect for cleaning up duplicate content issues.
So let's explore what those issues are and what the tag does.
Why Duplicate Content is a Problem
Duplicate content occurs when Google determines that two or more documents have essentially the same content. The algorithm then excludes the duplicates from the main index, because search engineers at Google know that users don't want the same page appearing multiple times in search results. Google also doesn't want to waste processing power on millions of duplicate documents; it has enough to do already.
The problem is that when the documents are on different URLs, only one of those URLs is going to be able to show in the results pages for that content. The URL selected by the engines may or may not be the optimal URL from a ranking perspective or, in some cases, may not even be the original owner's URL.
So, while Google hasn't penalized anyone specifically, the result isn't good for the content owner. Typically the canonical tag has a limited value here because most people who are scraping or even syndicating your content may not be inclined to add the tag.
On the other hand, if the duplicate content is caused by your own site, the problem becomes one of link connectivity. Whatever links point to the duplicate page (or pages) that get excluded will no longer benefit the original. Thus, you're splitting your link connectivity metrics between two or more pages and only one is benefiting, which dilutes the overall effectiveness and impact of your internal linking.
Obviously this isn't a best-case scenario from an SEO perspective.
This doesn't result in a penalty being applied to your site, but the result seems the same. It's worth noting that the only time penalties are applied to sites by search engines is when the duplicate content is created intentionally to game the system.
About the rel="canonical" Tag
The tag has the following syntax: <link rel="canonical" href="http://www.example.com/" />. It tells Google and Bing that this page, regardless of the URL used to access it, is actually the one specified in the tag. Therefore, all associated link metrics (and, in our testing, content) are applied to the page specified in the tag. Google is clear about the intended use of this tag: "Yes. The rel="canonical" attribute should be used only to specify the preferred version of many pages with identical content (although minor differences, such as sort order, are okay)."
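As a rough illustration of how a page template might emit the tag, here is a minimal Python sketch. The helper name canonical_link_tag and the example URL are assumptions made for illustration; the only fixed part is the link element itself, which belongs in the page's head section.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Return a <link rel="canonical"> element for a page's <head>.

    canonical_link_tag is a hypothetical helper name, not part of any library.
    """
    # Escape &, <, > and quotes so URLs with query strings stay valid
    # inside the href attribute.
    href = escape(canonical_url, quote=True)
    return f'<link rel="canonical" href="{href}" />'


if __name__ == "__main__":
    print(canonical_link_tag("http://www.example.com/example.htm"))
```

Every URL variation of the same page would print the same tag, which is the whole point: the href names the one URL you want credited.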
In other words, use this only to clean up duplicate content. If you use it for any other purpose, you will in effect be guilty of search engine spam and therefore subject to the normal consequences (ask JCPenney what that feels like).
Duplicate Content Issues
Implement the tag appropriately on every page of your website. Doing so will insulate you from the following duplicate content issues, many of which are caused inadvertently by content management systems (CMSs):
- Tracking Codes: A lot of technologies require that you add a tracking variable on the end of URLs that either link to your site or link internally. The format is similar to www.example.com?tracking-variable or might look like www.example.com/example.htm?tracking-code. The problem here is that search engines treat every URL that is different by even one character (including capitalization) as a distinct URL.
Although Google and Bing have automated technology to figure these kinds of discrepancies out (Google's is much more sophisticated), both engines still make a ton of mistakes with these kinds of issues. It is also interesting to note that sometimes people link to your website with tracking codes for their own link tracking purposes. The canonical tag ensures that you retain the link connectivity benefit of these links.
- Inconsistent URLs: Any two URLs that are different by even one character are treated as unique URLs. There are a number of cases in which inconsistent URL strings can lead to duplicate content. These include inconsistent capitalization, order of parameters, extra parameters that don't affect page content (adding language=English, for example), or pagination where page=1. Many of these are caused by the CMS.
- Pagination Issues: These arise when multiple pages of a site have basically the same content. Either the products are similar and differ only by color or other options, or the pages are produced by different sort orders, like "by price," or by some other variable that yields essentially the same content.
- WWW Versus Non-www: This isn't nearly the problem that it used to be -- most of the time Google gets it right. But it still happens now and again where a site will get indexed under both versions (example.com and www.example.com), which results in half the site indexed under one version and half under the other. You can specify your preferred version in Webmaster Tools but the canonical tag will also take care of the issue.
- Where You Can't Implement a 301 Redirect: People often have a hard time implementing 301 redirects because of server permissions or hosting technology issues. The canonical tag is another way to get more or less the same effect; the difference is that the original page will still exist.
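To make the URL variations in the list above concrete, here is a minimal Python sketch that collapses them into one canonical URL. Everything site-specific is an assumption for illustration: the preferred host, the parameter names in IGNORED_PARAMS, the page=1 default, and the choice to lowercase paths (which is only safe if your server treats paths case-insensitively).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical site settings; adjust for the site in question.
PREFERRED_HOST = "www.example.com"
KNOWN_HOSTS = {"example.com", "www.example.com"}
# Parameters assumed not to change what the page shows.
IGNORED_PARAMS = {"tracking-variable", "tracking-code", "language", "sort"}
# Parameter values assumed equivalent to leaving the parameter out.
DEFAULT_PARAMS = {("page", "1")}


def canonical_url(url: str) -> str:
    """Collapse tracking, capitalization, parameter-order, page=1 and
    www/non-www variations of a URL into a single form."""
    parts = urlsplit(url)

    # www versus non-www: map every known host to the preferred one.
    host = parts.netloc.lower()
    if host in KNOWN_HOSTS:
        host = PREFERRED_HOST

    # Inconsistent capitalization (assumes case-insensitive paths).
    path = parts.path.lower() or "/"

    # Drop ignored and default parameters, then sort the rest so that
    # parameter order no longer produces distinct URLs.
    params = [(k.lower(), v) for k, v in parse_qsl(parts.query, keep_blank_values=True)]
    params = [
        (k, v)
        for k, v in params
        if k not in IGNORED_PARAMS and (k, v) not in DEFAULT_PARAMS
    ]
    params.sort()

    return urlunsplit((parts.scheme, host, path, urlencode(params), ""))


if __name__ == "__main__":
    for variant in (
        "http://example.com/Example.htm?tracking-code=abc",
        "http://www.example.com/example.htm?language=English&page=1",
    ):
        print(variant, "->", canonical_url(variant))
```

Both variants in the usage example resolve to http://www.example.com/example.htm, which is the value you would then place in the tag's href.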
It should be noted that, according to Google, the canonical tag is a suggestion and not a mandate: "This new option lets site owners suggest the version of a page that Google should treat as canonical. Google will take this into account -- in conjunction with other signals -- when determining which URL sets contain identical content and calculating the most relevant of these pages to display in search results."
By implementing rel="canonical" tags for each page of your site and specifying the exact URL you want to represent each page of your content, you will go a long way in preventing the most common variations of duplicate content, and by extension, give yourself the best opportunity for excellent search engine rankings.
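One way to check that each page actually carries the tag is to parse the rendered HTML and read the canonical URL back out. The sketch below uses only Python's standard library; the names CanonicalFinder, find_canonical, and pages_missing_canonical are hypothetical, and the pages dictionary stands in for however you fetch your own pages.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html_text: str) -> Optional[str]:
    """Return the canonical URL declared in html_text, or None."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


def pages_missing_canonical(pages: Dict[str, str]) -> List[str]:
    """Given {url: html}, list the URLs whose HTML declares no canonical."""
    return [url for url, html_text in pages.items() if find_canonical(html_text) is None]


if __name__ == "__main__":
    sample = (
        '<html><head><link rel="canonical" '
        'href="http://www.example.com/example.htm" /></head><body></body></html>'
    )
    print(find_canonical(sample))
```

Comparing the value returned for each URL variation against the URL you intended is a quick way to catch templates that echo the requested URL instead of the preferred one.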