This panel focused on tactics for controlling duplicate content for SEO purposes. It's always a great topic because, while the general guidelines are clear, solving duplicate content efficiently always depends on the specific situation, and there is a lot of nuance involved.
Moderator: Adam Audette
Shari is up first, and she describes the 6 primary methods websites can use to give the search engines directives and/or hints as to preferred URL canonicalization. (Canonicalization in this context is the consistent representation of 1 primary URL for every HTML document.) These are:
- robots.txt directives
- meta robots tags
- XML sitemaps
- nofollow link attribute
- rel=canonical link tag
- internal linking
Shari hammers home the importance of consistency in everything a website says and does. For example, if a web page is excluded by robots directives, don't include it in an XML sitemap, and don't put a rel=canonical tag on the page. When the engines get a consistent 'scent' or signal, canonicalization is much easier. Don't give the search engines signals that conflict with each other.
Another important topic from Shari is how duplicate content filtering happens at 3 different phases: crawl time, index time, and query time.
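Shari's consistency rule lends itself to a quick automated check. The following is only a minimal sketch in Python, not anything shown on the panel: the function name `find_conflicts`, the example.com URLs, and the input shapes (a robots.txt string, a list of sitemap URLs, and a dict mapping pages to their rel=canonical targets) are all assumptions made for illustration. It flags pages that are robots excluded yet still appear in the sitemap or carry a rel=canonical tag.

```python
from urllib.robotparser import RobotFileParser


def find_conflicts(robots_txt, sitemap_urls, canonical_targets, user_agent="*"):
    """Return (url, reason) pairs where a site's signals disagree.

    All three inputs are hypothetical shapes chosen for this sketch:
    the raw robots.txt text, the URLs listed in the XML sitemap, and
    a {page_url: canonical_url} mapping collected from the pages.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    conflicts = []
    # A robots-excluded URL should not be advertised in the sitemap.
    for url in sitemap_urls:
        if not parser.can_fetch(user_agent, url):
            conflicts.append((url, "blocked by robots.txt but listed in XML sitemap"))
    # A robots-excluded page should not carry a rel=canonical tag either.
    for page in canonical_targets:
        if not parser.can_fetch(user_agent, page):
            conflicts.append((page, "blocked by robots.txt but carries rel=canonical"))
    return conflicts


# Hypothetical site data, for illustration only.
ROBOTS_TXT = "User-agent: *\nDisallow: /print/\n"
SITEMAP_URLS = ["http://example.com/article", "http://example.com/print/article"]
CANONICALS = {"http://example.com/print/article": "http://example.com/article"}

for url, reason in find_conflicts(ROBOTS_TXT, SITEMAP_URLS, CANONICALS):
    print(url, "->", reason)
```

With the sample data, the print version of the article is reported twice, once for each conflicting signal, while the primary article URL passes cleanly.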
Next is Maile, who gives a great overview of the Google perspective on duplicate content. She assures the audience that Google does not treat duplicate content as a penalty; it is in fact a filter. Maile talks about crawl overhead and how curing duplicate content can greatly improve the crawl experience. In a great example of how sites struggle with duplicate content, Maile points to the Google Books site and shows a number of duplicate URLs for the primary site. Yep, duplicate content is such a common problem that even Google has issues with it.
Maile encourages the use of rel=canonical, which is fully supported by the engines. Google is the only one that also supports cross-domain use of rel=canonical.
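For readers who haven't used it, rel=canonical is declared with a `<link rel="canonical" href="...">` element in a page's `<head>`. As a rough sketch of how to audit what a page actually declares, the Python below pulls the canonical URL out of an HTML string using the standard library's `html.parser`. The class and function names and the example.com markup are assumptions for illustration, and it returns only the first canonical link it finds.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, in any letter case.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page: a print view pointing at the primary article URL.
PAGE = (
    "<html><head>"
    '<link rel="canonical" href="http://example.com/article">'
    "</head><body>Print view</body></html>"
)
print(find_canonical(PAGE))  # http://example.com/article
```

Running this over every URL variant of a document is a simple way to confirm they all point at the same primary URL.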
Maile notes that over 60,000 sites are using the parameter removal option in Google Webmaster Tools to help with duplicate content. She explains how this can be a useful step for helping Googlebot understand which URL parameters aren't necessary and can be safely ignored during the crawl.
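The same idea can be applied on the site's own side when generating internal links or auditing URLs. The sketch below is not how the Webmaster Tools feature works internally; it only illustrates the concept of collapsing URLs that differ by ignorable parameters. The parameter names in `IGNORED_PARAMS` and the example.com URL are assumptions picked for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that do not change the page content.
IGNORED_PARAMS = {"sessionid", "utm_source", "sort"}


def strip_ignored_params(url, ignored=IGNORED_PARAMS):
    """Return the URL with ignorable query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ignored
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


print(strip_ignored_params("http://example.com/shoes?color=red&sessionid=abc123&sort=price"))
# http://example.com/shoes?color=red
```

Parameters that do change the content, such as `color` here, are kept, so only the truly redundant URL variants collapse together.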
One topic that wasn't covered was Google's preference that webmasters not exclude duplicate content with robots.txt or a meta robots tag.
Finally it's Anthony's turn, and he gives an interesting case study of the duplicate content struggles experienced on AOL properties. Popeater.com is his primary example: its blog articles are featured in Google News, but in traditional listings only the syndication partners appear, not the original article on Popeater.com. "Why is this happening?" he asks.
Anthony explains that they have a selection of Guidelines for syndication partners, and also a selection of Preferences, that are shared. It's taken case by case, but the goal is ultimately to give the syndication partners a limited version of the original content, and to feature prominent links back to the source content.
During Q&A, topics include everything from handling tag pages, to dealing with hundreds of 'salesperson' sites with the same content, to how to handle a domain just purchased that is in a bad neighborhood with dirty links.