Many webmasters assume that identifying on-site content duplication is a no-brainer. But simply cruising through your site and eyeballing pages for identical copy isn't a reliable way to manage similar or duplicate content.
What happens when your site has thousands of pages, or several templated sections (e.g., company location pages, product pages, clearance sections) that share content?
Identifying On-Site Content Duplication
It's not just copy that you need to be concerned with. Have you checked your title elements and meta descriptions lately?
While you won't necessarily be dinged for duplicate content by the bots, you're definitely not helping anything.
Cannibalizing terms in the title elements is a great way to confuse search engines about the keyword focus of individual pages. Identical meta descriptions indicate you don't give a darn about informing and persuading a potential site visitor in the SERPs.
Tool to Use: Google Webmaster Tools > Diagnostics > HTML Suggestions
This tool shows you duplicate meta descriptions across your site as well as duplicate title elements. Are duplicate title elements present? If so, you likely have duplicate pages generated through internal content sharing and dynamic page creation.
Not all on-site duplicate content has to be identical to offend a search engine crawler. Any page that is more than 70 percent similar to another in its entirety can raise red flags and impede ranking success. As mentioned above, this often shows up in multiple location pages within a site or, on e-commerce sites, in product pages that differ only by color.
Tool to Use: WebConfs Similar Page Checker
This handy tool allows you to input two URLs from your site and assesses the percentage of similarity between the two. While all site pages have some similarity due to navigation and template layout, you should strive to keep this percentage as low as possible, or at least under the 50 percent mark.
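If you'd rather script a rough version of this check yourself, Python's standard library can approximate it. This is only a sketch (the function name and sample strings are my own, and difflib's ratio is a cruder measure than whatever the WebConfs tool uses internally):

```python
import difflib

def page_similarity(text_a: str, text_b: str) -> float:
    """Return a rough percentage similarity between two pages' text,
    based on difflib's longest-matching-blocks ratio."""
    return difflib.SequenceMatcher(None, text_a, text_b).ratio() * 100

if __name__ == "__main__":
    # Two hypothetical location pages that differ only by city name
    a = "Acme Widgets - Chicago location. Call us for widgets in Chicago."
    b = "Acme Widgets - Boston location. Call us for widgets in Boston."
    score = page_similarity(a, b)
    print(f"Similarity: {score:.0f}%")
    if score > 50:
        print("Over the 50 percent mark -- worth differentiating these pages.")
```

In practice you'd feed this the extracted body text of two pages (with navigation and template markup stripped) rather than raw HTML, since shared templates inflate the score.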
Managing On-Site Content Duplication
Once you've identified areas of your site prone to excessive similarity or page duplication, you can address these issues.
The robots.txt file is a great place to start. Here you can exclude duplicating pages such as blog archive folders, blog category folders, and dynamic URL parameters that generate duplicate pages. This is the most direct way to keep bots away from your duplicate pages.
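As a sketch, a robots.txt along these lines would keep compliant crawlers out of archive and category folders and out of parameter-generated duplicates (the folder paths and the `sort` parameter here are hypothetical; substitute your own):

```
User-agent: *
# Hypothetical blog archive and category folders
Disallow: /blog/archive/
Disallow: /blog/category/
# Hypothetical duplicating URL parameter (Googlebot supports the * wildcard)
Disallow: /*?sort=
```

Note that robots.txt keeps crawlers from fetching these pages, but a blocked URL can still appear in the index if other pages link to it, which is why the noindex tag below is often the better fit.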
A closely related tactic is the noindex meta tag, which offers more flexibility: crawlers can follow the links on a page without indexing the page itself. If you have dynamic URL issues, it's also worth notifying Google in Webmaster Tools, where you can instruct its crawler to ignore certain parameter fields.
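In the page's head section, the tag looks like this; "noindex, follow" tells crawlers to skip indexing the page while still crawling through its links:

```html
<head>
  <!-- Keep this page out of the index, but let crawlers follow its links -->
  <meta name="robots" content="noindex, follow">
</head>
```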
Another way to address similar pages is to decide which of them you deem most important for a SERP ranking (i.e., it has the most/best inbound links, is the main product page, etc.). All other similar pages can then carry a canonical link element, signaling that you know the content is similar and want search engines to focus on your preferred page.
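For example, a blue-variant product page could point search engines at the main product page with a canonical link element in its head (the URLs here are hypothetical):

```html
<!-- On the blue-variant page: point engines at the preferred main page -->
<link rel="canonical" href="http://www.example.com/widget">
```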
Moving forward, you can monitor duplicate content by setting up a campaign in SEOmoz Tools to continually assess content duplication on your site, and by keeping a regular eye on your Google Webmaster Tools account.
Beyond managing the similar content on your site you didn't know you had, it's always a best practice to make your site content as unique and informative as possible. Your visitors will like you for it, and, oh yeah, the search engines will, too.
Join us for SES Chicago 2010, the Leading Search & Social Marketing Event, taking place October 18-22! The conference offers 70+ sessions on topics including PPC management, keyword research, SEO, social media, local, mobile, link building, duplicate content, multiple site issues, video optimization, site optimization, usability and more.