How to check for duplicate content to improve your site’s SEO


Publishing original content to your website is, of course, critical for building your audience and boosting your SEO.

The benefits of unique and original content are twofold:

  1. Original content delivers a superior user experience.
  2. Original content helps ensure that search engines aren’t forced to choose between multiple pages on your site that have the same content.

However, when content is duplicated, whether accidentally or on purpose, search engines will not be duped and may penalize the site with lower search rankings. Unfortunately, many businesses publish repeated content without being aware that they’re doing so. This is why auditing your site with a duplicate content checker is so valuable: it helps you recognize such content and replace it as necessary.

This article will help you better understand what is considered duplicate content and the steps you can take to make sure it doesn’t hamper your SEO efforts.

How does Google define “duplicate content”?

Google describes duplicate content as content “within or across domains” that either completely matches other content or is appreciably similar. Content fitting this description can be repeated on more than one page within your site or across different websites. Common places where duplicate content might be hiding include copy repeated across landing pages or blog posts, and harder-to-detect areas such as meta descriptions that are repeated in a webpage’s code. Duplicate content can be produced by mistake in a number of ways, from simply reposting existing content to allowing the same page content to be accessible via multiple URLs.
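To make the difference between content that “completely matches” and content that is “appreciably similar” more concrete, here is a minimal Python sketch. It is only an illustration: Google does not publish how it measures similarity, and the word-shingle approach, the function names, and the shingle size of three words are all assumptions made for this example.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def fingerprint(text: str) -> str:
    """Hash of the normalized text; equal fingerprints indicate an exact match."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def shingles(text: str, size: int = 3) -> set:
    """Set of overlapping word sequences of the given size (an assumed default of 3)."""
    words = normalize(text).split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 (no overlap) to 1.0 (identical)."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)
```

Two pages with the same fingerprint are exact duplicates, while a similarity score close to 1.0 suggests near-duplicates worth reviewing by hand. Where to draw that line is a judgment call for your own site.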

When visitors come to your page and begin reading what seems to be newly posted content only to realize they’ve read it before, that experience can reduce their trust in your site and the likelihood that they’ll seek out your content in the future. Search engines have an equally confusing experience when faced with multiple pages with similar or identical content, and they often respond by assigning lower search rankings across the board.

At the same time, there are sites that intentionally duplicate content for malicious purposes, scraping content from other sites that doesn’t belong to them or duplicating content known to deliver successful SEO in an attempt to game search engine algorithms. Most commonly, however, duplicated content is simply published by mistake. There are also scenarios where republishing existing content is acceptable, such as guest blogs, syndicated content, intentional variations on the copy, and more. These techniques should only be used in tandem with the best practices described below, which help search engines understand that the content is being republished on purpose.

SEO audit report that helps spot and rectify duplicate content

Source: Alexa.com SEO Audit

An automated duplicate content checker tool can quickly and easily help you determine where such content exists on your site, even if it is hidden in the site code. Such tools should display each URL and meta description containing duplicate content so that you can methodically work through these issues. While the most obvious practice is to either remove repeated content or add original copy as a replacement, there are several other approaches you might find valuable.
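As a rough illustration of what such a report boils down to, the Python sketch below groups URLs that share the same meta description. It is not how any particular audit tool works; the function name and the example.com URLs are hypothetical, and it assumes you have already collected each page’s meta description into a dictionary.

```python
from collections import defaultdict


def find_duplicate_descriptions(pages: dict) -> dict:
    """Map each repeated meta description to the sorted list of URLs that use it.

    `pages` maps a URL to its meta description text (assumed to be collected elsewhere).
    """
    groups = defaultdict(list)
    for url, description in pages.items():
        # Ignore case and extra whitespace so trivially different copies still match.
        key = " ".join(description.split()).lower()
        if key:
            groups[key].append(url)
    return {desc: sorted(urls) for desc, urls in groups.items() if len(urls) > 1}


# Hypothetical input for illustration only.
pages = {
    "https://www.example.com/a": "Buy shoes online",
    "https://www.example.com/b": "buy  shoes online",
    "https://www.example.com/c": "About us",
}
report = find_duplicate_descriptions(pages)
```

Here `report` lists the two URLs that share one description, which gives you a worklist of pages to rewrite.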

How to address duplicate content

1. Using the rel=canonical <link> tag

These tags can tell search engines which specific URL should be viewed as the master copy of a page, thus solving any duplicate content confusion from the search engines’ standpoint.
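If you want to verify which canonical URL a page declares, a small script can read it out of the HTML. The sketch below uses Python’s standard html.parser module; the class and function names are made up for this example, and the page markup is a hypothetical sample.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values; attribute values can be None.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page markup for illustration only.
sample = (
    '<html><head><link rel="canonical" '
    'href="https://www.example.com/shoes"></head><body></body></html>'
)
canonical_url = find_canonical(sample)
```

Running this over every URL that serves the same content is a quick way to confirm that they all point to the same master copy.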

2. Using 301 redirects

These offer a simple and search engine-friendly method of sending visitors to the correct URL when a duplicate page needs to be removed.
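How you set up a 301 redirect depends on your web server or CMS, so treat the following Python sketch purely as an illustration of the idea. It is a minimal WSGI application with a hypothetical redirect map; the paths and the function name are assumptions for this example.

```python
# Hypothetical mapping from retired duplicate URLs to the page that replaces them.
REDIRECTS = {
    "/old-page": "/new-page",
}


def redirect_app(environ, start_response):
    """Answer retired paths with a 301 and serve everything else normally."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 tells browsers and search engines the move is permanent.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"page content"]
```

To try it locally, the app can be served with make_server from Python’s wsgiref.simple_server module.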

3. Using “noindex” meta tags

These simply tell search engines not to index a page, which can be useful when a duplicate page needs to stay available to visitors but should not appear in search results.
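To check whether a page actually carries a noindex directive, you could scan its HTML for a robots meta tag. This Python sketch uses the standard html.parser module; the names are hypothetical, and it only looks at the robots meta tag, not at HTTP headers or crawler-specific tags.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives listed in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma-separated, for example "noindex, follow".
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def is_noindexed(html: str) -> bool:
    """Return True if the page's robots meta tag includes the noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

For example, a page whose head contains a robots meta tag with the content "noindex, follow" would return True.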

4. Using Google’s URL Parameters tool

This tool helps you tell Google not to crawl pages with specific parameters. This might be a good solution if your site uses parameters to deliver content that is mostly the same with minor changes (e.g. headline changes, color changes, etc.). The tool makes it simple to let Google know that your duplicated content is intentional and should not be considered for SEO purposes.
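Before deciding which parameters to flag, it can help to see which of your URLs collapse to the same page once presentation-only parameters are ignored. The Python sketch below does that locally with urllib.parse. It does not configure anything in Google’s tools, and the parameter names and example.com URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that only tweak presentation on an imaginary site.
IGNORED_PARAMS = {"color", "headline", "utm_source"}


def strip_ignored_params(url: str, ignored=IGNORED_PARAMS) -> str:
    """Return the URL without the ignored query parameters and without a fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ignored
    ]
    # Sort so the same parameters in a different order yield the same URL.
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# Hypothetical URLs for illustration only.
red = strip_ignored_params("https://www.example.com/shoes?color=red&size=9")
blue = strip_ignored_params("https://www.example.com/shoes?size=9&color=blue")
```

Both variables end up holding the same URL, which shows that the two addresses differ only by a parameter you have chosen to ignore.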

Example of resolving duplication of meta tag descriptions

Source: Alexa.com SEO Audit

By actively checking your site for duplicated content and addressing any issues you find, you can not only improve the search rankings of your site’s pages but also make sure that your site visitors are directed to fresh content that keeps them coming back for more.

Got any effective tips on how you deal with on-site content duplication? Share them in the comments.

Kim Kosaka is Director of Marketing at Alexa.com.
