If you're involved with SEO in any way, you've probably heard about duplicate content. If you're not exactly sure what it is and how it affects your SEO efforts, then this article is for you. This topic can get rather technical, but I'll try to keep it basic and avoid most of the technical detail.
What is Duplicate Content?
In the basic sense, duplicate content is when two or more Web sites carry the same content. This doesn't just mean the same subject or headers, but the exact same content, word for word or nearly so.
This may happen for several reasons. You may have written an article and another site or blog picked up the article and posted it word for word.
In another situation, you might have multiple sites on different domains with similar content. This is sometimes true for sites that serve different countries. Sometimes there are legitimate reasons for having the same content. Still, it's good to understand the pitfalls.
Why is Duplicate Content a Problem?
The goal of the search engine is to deliver the best value for a given search term or phrase. The more this happens, the more searchers will continue to use that search engine.
Search engines try to avoid serving up many copies of the same Web page in the search results, since that would confuse searchers and deliver a poor experience. So they attempt to filter the duplicate content, choose one version based on certain criteria, and serve that one up.
The problem is that your page may be left out in favor of another page with the same content. There is speculation about a penalty for duplicate content, but I don't think it's so much a penalty as a missed opportunity to show up in the search engine results pages (SERPs).
How Search Engines Deal with Duplicate Content
Search engines send out a bot or program to surf the Internet and collect all of the content it finds. This content is indexed and placed into a database.
During this process, the content is compared against other content to detect duplicates. When duplicates are found, an attempt is made to determine the original. Some clues that help determine this are:
- How trusted is the domain?
- Are there links on one that point back to an original?
- Which copy do most of the links point to?
- Where is the first place Google found the content?
- Does any of the content appear to have been "scraped" or repurposed?
One version is then picked and used, and the others are discarded. This list should also give you some ideas on how you can combat duplicate content issues.
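To make the comparison step more concrete, here is a minimal sketch of one common way to measure how similar two pieces of text are: break each into overlapping three-word sequences ("shingles") and compare the two sets. This is only an illustration. The function names and sample sentences are made up for this example, and it is not a description of how any particular search engine actually detects duplicates.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences (shingles) in text."""
    # Lowercase and keep only word characters so punctuation and case
    # differences do not hide an otherwise identical copy.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=3):
    """Jaccard similarity of the shingle sets: 0.0 (no overlap) to 1.0 (same)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical sample texts, for illustration only.
original = "Duplicate content is when two or more pages have the same content."
copy = "Duplicate content is when two or more pages have the same content!"
rewrite = "Search engines try to show each article only once in the results."

print(similarity(original, copy))     # word-for-word copy scores 1.0
print(similarity(original, rewrite))  # unrelated wording scores 0.0
```

A score near 1.0 suggests a word-for-word or almost word-for-word copy, while a page that merely covers the same subject in different words scores much lower.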
What Can You Do to Avoid Duplicate Content Issues?
Now that you have a good idea what duplicate content is and how it's dealt with, let's look at what you can do to avoid the pitfalls. First, keep in mind that duplicate content issues have nothing to do with your site's HTML code, only with your page content.
One way to deal with this issue is to use a canonical tag. A canonical page is basically the authoritative page among a group of pages that have similar content, and the tag tells search engines which page that is.
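For readers who want to see what this looks like in practice, the canonical tag is a link element with rel="canonical" placed in the head of a page. The short sketch below, using only Python's standard library, reads a page and reports which canonical URL it declares. The sample page, the example.com URL, and the names CanonicalFinder and find_canonical are all placeholders invented for this illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only look at <link> tags, and stop after the first canonical found.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in html, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page; example.com is a placeholder domain.
page = """
<html>
  <head>
    <title>Blue widgets</title>
    <link rel="canonical" href="https://www.example.com/widgets/blue">
  </head>
  <body>...</body>
</html>
"""

print(find_canonical(page))  # https://www.example.com/widgets/blue
```

Running a check like this across pages with similar content is a quick way to confirm that they all point to the same authoritative page.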
Also, Google recently posted an article on ways to handle legitimate cross-domain content duplication. They announced support for a link element and offered other tips for handling the problem. Basically, Google recognizes there are some legitimate uses for duplicate content and wants to help site owners with solutions.
As I mentioned earlier, this is a basics article, so I won't go deep into the technical details. You may, however, need some technical help to dig further into this topic and implement a solution.
Please feel free to share any lessons learned or other best practices for avoiding duplicate content issues.