Duplicate content is a perennial concern for webmasters. Whether it’s a website stealing content from another site, or a website that hasn’t taken an active role in ensuring it has great, unique, quality content, being filtered out of the Google index over duplication is a real worry.
In the latest webmaster help video from Google’s Matt Cutts, he addresses how Google handles duplicate content, and when it can negatively impact your search rankings.
Cutts started by explaining what duplicate content is and why it isn’t always a problem, especially when it comes to quoting parts of other web pages.
It’s important to realize that if you look at content on the web, something like 25 or 30 percent of all of the web’s content is duplicate content. … People will quote a paragraph of a blog and then link to the blog, that sort of thing. So it’s not the case that every single time there’s duplicate content it’s spam, and if we made that assumption the changes that happened as a result would end up probably hurting our search quality rather than helping our search quality.
For several years, Google’s stance has been that it tries to find the originating source and give that result top billing, so to speak. After all, Google doesn’t want to serve up masses of identical pages to a searcher: it’s a poor user experience to click on one result, not find what you’re looking for, then go back and click the next result only to discover the identical page on a different site.
Google looks for duplicate content and where we can find it, we often try to group it all together and treat it as if it’s just one piece of content. So most of the time, suppose we’re starting to return a set of search results and we’ve got two pages that are actually kind of identical. Typically we would say, “OK, rather than show both of those pages since they’re duplicates, let’s just show one of those pages and we’ll crowd the other result out,” and then if you get to the bottom of the search results and you really want to do an exhaustive search, you can change the filtering so that you can say, “OK, I want to see every single page” and then you’d see that other page. But for the most part, duplicate content isn’t really treated as spam. It’s just treated as something we need to cluster appropriately and we need to make sure that it ranks correctly, but duplicate content does happen.
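The clustering Cutts describes can be sketched with a toy example. The fingerprinting below (hashing normalized page text so exact copies collapse together) is purely an illustrative assumption on my part; Google’s actual duplicate detection is far more sophisticated and catches near-duplicates, not just byte-identical copies. The URLs and page texts are made up.

```python
import hashlib

# Hypothetical search results: (url, page_text) pairs. The first two
# share identical body text, mirroring the "two pages that are actually
# kind of identical" case Cutts describes.
results = [
    ("https://example.com/original", "How to fix a leaky faucet in ten minutes."),
    ("https://mirror.example.net/copy", "How to fix a leaky faucet in ten minutes."),
    ("https://other.example.org/unique", "Choosing the right pipe wrench."),
]

def fingerprint(text: str) -> str:
    """Hash whitespace/case-normalized text so identical copies match."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def cluster_and_filter(pages, show_all=False):
    """Group pages by content fingerprint and show one per cluster,
    unless the searcher opts in to an exhaustive listing ("I want to
    see every single page")."""
    seen = set()
    shown = []
    for url, text in pages:
        fp = fingerprint(text)
        if show_all or fp not in seen:
            shown.append(url)
            seen.add(fp)
    return shown

print(cluster_and_filter(results))                 # duplicates crowded out
print(cluster_and_filter(results, show_all=True))  # every single page
```

With filtering on, the mirror URL is crowded out and only two results show; with `show_all=True`, all three appear, matching the “change the filtering” behavior described above.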
Next, Cutts tackled the cases where duplicate content is spam, such as websites that scrape content off the original sites, or that republish loads of “free articles” that appear on masses of other websites. These types of sites have the biggest problem with duplicate content because they merely copy content created elsewhere.
It’s certainly the case that if you do nothing but duplicate content, and you are doing it in an abusive, deceptive, malicious, or a manipulative way, we do reserve the right to take action on spam. So someone on Twitter was asking a question about “how can I do an RSS auto blog to a blog site and not have that be viewed as spam,” and the problem is that if you are automatically generating stuff that is coming from nothing but an RSS feed, you’re not adding a lot of value, so that duplicate content might be a little bit more likely to be viewed as spam.
There are also cases where businesses legitimately end up with duplicate content that won’t necessarily be viewed as spam. In some cases, websites end up with duplicate content for usability reasons rather than SEO. For the most part, those websites shouldn’t worry either.
But if you’re just making a regular website and you’re worried about whether you’d have something on the .com and the .co.uk, or you might have two versions of your terms and conditions, an older version and a newer version, that sort of duplicate content happens all the time on the web and I really wouldn’t get stressed out about the notion that you might have a little bit of duplicate content.
Cutts does caution against local directory-style websites that list masses of cities but serve up empty listings with no real content about what the user might be looking for, as well as sites that create individual pages for every neighborhood they service even though the content is the same as what’s on the main city page.
As long as you’re not trying to massively copy for every city and every state in the entire United States, showing the same boilerplate text which is, “no dentists found in this city either,” for the most part you should be in very good shape and not have anything to worry about.
Bottom line: as long as your duplicate content is there for legitimate reasons (e.g., you’re quoting another website or you have things like two versions of terms and conditions), you really shouldn’t be concerned about duplicate content. However, Google certainly can and will take action against sites utilizing duplicate content in a spammy fashion, because they aren’t adding value to the search results.