Duplicate content has long been a recurring topic among SEO practitioners, and the advice has typically centered on the need to develop unique content for your website. With Google's Panda release, however, there's a need to redefine how we think about unique content.
Previously, the emphasis had been on ensuring that articles not be copies of other articles. The basic reason for this is that the search engines don't want to have a results page that shows the same result over and over again. For example, a results page that looks like this:
Users want to see varying results.
Consider the scenario where a user clicks on the first result, looks at it, decides he doesn't like it, and then clicks on the second result. Chances are not good that he is going to like that one any better, right? If he then clicks on the third result, he will likely scream. The search engines go to great lengths to prevent this from happening (note that I used a specialized query to force duplicate content to appear in the above screen shot).
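The simplest form of this kind of filtering is easy to picture. Below is a minimal sketch, not Google's actual implementation, of how a results pipeline might collapse exact (or near-exact, after whitespace and case normalization) copies of the same page content so a query returns each document only once. The result structure and field names are hypothetical.

```python
import hashlib

def fingerprint(text: str) -> str:
    # Normalize case and whitespace so trivially re-formatted copies
    # of the same text produce the same hash.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def dedupe_results(results):
    """Keep only the first result seen for each distinct content fingerprint."""
    seen = set()
    unique = []
    for result in results:
        fp = fingerprint(result["content"])
        if fp not in seen:
            seen.add(fp)
            unique.append(result)
    return unique

results = [
    {"url": "a.com/frogs", "content": "Frogs are green amphibians."},
    {"url": "b.com/copy",  "content": "Frogs are  green amphibians."},  # scraped copy, extra space
    {"url": "c.com/toads", "content": "Toads are not frogs."},
]
print([r["url"] for r in dedupe_results(results)])  # → ['a.com/frogs', 'c.com/toads']
```

Real duplicate filters are far more sophisticated (they must handle boilerplate, partial copies, and near-duplicates), but the goal is the same: never show the user the identical document twice.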
Google's Panda algorithm update hit the world on February 23, 2011. There has been lots of speculation about the algorithms involved and the best way to recover from Panda. SEW Associate Editor Danny Goodwin reported on May 19 that a couple of sites were indicating a partial recovery from Panda.
Of particular interest was OneWayFurniture.com. It was using manufacturers' product descriptions for the products it sold. It made three big changes, including the following:
Hiring four copywriters to write original, SEO-friendly product descriptions. Websites must decide what keywords they want to rank for, and write copy around those keywords that is unique, topic-focused, well-written, and checked for spelling and grammar.
This seems to indicate that a site with massive duplicate content might get tagged by Panda. However, I sat on the duplicate content panel at SES New York with Google's Tiffany Oberoi, who is a software engineer working on the search quality team, and she was adamant that Panda was not about duplicate content.
In my opinion, there are different algorithms running here: the traditional duplicate content filters that Google uses, and a series of algorithms in Panda targeted at identifying low-quality sites. This Panda-based approach may include techniques for identifying whether content offers unique new information, not merely whether it is duplicate in nature.
A New Look at Content Quality
Consider a site that has an article about frogs. The meat of the article makes four main points in a bulleted list as follows:
- Frogs are green
- Frogs are not toads
- Frogs like to swim and jump
- Frogs live in water
There are some other sentences, but relatively speaking these are the four major points that the article makes.
Consider a second article on another site that is also about frogs. This article also makes four main observations about frogs in a paragraph like this:
"The habitat of frogs is water, and they are green in color. People sometimes confuse them with toads, however they are quite different. To move around frogs typically jump or swim."
OK, so I'm not an expert on frogs, but the point is that the two articles are different, yet neither offers any distinct value over the other. If a user clicks on the first result (with the bulleted list) and does not find what he wants, is he going to be any happier with the second result? Of course not.
So if you're a search quality engineer at Google or Bing, and you could devise an algorithm to detect this type of scenario, would you use it? Of course you would. It could really help improve the quality of search, and that is a pretty compelling thing to be able to do.
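To see why this is a different problem from classic duplicate detection, consider a toy experiment on the two frog articles above. This sketch (my own illustration, not anything Google has disclosed) compares them two ways: with three-word shingles, the kind of literal-overlap signal a duplicate filter relies on, and with plain shared vocabulary, a crude proxy for topical overlap.

```python
import re

def words(text: str) -> set:
    # Lowercase and strip punctuation down to bare word tokens.
    return set(re.findall(r"[a-z]+", text.lower()))

def shingles(text: str, k: int = 3) -> set:
    # Sliding windows of k consecutive words: the literal-overlap
    # signal a classic duplicate filter keys on.
    w = re.findall(r"[a-z]+", text.lower())
    return {" ".join(w[i:i + k]) for i in range(len(w) - k + 1)}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

article_a = ("Frogs are green. Frogs are not toads. "
             "Frogs like to swim and jump. Frogs live in water.")
article_b = ("The habitat of frogs is water, and they are green in color. "
             "People sometimes confuse them with toads, however they are "
             "quite different. To move around frogs typically jump or swim.")

# A literal duplicate filter sees essentially no overlap...
print(jaccard(shingles(article_a), shingles(article_b)))   # → 0.0
# ...while shared vocabulary shows the articles cover the same ground.
print(round(jaccard(words(article_a), words(article_b)), 2))  # → 0.32
```

The shingle score says "not duplicates"; the vocabulary score hints "same information." An algorithm aimed at redundant-but-rewritten content has to work with signals more like the second one, which is precisely why it is a harder, and newer, problem than duplicate filtering.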
I can't say whether this type of measurement was used in Panda, but that is not really the point of this column. What I've defined here is a search quality problem, and it's in the interest of the search engines to address it.
Publishers need to stay very focused on how their websites can offer unique value that their competitors don't. To give another example, the world does not need 15,200 articles on how to apply for a mortgage.
Deriving value from publishing a website will increasingly depend on delivering unique new value. Saying the same things in different words is simply not enough.
The search engines will figure out how to find the three or four sources that users trust the most, or respond to the best, for queries like "how to apply for a mortgage." These few sites will be shown because of their authority even though they basically make the same points, and everyone else is going to need to substantially differentiate what they have to say to have a shot.