If your website sometimes has minor short-term duplicate content issues, how can you avoid being caught in Google's duplicate content filter? That's the topic of the latest Webmaster Help video from Google's Distinguished Engineer Matt Cutts.
This video doesn't address some of the usual big duplicate content issues, such as when someone steals an entire website or publishes articles that have appeared on hundreds of other sites. The type of duplicate content that Cutts discusses in this video arises when there is a legitimate reason why content is either identical or very similar to content that already appears on another page of the site.
"I'm assuming this is white hat, high-quality news publisher, and you have short-term duplicate content, that you really don't want to have but maybe there is a shooting or something you have breaking news," Cutts said. "One thing you can do is use the rel=canonical tag because even if you have multiple copies of a short story, you'll be dividing the PageRank between those multiple stories. If it's all on the same topic, or all the stories are on the exact same incident, and they are very close to duplicate, I would use rel=canonical to point to one home URL."
Using the rel=canonical tag in this instance would still allow people to read the individual stories that have pretty similar content, such as a news story that is updated multiple times per day as more breaking news is announced.
When someone searches for this story in the future, you want to ensure that the searcher can find the one page that gives the best overview of the entire story, rather than links to a dozen or more pages with very similar content about the same story. So in this instance, make sure you use rel=canonical to point to the page that you feel gives the best information about the story.
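As a rough illustration of pointing several near-duplicate updates at one preferred page, the sketch below builds the `<link rel="canonical">` element that each update page would carry in its `<head>`. The URLs and the helper names `canonical_link_tag` and `head_tags_for_updates` are hypothetical and chosen only for this example; nothing here comes from the video itself, and a real site would normally emit the tag from its own templates.

```python
from html import escape

# Hypothetical URLs for illustration only; none of these come from the video.
CANONICAL_URL = "https://example.com/news/breaking-story"
UPDATE_URLS = [
    "https://example.com/news/breaking-story-update-1",
    "https://example.com/news/breaking-story-update-2",
]


def canonical_link_tag(canonical_url: str) -> str:
    """Return the <link> element to place inside a page's <head>."""
    # Escape the URL so characters such as & and " are safe in an attribute.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


def head_tags_for_updates(update_urls, canonical_url):
    """Map each near-duplicate update URL to the tag its page should carry."""
    tag = canonical_link_tag(canonical_url)
    return {url: tag for url in update_urls}


if __name__ == "__main__":
    for url, tag in head_tags_for_updates(UPDATE_URLS, CANONICAL_URL).items():
        print(url, "->", tag)
```

Every update page ends up with the same tag, so all of them name the one overview page as the preferred location, while each individual story stays readable at its own URL.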
It's also worth noting that Cutts suggests using rel=canonical not only in situations where the content is duplicated, but also in situations where it is fairly similar, even if it isn't identical. This is a bit of a change from the usual view, in which duplicate content is considered to be identical content, either for an entire article or for paragraphs or sections within a webpage.
"That will help clarify that, OK, I might short-term have some duplicate stuff while this is breaking news, but after stuff gets all cleaned up, this is the standard spot, the preferred location on the web where I'd like this information to sit," he said. "If you do that and you don't have huge amounts of duplicate text all over your site, then you should be in pretty good shape as far as avoiding any sort of spam action or anything along those lines."
So if you're concerned about duplicate content, using rel=canonical in these kinds of situations will help keep your websites in Google's good graces.