Chris Boggs over at the Search Engine Roundtable wrote an item named Which Came First: the Content or the Plagiarism? which discusses the challenge search engines face when it comes to determining the original source of a particular piece of content.
For example, the content I am writing right now may be picked up within a matter of seconds by another site that wants to "borrow" or steal the content. So now we have two (probably a lot more than two) sources with identical content. A search engine can say, hey, I found source A before I found source B with this particular content, so source A must be the original source. But if you think about that, since spiders don't work in real time, a search engine may visit the source that "borrowed" the content prior to visiting the original source of that content.
Chris offers two suggestions. The first is to watch your crawl cycles in Google and wait just before to post the content. Now that is not really feasible, as Chris knows, because there is no way to exactly know when Google will crawl your site and news information must be posted as soon as possible, so waiting is normally not an option. Chris uses this example to make a point, I believe. The other option Chris suggests is to use Google Sitemaps, so Google can see you as a trusted source and be feed the information, sooner than later.
But what do you think is the algorithmic solution? I personally do not know. There are people discussing the fundamental challenge at Search Engine Roundtable Forums.
Introducing... ClickZ Live!
SES Conference & Expo has merged with ClickZ to bring you ClickZ Live! The new global conference series takes on the identity of the industry's premier digital marketing publication, ClickZ.com, and kicks off March 31-April 3 in New York City. Join the industry's leading tech-advertisers in the advertising capital of the world! Find out more ››
*Super Saver Rates expire Jan 24.