Chris Boggs over at the Search Engine Roundtable wrote an item named Which Came First: the Content or the Plagiarism? which discusses the challenge search engines face when it comes to determining the original source of a particular piece of content.
For example, the content I am writing right now may be picked up within a matter of seconds by another site that wants to "borrow" or steal the content. So now we have two (probably a lot more than two) sources with identical content. A search engine can say, hey, I found source A before I found source B with this particular content, so source A must be the original source. But if you think about that, since spiders don't work in real time, a search engine may visit the source that "borrowed" the content prior to visiting the original source of that content.
Chris offers two suggestions. The first is to watch your crawl cycles in Google and wait just before to post the content. Now that is not really feasible, as Chris knows, because there is no way to exactly know when Google will crawl your site and news information must be posted as soon as possible, so waiting is normally not an option. Chris uses this example to make a point, I believe. The other option Chris suggests is to use Google Sitemaps, so Google can see you as a trusted source and be feed the information, sooner than later.
But what do you think is the algorithmic solution? I personally do not know. There are people discussing the fundamental challenge at Search Engine Roundtable Forums.
This Year's Premier Digital Marketing Event is #CZLSF
ClickZ Live San Francisco (Aug 11-14) will bring together the industry's leading online marketing practitioners to deliver 4 days of educational sessions and training workshops. From Data-Driven Marketing to Social, Mobile, Display, Search and Email, the comprehensive agenda will help you maximize your marketing efforts and ROI. Register today!