Several leading online news publishers are seeking greater control over which of their stories are listed in search engine results, and how they appear, according to a report by the Associated Press.
“Currently, Google Inc., Yahoo Inc. and other top search companies voluntarily respect a Web site’s wishes as declared in a text file known as ‘robots.txt,’ which a search engine’s indexing software, called a crawler, knows to look for on a site,” AP noted.
The individual engines also have their own proprietary code, however, and the publishers want greater influence over how it is developed and would like to see a unified methodology, the article reported.
The current system doesn’t give sites “enough flexibility to express our terms and conditions on access and use of content,” said Angela Mills Wade, executive director of the European Publishers Council, one of the organizations behind the proposal. “That is not surprising. It was invented in the 1990s and things move on,” Wade told AP.
Robots.txt files were first developed in 1994 and have been the standard method webmasters use to block spiders, the crawlers that search engines use to go through websites’ content. Over the past five to six years, however, there has been much discussion online suggesting that some crawlers ignore the robots.txt file.
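For readers unfamiliar with the mechanism, here is a minimal sketch of how a well-behaved crawler might consult a robots.txt file, using Python’s standard urllib.robotparser module. The file contents, the crawler names (ExampleNewsBot, SomeCrawler) and the news.example.com URLs are hypothetical, made up for illustration. The check happens entirely on the crawler’s side, which is why compliance is voluntary.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary news site (not from any real publisher).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
Disallow: /subscribers/

User-agent: ExampleNewsBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)

# A generic crawler falls under the "*" rules: the archive is off limits,
# but current news pages are not.
print(may_crawl(parser, "SomeCrawler", "https://news.example.com/archive/2007/story.html"))
print(may_crawl(parser, "SomeCrawler", "https://news.example.com/news/today.html"))

# The crawler singled out by name is asked to stay away from the whole site.
print(may_crawl(parser, "ExampleNewsBot", "https://news.example.com/news/today.html"))
```

The file can only say which paths a given crawler should or should not visit. It has no way to state terms of use for the content, which is the limitation the publishers describe.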
The publishers’ “proposed extensions, known as Automated Content Access Protocol, partly grew out of those disputes. Leading the ACAP effort were groups representing publishers of newspapers, magazines, online databases, books and journals. The AP is one of dozens of organizations that have joined ACAP,” AP noted.