News Robot Leads To Linking,
From The Search Engine Report
Jan. 9, 1998
The robots.txt file is a widely recognized convention that web site owners use to stop search engines from crawling all or part of a site. However, its influence may reach further than some people realize, as a recent dispute between the Sunday Times and the search engine News Index illustrates.
News Index crawls a variety of news web sites, much like Excite's NewsTracker and Wired Digital's NewsBot. These services produce exceptionally good results for current-events searches because they crawl only news sites, and do so once or twice a day. As a result, their results are usually focused and timely.
The Sunday Times objected to visits from News Index for several reasons, the biggest being that News Index's listings of its stories made it easy for visitors to bypass the paper's registration process.
"One of our main objections about News Index was that links from it bypassed our registration process. Each registration number is intended for the use of one individual; the way News Index was linking to our stories undermined this," said Dominic Young, copyright manager for Sunday Times-publisher News International.
This resembles an ongoing dispute between Microsoft and Ticketmaster, in which Ticketmaster objects to Microsoft linking directly to pages deep within its site. Such links bypass the Ticketmaster "front door" and result in lost advertising, Ticketmaster says.
Some experts fear that if Ticketmaster wins, permission would be required for linking. However, Ticketmaster weakens its case because it does not object to search engines crawling its site. It has no robots.txt file banning them. That means search engines index inside pages, and plenty of visitors enter the site via these backdoors.
The Sunday Times situation is different. It had a robots.txt file in place that forbade indexing of its news pages, but the News Index spider ignored it. The Sunday Times could therefore argue that it has acted consistently to prevent internal linking, which might help persuade a court to rule in its favor.
Beyond the linking issue, the Sunday Times was also concerned that indexing its site violates its copyright.
"The indexing of a substantial proportion of our site, even in the form of abstracts, is an infringement of copyright and not permitted by English fair dealing rules," Young said.
If the dispute went to court, and the Sunday Times' copyright claim were upheld both in the UK and in other countries where search engines are based, any search engine could be found to violate copyright simply by spidering sites. That could bring search engines, as we know them, to a halt.
One search engine product manager felt that because search engines do not reproduce the pages they index, the copyright argument probably would not be an issue, at least in the US.
"I could see it if we were reproducing their pages or not taking them out to the particular site," said Excite's Kris Carpenter. To date, no one has complained, she said.
No one is complaining, because almost invariably, web site owners want their sites listed. Those that don't can implement a robots.txt file. Assuming the search engine operator recognizes the convention, and most do, no legal action is necessary. Beyond this, most responsible operators will also stop indexing a site if asked.
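To illustrate what "recognizing the convention" involves, here is a minimal sketch of a crawler checking robots.txt rules before fetching a page. It uses Python's standard-library urllib.robotparser module; the site www.example.com, the /news/ directory, and the crawler name ExampleBot are hypothetical, and the rules shown are not the Sunday Times' actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a news site that wants its news pages kept
# out of every search engine while leaving the rest of the site crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /news/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/news/story1.html"):
        # A compliant crawler asks before fetching; a non-compliant one never does.
        verdict = "crawl" if rules.can_fetch("ExampleBot", url) else "skip"
        print(url, "->", verdict)
```

Nothing in the file itself enforces these rules. A crawler that never performs a check like can_fetch() will simply fetch the pages anyway, which is in effect what the News Index spider did.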
That is what happened with News Index. Founder Sean Peck says News Index ignores robots.txt files because, when the service launched in early 1996, it ran into trouble with news sites that wanted to be indexed but also had robots.txt files in place to keep out other crawlers.
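A site that wants one news crawler in and all others out can express that in the file itself, because robots.txt records are addressed to crawlers by name. The minimal sketch below again uses Python's urllib.robotparser. The crawler name NewsIndexBot is hypothetical (the user-agent string News Index actually used is not given here), as is the site layout.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: an empty Disallow line gives the named crawler
# full access, while the catch-all "*" record blocks every other crawler.
ROBOTS_TXT = """\
User-agent: NewsIndexBot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

story = "http://www.example.com/news/story1.html"
for agent in ("NewsIndexBot/1.0", "OtherBot/2.0"):
    # can_fetch() uses the first record whose User-agent matches the crawler,
    # falling back to the "*" record only when no named record applies.
    allowed = parser.can_fetch(agent, story)
    print(agent, "->", "allowed" if allowed else "blocked")
```

The drawback is that every site has to add such an exception for each crawler it wants to admit, so a site that never writes one leaves a new service locked out along with everyone else.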
"We opted to go with the direct contact approach, that being if a site contacts us and asks us not to include them we will honor that request," Peck said, explaining that he held off on complying with the first letter he received from the Sunday Times only so that he could talk with them and understand their exact concerns.
The two parties have since reached an accommodation; neither was interested in a legal battle over the situation. But there is no reason a similar dispute could not arise between other parties in the future.
"This could definitely come up again," Peck said.
If so, the presence of the little robots.txt file could be crucial in a decision.
Launched in April 1996, News Index indexes news stories from hundreds of sources worldwide.
Linking a copyright violation?
News.com, Dec. 11, 1997
Some interesting quotes from legal experts, along with details on other news-related linking disputes.
Ticketmaster, Microsoft and Search Engines
The Search Engine Report, June 6, 1997
Details about Ticketmaster's lack of a robots.txt file, and the implications this might have for its dispute with Microsoft over internal linking.
NOTE: The Ticketmaster dispute has since been settled out of court.
MSoft, Ticketmaster Bury Hatchet
Wired News, Feb. 16, 1999