Internet Archive Sued Over Access To Pages

Author: Danny Sullivan

Date published: July 13, 2005

Categories: Industry

Spotted via Threadwatch, “Keeper of Expired Web Pages Is Sued Because Archive Was Used in Another Suit” from the New York Times discusses how the
Internet Archive is being sued for crawling the web and making copies of web pages. A copyright infringement case against a search engine, then? Not
exactly, as we’ll see.

At issue is a court case on trademarks where evidence of past usage was found through the Internet Archive. Healthcare Advocates said copies of its pages
were made without permission. In particular, Healthcare Advocates says that despite its use of a robots.txt file, there were 92 occasions when its pages were
still accessed.

In a further twist, the company claims the law firm that retrieved those pages violated the Digital Millennium Copyright Act by “circumventing” the
robots.txt file exclusion.

Time for a good laugh at that, honestly. As the article explains, robots.txt is a voluntary opt-out measure designed for crawlers. It has no legal
bearing. In addition, nothing in a browser prevents someone from viewing pages that have been blocked by robots.txt. In short, no one has to circumvent robots.txt to view a
page. It doesn’t try to block that at all.
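To make the "voluntary" point concrete, here is a minimal sketch of what a well-behaved crawler does with robots.txt, using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` name and the example.com URLs are all made up for illustration; none of them come from the lawsuit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def polite_crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that chooses to honor robots.txt would fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/private/page.html"):
        allowed = polite_crawler_may_fetch(ROBOTS_TXT, "ExampleBot", url)
        print(url, "->", "fetch" if allowed else "skip")
```

The check lives entirely in the crawler's own code. A browser, or a crawler that never calls anything like this, simply requests the page, and the web server serves it.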

As for the copyright infringement, from what I can see, the Internet Archive itself is not being sued for copyright infringement. Instead, it’s being
sued for allowing those copies to be seen despite a robots.txt block. The article says this failure has the Internet Archive under fire for “breach of contract and fiduciary
duty, negligence and other charges.”

Interesting. I’d say absurd, but you never know, maybe the case will convince a court that a search engine has some type of binding contract with
a company that runs a web site solely on the basis of crawling it. As I said, robots.txt is a voluntary mechanism to keep pages out of a crawler. It’s not a legal requirement.

Moreover, while I haven’t seen the case yet (Gary will probably dig it up and post here, if so), red flags already go up about the robots.txt file
preventing “public viewing” of the pages.

Robots.txt traditionally removes pages entirely from an index. They don’t hang around. That’s certainly what the Internet Archive
says. If robots.txt was up, then at some point, the pages should have been entirely removed from the Internet Archive,
period.

For some further reading, my Google & Other Search Engines: The WMDs Of Copyright
Infringement and Forget Google Print Copyright Infringement; Search Engines Already Infringe articles
cover how search engines make copies of billions of documents each month without permission, relying on the opt-out non-legal provisions of robots.txt to hopefully keep them
safe.

Postscript (from Gary): If you would like to read the actual complaint filed in the lawsuit, I’ve posted a copy (48 pages; PDF) here.

Postscript 2: Internet Archive DMCA Circumvention Lawsuit from Seth Finkelstein looks at how the robots.txt file with the Internet Archive doesn’t actually remove content but rather simply suppresses display. And our forum thread, Implications of the Internet Archive lawsuit, also looks at this and the important impact this can have if a domain name changes ownership. What you thought was removed might very well show up again.
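
The difference between removing content and suppressing its display, as described in Postscript 2, can be sketched with a toy model. This is not the Internet Archive's actual code; the `ToyArchive` class, its method names and the sample data are all hypothetical, and the only assumption taken from the postscript is that the block is checked at viewing time rather than used to delete stored copies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToyArchive:
    """Toy model: robots.txt hides stored snapshots but never deletes them."""
    snapshots: Dict[str, List[str]] = field(default_factory=dict)
    blocked_domains: Set[str] = field(default_factory=set)

    def store(self, domain: str, page: str) -> None:
        # Keep a copy of the page under its domain.
        self.snapshots.setdefault(domain, []).append(page)

    def update_robots(self, domain: str, blocks_archive: bool) -> None:
        # Record whether the domain's *current* robots.txt blocks the archive.
        if blocks_archive:
            self.blocked_domains.add(domain)
        else:
            self.blocked_domains.discard(domain)

    def view(self, domain: str) -> List[str]:
        # Suppression happens here, at display time; stored copies are untouched.
        if domain in self.blocked_domains:
            return []
        return list(self.snapshots.get(domain, []))


if __name__ == "__main__":
    archive = ToyArchive()
    archive.store("example.com", "home page, 2003 copy")
    archive.update_robots("example.com", blocks_archive=True)
    print(archive.view("example.com"))  # hidden while the block is in place
    # A new domain owner drops the robots.txt block.
    archive.update_robots("example.com", blocks_archive=False)
    print(archive.view("example.com"))  # the old copy is visible again
```

Under that assumption, a change of domain ownership that drops the robots.txt block is enough to bring the old copies back into view.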
