A complaint has been filed with the US Federal Trade Commission over the alleged theft of web pages from high-profile sites such as Disney, CNET and the Discovery Channel. The complaint claims that the content was "pagejacked" to generate search engine traffic for other high-profile sites such as eToys and Barnes & Noble.
Greg Boser, of California-based search engine optimization company Web Guerrilla, says he discovered the theft when searching for one of his clients by name at AltaVista. A search for that client, "Data Recovery Group," brought up a page leading to his client's competitor, "Data Recovery Labs."
Suspecting that some of his client's pages had been stolen, Boser investigated further and found other pages that appeared to have been used without permission to benefit sites such as eToys and Barnes & Noble. These include pages taken from high-profile sites such as Disney, CNET and the Discovery Channel, as well as from personal home pages, Boser says.
The pages were managed by Green Flash, another California-based search engine optimization company. Green Flash denied knowing about the alleged theft, implying that if the claims proved true, contractors working for the company would be responsible. The company has also stopped offering search engine optimization services for the time being.
"Green Flash denies that it has knowingly engaged in any illegal or unethical activities in positioning its clients' web pages through major search engines. If Green Flash discovers that any of its employees or independent contractors have used unlawful or unethical means, it is our policy to terminate such individuals. Green Flash has not been provided a copy of the FTC complaint allegedly filed by Mr. Boser of Web Guerrilla and therefore has not had the opportunity to investigate the specific allegations made by him," said company CEO D.R. Peck, in a statement on the case.
Peck added, "In recent months, it has become clear that checking the work of a group of remote, non-employee page builders was physically impossible. In order to rectify this situation, we built a new system that prohibits any page from being submitted until it is evaluated by an internal manager. However, our transition to this new system, which was designed to prevent errant subcontractors from doing sloppy work, has not yet been fully implemented. As such, we have decided to close our search engine positioning business effective immediately."
The complaint was filed with the FTC on April 27, but Web Guerrilla went public with the allegations this week, through a press release. Boser says the FTC has not yet taken any action. "I received a canned automated response a couple of days ago, but have not heard from an actual investigator," Boser said.
Boser also said that he has heard from a media representative at Barnes & Noble, which is apparently considering a response to the case. Boser was also contacted by the president of Data Recovery Labs.
"He was very apologetic. He called Green Flash and confronted them and was told they no longer are in the search engine optimization business, and they had hired a sub-contractor who was responsible. I find it a little hard to believe that over 2,000 pages got uploaded to their server without them looking at any of them," Boser said.
A key element of this case involves "cloaking." In cloaking, a search engine spider is sent a version of a web page that differs from what a human visitor sees when viewing the same page. Search engine optimization companies use cloaking for two main reasons. First, pages designed to rank well with search engines may not be visually attractive to humans, so cloaking keeps these ugly pages away from human visitors. Second, many search engine optimization companies consider their work proprietary and do not want competitors to see how they achieved a top ranking.
Ordinarily, a human visitor has no way to see the "real" page when a form of cloaking called IP delivery is used. With IP delivery, the server checks the Internet address of whoever requests a page and sends the optimized version only to addresses it recognizes as belonging to a search engine spider. However, Boser was able to use AltaVista's Babelfish translation service to pose as AltaVista's search engine spider and thus view the cloaked pages.
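To make IP delivery concrete, here is a minimal Python sketch of how a cloaking script might choose a page based on the requester's address. The network ranges, page contents and the `is_spider` and `serve_page` names are hypothetical: the ranges are reserved documentation addresses (RFC 5737), not real crawler addresses, and nothing here reflects how Green Flash's pages actually worked. The final two calls assume, for illustration only, that a translation service such as Babelfish fetched pages from addresses the script had listed as the spider's.

```python
import ipaddress

# Hypothetical "spider" ranges. These are reserved documentation networks
# (RFC 5737), standing in for whatever addresses a cloaker believed the
# search engine's crawler used.
SPIDER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# What human visitors see versus the keyword-heavy page meant for the spider.
HUMAN_PAGE = "<html><body>Welcome! Browse our catalog.</body></html>"
SPIDER_PAGE = "<html><body>data recovery services data recovery experts</body></html>"


def is_spider(remote_ip: str) -> bool:
    """Return True if the requesting address falls inside a listed spider range."""
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in net for net in SPIDER_NETWORKS)


def serve_page(remote_ip: str) -> str:
    """IP delivery: pick the page solely by the address of the requester."""
    return SPIDER_PAGE if is_spider(remote_ip) else HUMAN_PAGE


# A visitor's own browser, connecting from an ordinary address, gets the human page.
visitor_view = serve_page("203.0.113.7")

# Assumed scenario: a translation service run by the search engine fetches the
# page on the visitor's behalf from inside the listed range, so the script
# mistakes the request for the spider and returns the cloaked page.
proxied_view = serve_page("192.0.2.45")
```

Because the decision rests only on the connecting address, anything that relays a request from inside the listed ranges receives the spider's page, which is why a search engine's own fetching service could expose content that ordinary visitors never see.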
Search engines generally dislike cloaking, and even "doorway pages" in general, which are heavily engineered to rank for particular keywords rather than to deliver actual information. However, they have been relatively slow to police abuse, relying instead on "off-the-page" ranking factors to blunt the advantage that doorway pages once provided.
This latest development will likely put more pressure on search engines to act against cloaking, especially since cloaking makes it nearly impossible for site owners to determine whether their content has been stolen. Ideally, search engines might offer something like Google's popular caching feature, which lets you see exactly the page that Google has indexed. However, it remains unclear whether Google's caching feature itself violates copyright law by displaying copies of other people's pages without permission.
I would hope not, as I think all search engines should allow you to view exactly what they have spidered, so that copyright issues like this current one can be avoided. Should the issue ever go to court, it is possible that by allowing a search engine to spider a web site, site owners would also be seen as giving permission for a copy of the page to be displayed.
Green Flash Pagejacking Evidence
More information about the Web Guerrilla findings, along with the evidence it gathered.
Green Flash Response
Full statement from Green Flash on the allegations.
What Is A Bridge or Doorway Page?
Has all this stuff about cloaking and IP delivery got you scratching your head? Here's an explanation of the systems involved.
A Bridge Page Too Far?
The Search Engine Report, Feb. 3, 1998
Cloaking is not new, but it is not often written about. This article was the first to raise the issue with the general public and poll for official reactions from search engines. It also involved the company named in this most recent complaint, Green Flash -- a response from the company at that time is at the end of the article.
FTC Steps In To Stop Spamming
The Search Engine Report, Oct. 3, 1999
Why involve the FTC? Because the agency has already taken successful action against pagejackers, as this article describes.
Getting Away From Words-On-The-Page Relevancy
The Search Engine Report, March 3, 1999
Discusses how search engines are looking beyond the words on a web page, or "off-the-page," to determine relevancy.
Google Speaks Languages, WAP, Adds Other Features
The Search Engine Report, May 3, 2000
More about how to view cached pages at Google.
Auction Search Case Awaits Ruling
The Search Engine Report, May 3, 2000
Touches on the legality of indexing and the robots.txt file, which might be used to argue that site owners give permission for copies of their pages to be made by search engines.