
Inktomi Spam Database Left Open To Public

A longer, more detailed version of this article is
available to Search Engine Watch members.
Click here to learn more about becoming a member

Inktomi suffered an embarrassing lack of security last month, when it was discovered that a copy of its database of spam and porn sites was left open to the public.

Brett Tabke, who runs the Search Engine World web site, found the database through a search at the AllTheWeb search engine. That's ironic, given that AllTheWeb is run by FAST, which competes with Inktomi for search customers.

A search for "inktomi editorial database" at AllTheWeb brought up a link leading to Inktomi's page filtering system. Instructions on the home page of the system allowed users to easily set up guest access to explore the system or even to establish their own accounts.

Once inside, users could access Inktomi's "Hall of Spam," a list of web sites and URLs that Inktomi had banned or penalized for index abuse.

Tabke revealed the existence of the database in his quarterly newsletter last week. Not surprisingly, Inktomi removed public access to the database soon after.

"It was not supposed to be open to the public, but obviously, for a period of time, it was. That has now been resolved. It was an internal system that we used for our service," said Troy Toman, Inktomi's general manager of search.

Toman added that the database left open was a six-month old mirrored copy of the actual database. So, while people could create user accounts and alter information, that had no effect on the Inktomi database, the company says.

Although the database is now closed to the public, Tabke's article covered enough details about what was in it to cause concern among some of his readers, who posted comments at the Webmaster World discussion site that Tabke also operates.

Some of the upset aired over findings from the database is justified, but in other cases, it's hard to be outraged.

For example, there's the suggestion that Inktomi is being abusive by having a large database of spam URLs and a computerized system to track them. Tabke estimated that over 1 million URLs were on the list. So what? Spamming is a full-time, industrially-applied activity for some search engine optimization firms. It would almost be a bigger concern if Inktomi lacked some organized system to monitor and block spam. Nor would anyone in the SEO industry doubt that there is plenty of spam to catch. If anything, 1 million URLs seems low.

Inktomi said that the current system was created in early 2000, in order to comply with the Digital Millennium Copyright Act. That act requires search engines to remove links to illegal content, if they are notified. The system thus can classify a URL to be "legaldelete," meaning that it should be removed for legal reasons. However, the system was also designed to better manage spam control.

"We obviously have many ways of automatically detecting spam, but we realized that they don't capture everything, so this was for human knowledge," Toman said.

Another concern raised in the forums was the discovery that Inktomi will block the IP addresses of some companies suspected of spamming it. This has the effect of removing not just one URL the company may have but all of them.

This sounds like a draconian measure, but reviewing some of the entries in the Hall of Spam leaves one with little sympathy for many of those who were caught. For instance:

  • "AddURL stuffing, hundreds an hour. These pages appear to exist solely to generate traffic for his CD sales affiliate program."
  • "Buys expired domain names. Forwards to...a porn site depending on Referrer"
  • "Massive spamming operation, spamming keywords and mistyped domains of every kind."

A more serious concern was the belief that Inktomi might be banning sites in order to force people to pay for listings. However, the first 94 entries in the Hall of Spam, all that could be seen when viewing it via the web, had several references to sites that were already using paid inclusion and caught out for spamming, such as:

  • "SEO doing massive spam campaign. 162 URLs in Paid inclusion."
  • "SEO...In BoW, Gigadoc and Addurl. Has thousands of Paid Inclusion URLs"
  • "SEO spammer. dnsspam. Submitting massive urls to addurl and lately to paid inclusion."

These entries don't suggest that spamming will be tolerated by Inktomi, as long as someone is paying to be listed. Quite the contrary, they suggest that paid inclusion is no protection against Inktomi's spam penalties. Inktomi's paid inclusion guidelines also make it pretty clear that content deemed to be spam will be removed.

"We've kicked out pages where people have decided to pay us," confirmed Toman.

But what about cloaking? There were several posts concerned about Inktomi's partnership with search engine optimization firm MediaDNA, which delivers cloaked content. MediaDNA especially focuses on helping sites that have content locked behind password protected areas and which cannot be spidered by search engines. Several were upset that Inktomi tolerates cloaking from MediaDNA but not from other companies.

That sounds pretty damning, but it's not true. Cloaking is not considered spam by Inktomi. The company operates a policy of allowing one cloaked page to link to one "real" page, as long as the cloaked page reflects the content of the real page. Any company can deliver relevant cloaked content to Inktomi through its paid inclusion programs. MediaDNA has no special exclusive.

In fact, a company can even deliver cloaked content without using Inktomi's paid inclusion program. As long as it is one cloaked page per real page, and as long as the content is considered relevant, then Inktomi doesn't view cloaking as spam.

"You could take a policy to say cloaking is evil in every case, but there are ways to use cloaking that are actually legitimate, to make sure a search engine gets relevant information," Toman said.

The Hall of Spam database also had a reference to another Inktomi paid inclusion partner, Position Technologies, seeming to report spam to Inktomi. This raised concerns that Inktomi might be trying to drive search engine optimization customers to particular companies, or that its partners might be using the paid inclusion system to gain a competitive advantage over other SEO companies.

"SEO spammer. Some paid inclusion....Reported to us by positiontech.com, our paid inclusion partner. Is positiontech gaming us?," one entry in the Hall of Spam read.

Tabke didn't speculate on what this entry meant in his article; he only noted it as interesting. However, within the forum, others speculated that this meant Position Tech was reporting SEO companies as spammers, apparently in an attempt to drum up its own business.

The problem with this is that Position Tech isn't an SEO firm. Instead, it creates tools that are used by SEO firms, so alienating them isn't in its interest.

For its part, Position Tech explained within the forum that the reference to "reporting" meant that the URLs came from the automated feed of paid inclusion URLs that Position Tech sends to Inktomi, not that they were reported as spam. People can buy paid inclusion in Inktomi via Position Tech, and those purchased URLs are reported to Inktomi each day. The company added that it doesn't report spam at all to Inktomi nor monitor URLs, unless they specifically make use of Position Tech optimization tools.

As for Inktomi's view, Toman said:

"At some point, there were some Position Tech entries that caused us to question this and put that in the database. I think it's also a good example of how we are not afraid to question content that comes in from our partners, if they have content that skirts the issue of spam."

Toman also disputed that the database was being used to drive customers to use particular partners.

"We certainly didn't say, 'We have these three people as companies, so let's go get their competitors'," Toman said. "No one got into that database without trying to spam Inktomi."

Another concern raised was that companies are being banned or penalized without knowing it. This is really only a problem for those who don't use paid inclusion. Those who do use paid inclusion are notified if their URLs are removed, Inktomi says.

As for those with free listings, it is entirely possible they might be banned without being informed. However, Inktomi is hardly unique in doing this. All the major crawlers ban sites for spamming, and none of them generally inform people when this happens nor post a list of sites that have been banned.

That always sounds frightening, but the reality is that the vast majority of site owners do nothing to get themselves banned. You have to really work hard to get a search engine to target your listings. That's why those that do get banned usually understand that they've done something wrong. They don't bother following up with the search engines about their drop in traffic because they know exactly why it has happened.

Should you think you've been banned for spam, the solution is to get in contact with the search engines. If you're innocent, you should have your pages restored. Indeed, one entry from Inktomi's Hall of Spam reflects this:

"This is a list of spammers' CUSTOMERS. Either they paid an SEO or they encourage affiliates to spam. Let's have a heart; if they call us mystified why they're not in the index, un-blacklist them."

Inktomi said it doesn't make its spam list public because the comments haven't been cleaned up to make sense to the average person.

"If I had the resources to fully document it and provide all the instructions, I could make that public," Toman said.

I suggested that perhaps a simpler Hall of Spam should be published, with just company names. That way, SEO firms that think they've done something wrong can easily confirm this. More importantly, potential SEO customers could examine the list to see exactly who is on it.

That's especially important, because there were at least two prominent names on the list that I reviewed, which list companies such as British Airways, Amazon, LL Bean and Lucent Technologies as their clients. Tabke, who managed to download a much longer list of the Hall of Spam, claimed that it read like a Who's Who of the SEO industry.

This leads into the bigger issue of what is spam. Comments in the Webmaster World forum have led into the often-repeated desire of search engine optimization specialists for search engines to spell out exactly what is acceptable, so that they can play by the rules.

Perhaps we'll get there, but as I've written before, the barrier to getting really detailed rules is the fear search engines have that doing so simply provides too much information about how they operate.

Of all the information I reviewed within the filter database, there was only one thing that really concerned me: a URL definition called "track-monetization." It was defined as:

"We want to know how much money we can make from a potential monetization customer before we approach them for a potential monetization deal."

Using this classification, Inktomi will tag existing, freely listed URLs from major web sites that they feel could be paid inclusion partners, then approach them about entering the program, the company says. The pitch is then that by using paid inclusion, they could get even more listings or have their listings updated more frequently, thus getting good traffic.

Should the company not wish to participate, Inktomi says that its sites will continue to be indexed via the regular crawling mechanisms.

The Inktomi Papers
Search Engine World Quarterly, Sept. 12, 2001
http://www.searchengineworld.com/newsletter/2001/

Tabke's article appeared at the URL above but has been pulled. He originally wrote it without any comments from Inktomi, not wanting to contact the company before publishing, for fear it would pull the database. Now that the database has been closed, Tabke has pulled his article until Inktomi's responses can be added.

Webmaster World Discussion Thread
http://www.webmasterworld.com/forum5/829.htm

This is where the Inktomi article has been discussed by Tabke's readers and others.

The ambiguous Inktomi anti-spam policy
Pandia, Sept. 14, 2001
http://www.pandia.com/sw-2001/54-mediadna.html

Covers Tabke's article and touches on some of the ways discussed in the database about how spam is tracked down, including by monitoring SEO discussion forums.

How Inktomi Works
http://searchenginewatch.com/subscribers/inktomi.html

Explains how the Inktomi search engine operates, including details on paid inclusion programs and partners, available to Search Engine Watch members.

Pay For Placement?
http://searchenginewatch.com/resources/paid-listings.html

Extensive past coverage on paid inclusion issues can be found here.

Inktomi Content Policy- Spam Removal Guidelines
http://www.inktomi.com/products/search/content_policy.html

Inktomi posted these guidelines fairly recently, and they make clear that cloaking is OK.

FAQs about Inktomi Search/Submit
http://www.positiontech.com/inktomi/faq.htm

Item 21 explains that Inktomi allows one doorway page per "real" page and doesn't ban them from being cloaked.
