Beyond The Hype:
Dissecting AltaVista's Claims
From The Search Engine Report
Nov. 1, 1999
In conjunction with its relaunch, AltaVista made several claims about its new search capabilities that don't currently hold up. Here's a look at where the service isn't yet matching promises made in a recent press release:
"Four years since introducing the world's first search engine, AltaVista once again breaks new ground with a powerful way for users to zero-in on precise results by type of information."
The world's first search engine? Not by a long shot. AltaVista hit the stage in December 1995. Both Lycos and WebCrawler had been indexing pages since early 1994. Even Excite and Infoseek were live before AltaVista.
"AltaVista Search debuted the world's largest index today, spanning an unprecedented 90 percent of Web sites, 250 million unique pages and 25 million multimedia objects."
Wow -- AltaVista spans 90 percent of sites on the web! That sounds great, but the number is essentially meaningless, as far as I can tell.
The NEC Research Institute estimated that as of February 1999, there were 2.8 million publicly accessible web servers. AltaVista tells me it compared this number with the number of web servers it has visited to come up with its 90 percent figure.
Now here's the problem. The NEC estimate was for web servers, not web sites. A single web server can host multiple web sites -- even hundreds of web sites. That means there's no way of knowing exactly how many web sites exist based on the NEC data. Similarly, there's no way for AltaVista to know exactly how many web sites it has spanned.
Let's take another look at this. Having an index of 250 million web pages is indeed an achievement -- it means AltaVista shares the top spot for largest index with FAST Search (at least based on self-reported numbers).
But that same NEC study estimated that there were 800 million pages on the web, as of February 1999. That means AltaVista's index covers only about 31 percent of these estimated web pages. So even if AltaVista really does span 90 percent of web sites, it clearly doesn't have every page from those sites, which again devalues the 90 percent number.
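For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The two figures are the ones quoted above (AltaVista's self-reported index size and the NEC page estimate); the function and constant names are my own, chosen only for illustration.

```python
def coverage_percent(indexed_pages: int, estimated_pages: int) -> float:
    """Return the share of estimated pages that are indexed, as a percentage."""
    if estimated_pages <= 0:
        raise ValueError("estimated_pages must be positive")
    return 100.0 * indexed_pages / estimated_pages


# Figures quoted in the article; the names are illustrative assumptions.
ALTAVISTA_INDEXED_PAGES = 250_000_000  # AltaVista's self-reported index size
NEC_ESTIMATED_PAGES = 800_000_000      # NEC estimate of web pages, February 1999

page_coverage = coverage_percent(ALTAVISTA_INDEXED_PAGES, NEC_ESTIMATED_PAGES)
print(f"Page coverage: {page_coverage:.2f} percent")
```

Note that this measures pages, not sites or servers. The 90 percent claim cannot be checked the same way, because the NEC data gives no count of web sites to divide by.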
"AltaVista's new 'Living Index' crawls the Web 'intelligently,' recording how often pages are updated, to ensure AltaVista Search always has the Web's freshest information."
Perhaps it does. Perhaps it will. But the index certainly doesn't have the freshest information at the moment. For instance, go to the advanced search page, then look for pages containing "golf" dated between Oct. 1 and Oct. 31. Only three pages are found -- three pages! (And one of those is already a dead link.) Is it possible that only three pages containing "golf" were either added to the web or updated during all of October? Not likely -- do the same date-restricted search at Northern Light, and you get over 48,000 hits.
Here's another look at the problem. Do a date-restricted search for "the," one month at a time, and you'll find these numbers:
Aug. 1999: 5,3377,558 pages
Sept. 1999: 7,993 pages
Oct. 1999: 208 pages
If anything, AltaVista appears to have a relatively dated index at the moment -- certainly not the web's "freshest" information, as claimed.
For its part, AltaVista says that the current index is based on a crawl done at the beginning of October, and it is just now beginning to go forward with adding new information and updating older pages.
"I think in terms of where we are going to be, we're going to have the freshest information there is. It's perhaps not as fresh as we would like it to be, at the moment, but what we are doing will completely rectify the situation," said Tracy Roberts, AltaVista's marketing director.
"AltaVista search is able to make its Freshness Guarantee: no search site will have fresher results than AltaVista."
AltaVista unveiled its first "Freshness Guarantee" back when it relaunched in June, promising that its entire index would be refreshed at least once per month. That guarantee was almost immediately broken, as even AltaVista President Rod Schrock admitted when we talked recently. "We turned our attention to this new system," Schrock said.
OK, fair enough -- they wanted to build something even better. But this new guarantee has already been broken, as described above. If claims like these are going to be made, then they should actually be met. Failing to meet them in the midst of a huge media blitz is an incredible blunder.
"Called 'AltaVista Page Knowledge,' this technology analyzes the content of the page, its meta-tags, the page's connectivity, any referring anchor text and other pertinent information. No other Web search engine takes all these variables into account when measuring relevancy."
The implication is that AltaVista is going well beyond what other search engines consider when determining relevancy. In reality, both Inktomi and Go analyze all of the factors specifically named above, according to past interviews with them. Of course, the exact algorithm used by each service is unique -- but AltaVista certainly has equals as to the factors it is specifically naming.
AltaVista Search Press Release
Search Engine Coverage Study Published
The Search Engine Report, August 2, 1999
More information about that NEC study.