The search engine size wars may be about to heat up again, with Northern Light now claiming it has the largest index of the web.
Northern Light is staking its claim on the latest survey conducted by search expert Greg Notess, author of the Search Engine Showdown.
Notess has conducted test queries on the major search engines since October 1996. In his latest round, completed on January 5, he found Northern Light provided the most matches, followed by AltaVista and then Inktomi-powered HotBot.
In contrast, the reported sizes of the search services put AltaVista first, at 150 million web pages, followed by Northern Light at 122 million web pages, and then Inktomi with 110 million pages.
Northern Light is keen to claim the title as biggest, as the service feels it will help increase its share of the search audience. CEO David Seuss says he believes both AltaVista and HotBot have gained significant traffic because they were seen as the biggest. If Northern Light gains the title, he expects it will also gain in popularity.
"AltaVista was perceived to be the largest, and it became quite popular as a result," Seuss said. "An incorrect study that was widely published found HotBot had 150 million pages, and it picked up traffic," he said, referring to the study by the NEC Research Institute that appeared in Science magazine last year.
Seuss disagrees with some of the findings of that NEC study, which is why he refers to it as incorrect. Likewise, although he is happy to be ranked as largest in the survey from Notess, he doesn't feel estimates are a good enough solution.
"Any estimate is open to criticism no matter how valid the methodology. So, why not simply ask each company and have the company give accurate information?" Seuss asks.
Companies already provide self-reported sizes, but Seuss wants these audited for accuracy -- or at least to have an audit of one search service's size, which could then be coupled with test queries to arrive at sizes for the others.
For example, imagine that you run a series of searches at the major search engines. You might discover that on average, AltaVista has 21 percent fewer matches than Northern Light (this is just an example -- not a fact!). If you then knew that Northern Light had exactly 122 million web pages, you could subtract 21 percent from that number to arrive at the size of AltaVista's index, which would be about 96 million pages. You could also compare this to the self-reported number, to see if that figure was inflated.
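The arithmetic in that example can be sketched in a few lines of Python. This is only an illustration under the same made-up assumptions as the text: the 21 percent figure is hypothetical, Northern Light's 122 million is treated as if it had been audited, and the function names are invented for this sketch rather than taken from any survey's actual method.

```python
def estimate_index_size(audited_size, avg_match_ratio):
    """Scale one audited index size by another engine's average match ratio."""
    return audited_size * avg_match_ratio


def inflation(reported_size, estimated_size):
    """Fraction by which a self-reported size exceeds the estimated size."""
    return (reported_size - estimated_size) / estimated_size


# Hypothetical inputs taken from the example in the text, not measured facts.
NORTHERN_LIGHT_AUDITED = 122_000_000  # assumed to be an audited page count
ALTAVISTA_MATCH_RATIO = 1 - 0.21      # 21 percent fewer matches on average
ALTAVISTA_REPORTED = 150_000_000      # AltaVista's self-reported size

altavista_estimate = estimate_index_size(NORTHERN_LIGHT_AUDITED, ALTAVISTA_MATCH_RATIO)
altavista_inflation = inflation(ALTAVISTA_REPORTED, altavista_estimate)

print(f"Estimated AltaVista index: {altavista_estimate / 1e6:.1f} million pages")
print(f"Reported size exceeds estimate by {altavista_inflation:.0%}")
```

The result is only as good as its two inputs: the audited baseline and the assumption that match counts across test queries scale in proportion to index size.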
There hasn't been a need for audited figures until now because self-reported sizes have been good enough. AltaVista and Inktomi have stood well above their competitors with indexes in excess of 100 million web pages for over a year. They've swapped the title of biggest a couple of times, but neither of them has felt that the other inflated its size enough to raise a serious complaint.
As for the other services, aside from Northern Light, their sizes have remained at 50 million web pages or fewer. Since they are clearly smaller than AltaVista or Inktomi, they've had no incentive to raise doubts about AltaVista's or Inktomi's numbers. Nor have they been inclined to squabble among themselves about who's biggest within their range. That's because they've all firmly been in the "bigger isn't necessarily better" camp. Users want quality before quantity, goes the refrain, and so they say they've focused their efforts on improving relevancy.
Northern Light's desire to be biggest puts new pressure on the reported numbers. It could indeed have the largest index, but it has no easy way to prove its claim. Meanwhile, database sizes are about to increase among some players, such as Infoseek.
"You always hear AltaVista and HotBot say they are the biggest, and we definitely want to go after that," said Jennifer Mullin, Infoseek's director of search. "We've just developed some different technology in house, and we've been able to invest in the scaling issues." Infoseek is currently at the 45 million page mark.
Google also aims to be a size leader. It is currently at 60 million pages, and cofounder Larry Page wants to go much higher. Page won't say exactly how high, but he gives every indication that he'd like to raise the benchmark well above the 100 million page level that currently separates the large search engines from the smaller ones.
"We want to have the most comprehensive, highest quality search that is available," Page said.
The historic size leaders, AltaVista and Inktomi, also plan some increases to stay competitive, though growth isn't their top priority.
"Will we see a 200 million web page index in 1999? Probably," said Louis Monier, AltaVista's chief technical officer. "But don't expect a big jump beyond that. My goal is really to have a more useful index."
Similar sentiments come from Inktomi:
"We're not just interested in being able to advertise ourselves as the biggest. We want to be the best," said Troy Toman, Inktomi's director of search services.
Toman's words are echoed by all the search services. Big is nice, but relevancy remains the chief goal. I agree. I think index growth is important, and some services are overdue to increase their sizes. But while size does make a difference, it is not the only reason to choose one search service over another.
I'll be exploring the size issue more in coming months, but I think even the reported sizes give a user looking for comprehensiveness a good idea of which services to use. At the moment, AltaVista, an Inktomi-powered service like HotBot or MSN Search, or Northern Light are all excellent choices for searching across a large portion of the web -- as are meta search services.
Surveys like those conducted by Notess, NEC, Melee or the Search Engine EKGs that I produce (and expect to update soon) are also good ways to gain a better idea of comprehensiveness. To assist in these types of surveys, it may be possible to establish some standards that help outside observers measure more accurately. In turn, these standards may lead to the type of audited results Northern Light is looking for, or at least more confidence that the self-reported numbers are indeed accurate.
Search Engine Sizes
See current reported sizes and index growth over time, based on reported numbers. You will also find links to the NEC and other size studies.
Search Engine Standards Project
Read more about some standards that, if established, might help users determine index size.
Search Engine Showdown
This site from Greg Notess provides searching tips and surveys of search engine sizes, dead links and other data.