FAST Aims For Largest Index
From The Search Engine Report
May 4, 1999
Northern Light fired the first shot in this year's search engine size wars back in January, when it said it aimed to be the biggest on the web. Now the battle has been joined with the entry of Fast Search & Transfer into the web search marketplace. The Norway-based company aims to compete with Inktomi in providing backend search services to major portals and corporate sites, with size as its key selling point.
To demonstrate its technology, FAST has launched a new site called alltheweb.com. It claims to have already indexed 80 million web pages and plans to hit 200 million by July of this year. If reached, that would be a significant milestone. No one is at that level now.
Currently, AltaVista has the largest claimed index of the web, at 150 million documents. Northern Light is close behind, at 140 million pages -- but it says that if index size were audited, it would be the clear winner. Moreover, Northern Light says that by July, it will be at 225 million pages indexed.
"We plan to keep growing the database aggressively, indefinitely. We still believe it to be the largest on the Internet today," said Marc Krellenstein, Northern Light's Director of Engineering.
For its part, Inktomi says that relevancy comes first and size second on its list of priorities. Its index has been stable at 110 million pages for over a year.
"In terms of the database size, there is no limit to the size of the database we can put up with the Inktomi search engine," said Dennis McEvoy, Inktomi's Vice President of Development and Support. "We're really driven by our customers. We're finding that people really care about the relevance of the results they get back, so what we're doing is really spending our time looking at what makes relevance better."
Ideally, it would be nice to get both size and relevancy in the same package. That's what FAST says it will do. "We want to throw the gauntlet down," said David Burns, CEO of FAST's US operations. "You can't bring up good sites if you don't have them."
Ultimately, Burns says FAST wants to index the 500 to 600 million pages that Forrester Research estimates to exist on the web.
"Our real milestone is to get to 550. We're publicly saying we're going to build a search engine to cover the whole web." As for when this milestone will be reached, Burns is less specific. "It's not measured in multiple years," he said, indicating that closing a major deal would accelerate the schedule.
One of FAST's key claims is that its system scales cheaply. It can add capacity inexpensively, using ordinary Pentium III workstations. In fact, its partnership with Dell is meant to help highlight this. Dell is supplying FAST with equipment at a discounted rate, in return for being featured within FAST's search implementations. There's also the possibility that the relationship will grow stronger.
"We know it's going to move beyond that," Burns said, referring to the existing customer-supplier relationship. Dell certainly made a major show of support by issuing a joint press release with FAST about the alltheweb.com site.
Of course, Inktomi also claims that it can scale with the growth of the web, using relatively inexpensive Sun Solaris computer hardware rather than large workstations like the Digital Alphas that power AltaVista. It's not backing away from those claims in the face of FAST's challenge.
I'm not going to drag out the facts and figures here, for several reasons. I was briefed on the FAST announcement last week under embargo, which means I couldn't reveal the company's plans to others, including Inktomi, prior to this story. I was able to follow up with Inktomi on a variety of specific technical issues about its architecture, as I have done with them and other services in the past. I came away feeling that the claims and counter-claims aren't something I can cover this month. They involve some highly technical arguments, and it's somewhat premature to weigh them, since FAST isn't yet larger than the existing indexes from Inktomi, AltaVista and Northern Light. (It does currently exceed the claimed numbers of Excite, Lycos and Infoseek, which are all around the 50 to 60 million web page mark.)
The hardware architecture is an interesting topic, and I do hope to revisit it in the future. But I don't want to get lost in the numbers. You don't need them to take FAST seriously. The company has already implemented two specialty search services for Lycos, including the impressive MP3 search engine. At the core of its technical staff are students and professors from the Norwegian University of Science and Technology, who studied search before coming to FAST. I'm sure they can build an advanced search engine, just as the people at Inktomi have already done.
The real proof will be in the final implementation, of course. There, FAST has a way to go. Now that the alltheweb.com site is live, I've spent a little time with it. I didn't come away overly impressed with the relevancy, and I definitely saw some negatives. A search for "england" is a good example. You can clearly see duplicate pages that should be resolved somehow. Introducing clustering would also help ensure that pages from one particular site don't dominate the top results, as also happens in this example.
Having said this, these are early days. I'd expect the service to be refined over time. More importantly, alltheweb.com is not planned to be a challenger to existing portals. FAST wants to power other services, in the way Inktomi currently powers HotBot. It could even be that its partners apply their own relevancy and formatting tweaks like clustering, in the same way that HotBot uses Direct Hit to refine results initially drawn from Inktomi data.
Direct Hit also underscores the need for relevancy over size, something dealt with in the Northern Light article below. Yes, it is very important that crawler-based systems keep pace with the growth of the web. But Inktomi was pushed into the back seat at HotBot by Direct Hit, which promised better results, not a bigger index. Given this, it's no wonder that the company is more concerned about increasing relevancy than simply gathering up pages. That's as it should be. Relevancy is what users are looking for. But if players like FAST and Northern Light can grow and return relevant results, all the better.
All The Web
Northern Light Claims Largest Index
The Search Engine Report, Feb. 2, 1999
This article deals with Northern Light's claim to have the biggest index of the web, along with a detailed examination of why the other major search engines are aiming more toward increased relevancy.