
Inktomi Reenters Battle For Biggest


From The Search Engine Report
June 2, 2000

A longer and more detailed version of this article is available to Search Engine Watch "site subscribers." This is just one of the many benefits that site subscribers receive. Click here to learn more about becoming a site subscriber.

It's been over a year since one could consider Inktomi among the largest search engines on the web. The company's crawler-built index of web pages has stayed static at 110 million, while competitors such as AltaVista, FAST and Northern Light have pushed past the 200 and 300 million marks. Now Inktomi says it's back. Later this month, some of its partners should begin announcing that they are making use of Inktomi's new 500 million page index. When they do, users will have access to the largest searchable database of web pages that the Internet has ever seen.

Inktomi is not adding new documents to enlarge its existing database. Instead, it has created a second database that works in cooperation with the first one. If a query doesn't appear to be satisfied by the smaller database of 110 million pages, then a check will be made of the second database that contains the additional 390 million pages.
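To make the two-database idea concrete, here is a minimal sketch in Python. It is not Inktomi's implementation: the toy indexes, the function names `lookup` and `tiered_search`, and the `min_hits` threshold are all assumptions made for illustration. In particular, Inktomi has not said how it decides that a query "doesn't appear to be satisfied," so a simple hit count stands in for that test here.

```python
def lookup(index: dict[str, list[str]], query: str) -> list[str]:
    """Return the documents that contain every term in the query (toy AND match)."""
    terms = query.lower().split()
    if not terms:
        return []
    matches = None
    for term in terms:
        docs = set(index.get(term, []))
        matches = docs if matches is None else matches & docs
    return sorted(matches)


def tiered_search(query, primary, secondary, min_hits=2):
    """Search the small index first; fall back to the large one only if needed.

    Returns (results, used_secondary). The min_hits threshold is a
    hypothetical stand-in for whatever "satisfied" means in practice.
    """
    hits = lookup(primary, query)
    if len(hits) >= min_hits:
        # The popular-pages index answered the query; skip the costly lookup.
        return hits, False
    # Too few results: also consult the larger secondary index.
    extra = [doc for doc in lookup(secondary, query) if doc not in hits]
    return hits + extra, True
```

With indexes like these, a general query such as "sports" would typically stop at the primary index, while an obscure query would pull in documents from the secondary one.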

Why do this? Inktomi claims that breaking up the index helps it maintain relevancy while also keeping the costs down. "This allows us to be good, big and cheap, all at the same time," said Matthew Hall, vice president of engineering in Inktomi's Search and Directory Division.

Inktomi has worked hard over the past year creating a system that it feels allows it to have a small index of the most popular pages on the web, as explained further in the "Numbers, Numbers" article below. It is presuming that most popular queries will continue to be best satisfied by that smaller index, making it unnecessary to do a wider search against the entire corpus of pages for every query.

"Obviously, if someone submits sports, you don't want to go to all 500 million pages," Hall said.

Going to all 500 million pages only selectively also saves money, because matching every query against the entire database would require more hardware. This is especially true for Inktomi, which processes nearly 50 million queries per day.

The use of multiple indexes isn't new. A search at AltaVista, for instance, pulls up results that come from a variety of sources, such as Ask Jeeves, RealNames, the Open Directory and its own web crawler results. However, Inktomi is the first major service I know of to break apart its web page index. I don't see this as bad, but Inktomi-competitor FAST argues that it may cause people to miss important documents.

"This is a subtle form of search engine censorship, as there could be gems hidden in the second part of the index that would have been listed on the first one or two pages of search results, if only Inktomi had let you search them," said FAST CTO John Lervik, who says that his company's search engine always checks its entire index for every query. FAST currently has an index of 340 million pages.

The real test will be by consumers. Yes, a gem could be hidden within Inktomi's secondary database, but I see this as unlikely for popular and general queries. In contrast, when performing more obscure searches, you should be able to tap into the much wider collection of documents.

We should be able to see for ourselves later this month. Inktomi says its partners will each announce when they go online with the new index. That also means that unless an Inktomi-powered service specifically says it is using the larger collection, you should assume that it still taps into only the smaller database of 110 million pages. In fact, some partners may not even search against the entire 110 million page index. How deep to go is a decision left to Inktomi's partners.

Inktomi
http://www.inktomi.com/

FAST Search
http://www.alltheweb.com/

Numbers, Numbers -- But What Do They Mean?
The Search Engine Report, March 3, 2000
http://searchenginewatch.com/sereport/00/03-numbers.html

Explains how Inktomi hopes that its smaller index can satisfy general queries by focusing on popular documents.

Search Engine Sizes
http://searchenginewatch.com/reports/sizes.html

A look at the size of various search engines, based on reported numbers, along with links to past articles on size issues.

