
Search Engine Size Test: July 2000



By Danny Sullivan
Editor, SearchEngineWatch.com
July 7, 2000

Preface

The search engine with the biggest collection of documents is not necessarily the best search engine. However, when you want to find obscure and unusual information, turning to a crawler-based search engine with a large index is often the way to go. The use of automation, rather than human editors, means that crawlers will tend to find things that an editor might miss. Additionally, the larger the index, the greater the chance you'll find some information that matches your unusual search.

Determining which search engine has the biggest index is a difficult task. All will provide you with figures they claim are true, but that doesn't mean that these self-reported numbers are correct. This is especially an issue when a search engine claims to be much larger than the others. Users may want a third party to prove these claims.

The Search Engine Showdown site (listed below) runs a long-standing, comprehensive size test, and it is required reading for those seeking to validate claims. The Search Engine Watch Size Test is more of a supplement to it. I run it primarily when a major new size claim is made, to get my own sense of how the claim stands up.

I strongly recommend that you read some of the supplementary pages that are listed at the bottom of this article. You'll find the Search Engine Showdown site there, as well as background about the difficulty in measuring search engine size. You'll also find a page that lists the reported sizes of search engines, over time.

Reason For Current Test

The current test was conducted to see how Google's new index, the largest of any search engine based on reported sizes, compared with those of the others actively competing in the search engine size wars: AltaVista, FAST and Northern Light. Inktomi has also reentered the size competition, but its larger index has not been formally announced as available through any of its partners. Consequently, while I did some checking on Inktomi-powered search engines, I didn't do any close analysis. I also included Excite, mainly to see how its claim of having a larger index than in the past stood up against the competition. Finally, I was not able to include newcomer WebTop.com in this test, as its announcement that it would index 500 million pages came after I'd finished my preliminary work. I expect to repeat this test in the near future, and the repeat will include WebTop.com.

Obscure Terms

The first test checked how well each search engine did at finding four obscure terms. By obscure, I mean words unusual enough that no search engine found more than 100 matches. A separate page listed at the end of this article shows exactly which terms were used and explains the scoring methodology. The chart below summarizes the test. Search engines are listed in order of performance, with the best at the top of the list.

Search Engine    Reported Size (millions)   Expected Score   Actual Score   Rank
Google           560                        1.0              1.0            1
FAST             340                        2.0              1.8            2
Northern Light   265                        3.0              2.3            3
HotBot           110                        4.0              2.3            3
iWon             110                        4.0              2.3            3
AltaVista        350                        2.0              2.5            4
Yahoo-Google     560                        1.0              3.0            5
Excite           250                        3.0              3.0            5
Yahoo-Inktomi    110                        4.0              4.3            6

The "Reported Size" column shows how many millions of pages each search engine claims to have indexed, and the "Expected Score" column is based on it. You would expect the search engine with the most pages to score first when searching for a term, as measured by how many matches it finds. In other words, search for something at Google, which claims the largest index, and you'd expect it to find more matches than a search at FAST, which doesn't claim an index as large as Google's.

The "Actual Score" column shows how well each search engine actually performed. A score of 1 is perfect, while the worst possible score would have been 9, meaning the search engine consistently ranked 9th against its competitors. No engine did that badly.
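The full method is on the scoring page listed at the end of this article. The published figures are consistent with each Actual Score being an engine's average rank across the test terms, rounded half up to one decimal, and with the Rank column ordering engines by that average so that tied scores share a rank and no rank numbers are skipped. Here is a minimal Python sketch under those assumptions; the per-term ranks passed to `actual_score` are hypothetical, while the scores fed to `dense_rank` come from the obscure-terms table.

```python
from decimal import Decimal, ROUND_HALF_UP

def actual_score(per_term_ranks: list[int]) -> float:
    """Average rank across the test terms, rounded half up to one decimal (assumed scheme)."""
    average = Decimal(sum(per_term_ranks)) / Decimal(len(per_term_ranks))
    return float(average.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))

def dense_rank(scores: dict[str, float]) -> dict[str, int]:
    """Lower score is better; equal scores share a rank and no rank numbers are skipped."""
    distinct = sorted(set(scores.values()))
    position = {score: i + 1 for i, score in enumerate(distinct)}
    return {engine: position[score] for engine, score in scores.items()}

# Actual Score column from the obscure-terms table.
OBSCURE_SCORES = {
    "Google": 1.0, "FAST": 1.8, "Northern Light": 2.3, "HotBot": 2.3,
    "iWon": 2.3, "AltaVista": 2.5, "Yahoo-Google": 3.0, "Excite": 3.0,
    "Yahoo-Inktomi": 4.3,
}

ranks = dense_rank(OBSCURE_SCORES)
print(ranks["AltaVista"])          # 4
print(actual_score([2, 2, 3, 2]))  # 2.3 (hypothetical ranks averaging 2.25)
```

Ranking without skipped numbers is why AltaVista's 2.5 appears as fourth even though five engines scored better than it.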

Judged by its performance in finding obscure terms, Google lives up to its claim of being the biggest. You would expect the search engine with the biggest index to consistently do well, and Google does.

Ranked second, FAST Search beat every engine except Google, which tends to support its contention of having an extremely large index. In contrast, AltaVista's claimed index size is virtually the same as FAST's, yet AltaVista ranked fourth rather than tying FAST for second, as you would expect.

Northern Light backs up its claim to the web's third-largest index. Inktomi-powered HotBot and iWon put in a surprisingly good performance, suggesting they are using the larger Inktomi database but haven't yet announced it.

Finally, notice the two Yahoo entries. One shows Yahoo as of a week ago, when its "Web Pages" results came from Inktomi. Yahoo clearly did not tap into the full set of results that Inktomi offers; had it done so, Yahoo-Inktomi would have ranked alongside HotBot and iWon. Similarly, Google began serving Yahoo's results this week, but Yahoo is not currently digging as deeply into Google's results as Google does on its own site.
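Subtracting each engine's Expected Score from its Actual Score puts the discussion above into numbers: a negative gap means an engine found more matches than its claimed size predicts, and a positive gap means fewer. Here is a short Python sketch using the figures from the obscure-terms table; the gap measure is only a reading aid, not part of the test's own scoring.

```python
# (expected score, actual score) for each engine, from the obscure-terms table.
OBSCURE = {
    "Google": (1.0, 1.0),
    "FAST": (2.0, 1.8),
    "Northern Light": (3.0, 2.3),
    "HotBot": (4.0, 2.3),
    "iWon": (4.0, 2.3),
    "AltaVista": (2.0, 2.5),
    "Yahoo-Google": (1.0, 3.0),
    "Excite": (3.0, 3.0),
    "Yahoo-Inktomi": (4.0, 4.3),
}

def gap(expected: float, actual: float) -> float:
    """Actual minus expected, rounded to one decimal to hide floating-point noise.
    Negative = better than the claimed size predicts; positive = worse."""
    return round(actual - expected, 1)

gaps = {engine: gap(exp, act) for engine, (exp, act) in OBSCURE.items()}
underperformers = [engine for engine, g in gaps.items() if g > 0]

# Print engines from most to least over-performing.
for engine, g in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{engine:15} {g:+.1f}")
```

HotBot and iWon show the largest negative gaps (-1.7), the pattern read above as a sign of Inktomi's larger, unannounced index, while AltaVista (+0.5) and Yahoo-Google (+2.0) land on the wrong side.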

Unusual Terms

I also checked how well each search engine did at finding five unusual terms. By unusual, I mean words that brought back only a few hundred or a few thousand matches. Again, search engines are listed in order of performance, with the best at the top of the list. Excite is not included in this test because there is no way for me to confirm counts past 1,000 matches.

Search Engine    Reported Size (millions)   Expected Score   Actual Score   Rank
FAST             340                        2.0              1.6            1
Google           560                        1.0              2.0            2
AltaVista        350                        2.0              2.4            3
HotBot           110                        4.0              3.0            4
iWon             110                        4.0              3.0            4
Northern Light   265                        3.0              3.4            5
Yahoo-Google     560                        1.0              3.6            6
Yahoo-Inktomi    110                        4.0              5.0            7

FAST Search comes out on top in this test. What happened to Google? It came out on top for every search except one, and that one, believe it or not, was "llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch." That's a town in Wales, considered the longest place name in Britain and in Europe, and sometimes cited as the longest place name in the world. Google failed to find ANY matches for this town. Google says this is probably because it has a maximum word length, and the town's name seems to exceed it. Nevertheless, if you were searching for information about this town (fairly well known to trivia buffs), Google would have been of no help.
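Google did not say what its limit is, so the following is only a toy illustration of the mechanism it described, not Google's indexer. It assumes a hypothetical `MAX_WORD_LEN` of 50 characters and simply skips longer words at indexing time, which is enough to make a query for the 58-letter town name come back empty.

```python
import re
from collections import defaultdict

# Hypothetical cap for illustration only; Google did not publish its actual limit.
MAX_WORD_LEN = 50

TOWN = "llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch"

def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Toy inverted index: maps each word to the IDs of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z]+", text.lower()):
            if len(token) > MAX_WORD_LEN:
                continue  # over-long words are silently skipped at indexing time
            index[token].add(doc_id)
    return index

def search(index: dict[str, set[int]], word: str) -> set[int]:
    """Return the IDs of documents containing the word (empty set if none)."""
    return index.get(word.lower(), set())

docs = {1: f"Visiting {TOWN} in Wales", 2: "A travel guide to Wales"}
index = build_index(docs)
print(len(TOWN))               # 58
print(search(index, TOWN))     # set(): the name never made it into the index
print(search(index, "Wales"))  # {1, 2}
```

The page about the town stays findable through its other words, such as "Wales", which is why a limit like this goes unnoticed until someone searches for an unusually long word.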

Once again, AltaVista fails to match FAST's performance, even though you would expect similar results from two indexes of nearly equal size. HotBot and iWon also put in strong showings, again suggesting they are unofficially using Inktomi's larger index.

You Can't Get Them All, Anyway!
and if you could, you wouldn't read them

Time for a pause. Searching for the unusual terms generally yielded counts of several thousand listings, and you might be excited (or maybe not) that Google found 5,583 matches for "furbish" while HotBot found only 1,700. However, you cannot retrieve more than 1,000 of those matches at either place. These search engines simply won't give you more than this, nor are they alone. Check out this message from Excite:

As a highly trafficked site, Excite has a responsibility to provide the best possible service to our customer base. Our Excite Search engine is designed to float the most relevant data from our index to the forefront of the results list returned for each query submitted. We have found that nearly 100% of users never have a need to drill down beyond the 1,000th result for a given query. For these reasons, we no longer provide more than 1,000 results per query submitted. We hope that 1,000 results more than meet your needs.

Realistically, even if you could get more than 1,000 documents, you couldn't read through them all, anyway. That is why I like to stress that index size tends to matter most for searches that yield 100 or fewer matching documents. In those cases, you really can review each document for relevancy yourself, taking the relevancy burden off the search engine. But for searches that match hundreds or thousands of listings, your search engine's relevancy ranking suddenly becomes much more important than its comprehensiveness.

Naturally, you might feel more comfortable knowing that at least some search engines sort through more possibilities in order to present what they consider to be the best documents. Absolutely, this is desirable. But also remember that the web isn't a closed system. If you can get to 10 good documents about your subject, they will probably lead to many more pages. You don't necessarily need to capture each and every possible document when you search.

For the curious, here's the maximum number of results you can retrieve from each of the search engines tested:

Search Engine       Maximum Results
FAST                4,010
AltaVista           1,000
Excite              1,000
Google              1,000
HotBot              1,000
iWon                1,000
Yahoo "Web Pages"   199
Northern Light      couldn't determine

Also see "The Million Results Myth" article, listed at the end of this page, for another look at this issue.
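Put another way, the number of listings you can actually page through is the smaller of the reported count and the engine's cap. Here is a minimal Python sketch using the caps from the table above (Northern Light's could not be determined, so it is modeled as `None`), checked against the "furbish" counts quoted earlier.

```python
from typing import Optional

# Maximum retrievable results per engine, from the table above.
# None means the cap could not be determined.
MAX_RESULTS: dict[str, Optional[int]] = {
    "FAST": 4010,
    "AltaVista": 1000,
    "Excite": 1000,
    "Google": 1000,
    "HotBot": 1000,
    "iWon": 1000,
    "Yahoo Web Pages": 199,
    "Northern Light": None,
}

def retrievable(engine: str, reported_matches: int) -> Optional[int]:
    """How many of the reported matches a user can actually reach."""
    cap = MAX_RESULTS[engine]
    if cap is None:
        return None  # unknown cap: we can't say how deep the results go
    return min(reported_matches, cap)

print(retrievable("Google", 5583))  # 1000 (Google's count for "furbish")
print(retrievable("HotBot", 1700))  # 1000 (HotBot's count for "furbish")
```

So the gap between 5,583 and 1,700 matches never reaches the user: both engines stop at 1,000.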

Popular Terms

Just for fun, I also checked the results for four popular terms, words that brought back millions of matches. You don't ordinarily care about popular terms when choosing a search engine mainly for its index size. After all, with words that yield millions of results, you are much more interested in a search engine's ability to filter the very best pages for that topic to the top. Nevertheless, you would still expect the engines with the biggest indexes to rank best in terms of total counts.

Search Engine    Reported Size (millions)   Expected Score   Actual Score   Rank
AltaVista        350                        2.0              1.3            1
FAST             340                        2.0              1.8            2
Northern Light   265                        3.0              2.5            3
iWon             110                        4.0              3.5            4
HotBot           110                        4.0              3.8            5
Google           560                        1.0              4.8            6

Google placed last in this test, the exact opposite of what you would have expected. Why? Google suggests that for searches involving popular terms, it may not dig into its full index, searching instead only among a collection of what it considers the most popular documents. Meanwhile, AltaVista finally comes out on top. Unfortunately, it's really in those earlier tests that you'd have liked to see it rank well.
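Google gave no details, so the following is only a toy model of the behavior it described, not its actual design. It assumes a hypothetical two-tier index: a query is first run against a smaller tier of "popular" documents, and the full index is consulted only when that tier yields fewer than a hypothetical `threshold` of matches. Under that assumption, a popular term reports only the popular tier's count, while an obscure term still gets the full count.

```python
from dataclasses import dataclass

@dataclass
class TieredIndex:
    """Toy two-tier index. Each tier maps a term to the IDs of documents containing it.
    `popular` stands in for the documents an engine considers most popular; `full` is everything."""
    popular: dict[str, set[int]]
    full: dict[str, set[int]]
    threshold: int = 1000  # hypothetical cut-over point, not a published Google value

    def count(self, term: str) -> int:
        """Reported match count for a single-term query."""
        hits = self.popular.get(term, set())
        if len(hits) >= self.threshold:
            # Plenty of matches among popular pages: report those, skip the full index.
            return len(hits)
        # Too few popular matches (e.g. an obscure term): fall back to the full index.
        return len(self.full.get(term, set()))

idx = TieredIndex(
    popular={"travel": {1, 2, 3}},
    full={"travel": set(range(1, 11)), "furbish": {7}},
    threshold=2,
)
print(idx.count("travel"))   # 3, even though the full index holds 10 matches
print(idx.count("furbish"))  # 1, found by falling back to the full index
```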

Conclusion

At the moment, Google lives up to its claim of being the biggest, though its timeouts on unusual terms raise some concerns. I expect Google will clear up this problem within the month, once it adjusts to the new load Yahoo has put on its servers. FAST also lives up to its claim of having the second-largest index. The main disappointment is AltaVista, which doesn't rank where its claimed size suggests it should.

More Information

Search Engine Sizes
See current reported sizes of search engines, index growth over time, and links to many articles about the issue, including coverage of all the search engines mentioned here.

Who's The Biggest Of Them All?
The Search Engine Report, Nov. 1, 1999
Explains the difficulty in measuring search engine sizes.

Search Engine Size Test: Nov. 1999
Previous test to measure search engine sizes.

July 2000 Search Engine Size Test: Scoring & Notes
Explains the method behind scores shown on this page.

Search Engine Showdown
http://www.searchengineshowdown.com/
This site from Greg Notess provides searching tips and surveys of search engine sizes, dead links and other data.

The Million Results Myth
About Web Search Guide, Oct. 2, 1999
http://websearch.about.com/internet/websearch/library/weekly/aa102999.htm
A longer look at how you can't get all the results a search engine finds.

