SEO News

New Estimate Puts Web Size At 11.5 Billion Pages & Compares Search Engine Coverage


It has been ages since I've seen anyone try to estimate the size of the web. Now a new paper puts it at 11.5 billion pages or more as of January 2005. The paper, "The Indexable Web is more than 11.5 billion pages," has the details.

The paper, from Antonio Gulli of Università di Pisa (who is also director of advanced products for Ask Jeeves) and Alessio Signorini of the University of Iowa, also estimates what percentage of the web each search engine covers. I've shown those figures in the chart below. (Please note that if you saw an earlier edition of this post, it didn't have the figures that were sent separately from the study.)

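| Search Engine | Self-Reported Size (Billions) | Estimated Size (Billions) | Coverage of Indexed Web | Coverage of Total Web |
|---------------|-------------------------------|---------------------------|-------------------------|-----------------------|
| Google        | 8.1                           | 8.0                       | 76.2%                   | 69.6%                 |
| Yahoo         | 4.2 (est)                     | 6.6                       | 69.3%                   | 57.4%                 |
| Ask           | 2.5                           | 5.3                       | 57.6%                   | 46.1%                 |
| MSN (beta)    | 5.0                           | 5.1                       | 61.9%                   | 44.3%                 |
| Indexed Web   |                               | 9.4                       |                         |                       |
| Total Web     |                               | 11.5                      |                         |                       |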

As noted earlier, the study estimates the web at 11.5 billion pages or more. The "indexed web" refers to the portion of that total considered to have been indexed by search engines. It is estimated at 9.4 billion pages, or 82 percent of the entire web. The chart shows what percentage of both the indexed web and the total web each search engine covers.
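To make the arithmetic behind the chart concrete, here is a small Python sketch that uses only the chart's own figures: dividing 9.4 billion by 11.5 billion gives the roughly 82 percent indexed share, and multiplying each engine's coverage of the total web by 11.5 billion reproduces its estimated size. The helper names are just for illustration. The indexed-web percentages don't fall out of the same simple division, so they are left out.

```python
# Figures copied from the chart (sizes in billions of pages).
TOTAL_WEB = 11.5
INDEXED_WEB = 9.4

# Each engine's "Coverage of Total Web" column, as fractions.
coverage_of_total = {
    "Google": 0.696,
    "Yahoo": 0.574,
    "Ask": 0.461,
    "MSN (beta)": 0.443,
}


def indexed_share(indexed: float, total: float) -> float:
    """Fraction of the total web that falls inside the indexed web."""
    return indexed / total


def implied_size(coverage: float, total: float) -> float:
    """Index size (billions of pages) implied by a coverage fraction of the total web."""
    return coverage * total


print(f"Indexed web share: {indexed_share(INDEXED_WEB, TOTAL_WEB):.0%}")
for engine, cov in coverage_of_total.items():
    print(f"{engine}: {implied_size(cov, TOTAL_WEB):.1f} billion pages")
```

Run as-is, it prints an 82% indexed share and sizes of 8.0, 6.6, 5.3 and 5.1 billion pages, matching the "Estimated Size" column.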

OK, the first thing you wonder is whether any of the search engines are lying when they say how big they are. Google has claimed to have the biggest search index, with 8.1 billion pages.

The estimate shows that this is right on target, off by only a tiny amount (8.0 billion estimated versus 8.1 billion claimed), so there's no apparent deceit by Google, at least in the sense of overstating! The same is true for MSN and Ask Jeeves: Ask is actually estimated to have more pages than it claims, while MSN is right on target.

Yahoo doesn't provide a current figure for the size of its index. The 4.2 billion figure is the last we have, from back in 2004, when it said it was comparable to Google. More on this in my past article, Search Engine Size Wars V Erupts. So the paper's estimate of 6.6 billion pages is nice, in that we finally have an updated sense of where Yahoo might be.

There are a ton of caveats to throw out. The estimates are for the "visible" web: URLs that search engines can easily reach. The "invisible" or "deep" web refers to content locked behind databases or other systems that search engines haven't extracted. Past estimates have put the deep web at as much as 500 billion pages.

Also, while some URL normalization was done by the study, it still seems like mirror or duplicate pages may have been counted. So while there may be a certain number of pages, the number of unique pages might be lower.
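The study's own normalization rules aren't spelled out in this post, so the Python sketch below is only an illustration under assumed rules (lowercasing the scheme and host, dropping default ports, fragments and trailing slashes). `normalize_url` and `count_pages` are hypothetical helpers, not the paper's code, and the crawl data is made up. It shows the point in the paragraph above: normalization merges different spellings of the same URL, but a mirror on another host still counts as a separate page unless the content itself is compared, here by a SHA-1 hash of the page body.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Hypothetical normalization rules, for illustration only: lowercase the
    scheme and host, drop default ports and any user:password part, drop the
    fragment, and strip trailing slashes from non-root paths."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    default_port = {"http": 80, "https": 443}.get(scheme)
    port = parts.port
    netloc = host if port in (None, default_port) else f"{host}:{port}"
    path = parts.path or "/"
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/") or "/"
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def count_pages(pages: dict[str, str]) -> tuple[int, int, int]:
    """Return (raw URLs, distinct normalized URLs, distinct page contents)."""
    normalized = {normalize_url(url) for url in pages}
    contents = {hashlib.sha1(body.encode("utf-8")).hexdigest() for body in pages.values()}
    return len(pages), len(normalized), len(contents)


# Made-up crawl: the same manual reachable three ways, one of them on a mirror host.
pages = {
    "HTTP://Example.com:80/docs/": "<html>manual</html>",
    "http://example.com/docs": "<html>manual</html>",
    "http://mirror.example.org/docs": "<html>manual</html>",
    "http://example.com/about#team": "<html>about</html>",
}
print(count_pages(pages))  # (4, 3, 2)
```

On this made-up crawl, four raw URLs shrink to three after normalization, and to two once identical content is collapsed. An exact hash like this won't catch near-duplicates (say, the same page with a different date stamp), so even content comparison can leave some duplication in a count.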

Finally, as we've repeatedly said, size should not be taken as a surrogate for relevancy. Having a ton of pages doesn't mean anything if you can't return the best pages in the top results. It is nice to know that a search engine has good coverage of the web, but it's only one of many factors to consider.

Still, it's great to have an updated estimate of the web's size, as well as of search engine coverage. For background on size issues, see my Search Engine Size Wars V Erupts article from last November and some historic articles on the Search Engine Sizes page. Yes, I'm still planning to update the figures there! But the reference material is all still valid if you want to understand more on this subject.
