New Estimate Puts Web Size At 11.5 Billion Pages & Compares Search Engine Coverage

It has been ages since I’ve seen anyone try to estimate the size of the web. Now a new paper puts it at 11.5 billion pages or more as of January 2005.
The Indexable Web is more than 11.5 billion pages has the details.

The paper, from Antonio Gulli of the Università di Pisa (who is also director of advanced products for Ask Jeeves) and Alessio Signorini of the University of Iowa, estimates
what percentage of the web is covered by each search engine. I’ve shown that in the chart below. (Please note that an earlier edition of this post didn’t have the figures,
which were sent separately from the study.)


[Chart: for each search engine, its self-reported index size in billions, its estimated coverage of the indexed web, and its estimated coverage of the total web]

Earlier, I said the study estimates the web to have 11.5 billion or more pages. The “indexed web” refers to the portion of that considered to have been indexed by search
engines, estimated at 9.4 billion pages, or 82 percent of the entire web. The chart shows what percentage of both the indexed web and the total web each search engine
covers.
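The chart’s percentages are simple ratios of the figures above. As a quick sanity check, here is that arithmetic in a few lines of Python, using the numbers quoted in this article (11.5 billion total, 9.4 billion indexed, Google’s claimed 8.1 billion; Google’s estimated index is described as roughly matching its claim, so its claimed size stands in for the estimate here):

```python
# Illustrative figures from the article (January 2005).
total_web = 11.5e9      # estimated total (visible) web
indexed_web = 9.4e9     # estimated indexed web
google_claimed = 8.1e9  # Google's self-reported index size

print(f"Indexed share of total web:    {indexed_web / total_web:.0%}")    # 82%
print(f"Google coverage of indexed web: {google_claimed / indexed_web:.0%}")  # 86%
print(f"Google coverage of total web:   {google_claimed / total_web:.0%}")    # 70%
```

The same division gives each engine’s two coverage columns in the chart.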

OK, the first thing you wonder is whether any of the search engines are lying when they say how big they are. Google has claimed to have the biggest search index, with 8.1
billion pages.

The estimate shows that this is right on target, off by only a tiny amount, so no apparent deceit by Google, at least in the sense of overstating! MSN is likewise right on
target, and Ask Jeeves is actually estimated to have more pages than it claims.

Yahoo doesn’t provide an estimate of its index size. The 4.2 billion figure is the last one we have, from back in 2004, when Yahoo said its index was comparable to
Google’s. More on this in my past article, Search Engine Size Wars V Erupts. So the estimate from this paper is nice, in that we finally have an updated sense of where
Yahoo might stand.

There are a ton of caveats to throw out. The estimates are for the “visible” web: URLs that search engines can easily reach. The “invisible” or “deep” web refers to content
locked behind databases or other systems that search engines haven’t extracted. Past estimates have put the deep web at as much as 500 billion pages.

Also, while the study did some URL normalization, it still seems that mirror or duplicate pages may have been counted. So while the raw page counts may be accurate, the
number of unique pages is likely lower.
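The paper doesn’t spell out its exact normalization rules in the detail quoted here, but a typical URL-normalization scheme, sketched below in Python, shows why duplicates can still slip through: it only catches trivially different forms of the same address, not mirrored content on different hosts. (The specific rules here, such as dropping “www.” and trailing slashes, are common heuristics of my choosing, not necessarily the study’s.)

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different forms compare equal.
    Illustrative only; the study's actual normalization may differ."""
    scheme, netloc, path, query, _fragment = urlsplit(url.strip())
    scheme = scheme.lower()
    netloc = netloc.lower()
    # Drop the default HTTP port and a leading "www." (common, lossy heuristics).
    if scheme == "http" and netloc.endswith(":80"):
        netloc = netloc[:-3]
    if netloc.startswith("www."):
        netloc = netloc[4:]
    # Treat "/dir" and "/dir/" as the same page; discard fragments entirely.
    path = path.rstrip("/") or "/"
    return urlunsplit((scheme, netloc, path, query, ""))

# Two superficially different URLs normalize to the same key:
a = normalize_url("HTTP://www.Example.com:80/docs/")
b = normalize_url("http://example.com/docs")
print(a == b)  # True
```

Note what this can’t do: a full mirror of a site at a different domain normalizes to different keys, so it would still be counted twice, which is exactly the caveat above.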

Finally, as we’ve repeatedly said, size should not be taken as a surrogate for relevance. Having a ton of pages means nothing if a search engine can’t return the best ones in
the top results. Good coverage of the web is nice to have, but it’s only one of many factors to consider.

Still, it’s great to have an updated estimate of the web’s size, as well as of search engine coverage. For background on size issues, see my
Search Engine Size Wars V Erupts from last November and the historic articles on the
Search Engine Sizes page. Yes, I’m still planning to update the figures there! But the reference material is
all still valid if you want to understand more about this subject.
