Jean Véronis takes another look at Google count oddities covered before, such as why
an OR search for two different words brings back FEWER results than a search for either of the words individually (it should bring back more, since the possible matching set is larger).
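To see why an OR count should never fall below either single-word count, think of each word's matches as a set of pages. An OR query asks for the union of those sets, and a union can't be smaller than any of its parts. Here is a toy Python sketch of that logic. The document IDs and the helper names `count_single` and `count_or` are made up for illustration and say nothing about how Google actually stores or counts pages:

```python
# Toy illustration: each word maps to the set of (made-up) document IDs it matches.
docs_matching = {
    "word_a": {1, 2, 3, 5, 8},
    "word_b": {2, 3, 13, 21},
}

def count_single(index, term):
    """Number of documents matching one term."""
    return len(index.get(term, set()))

def count_or(index, *terms):
    """Number of documents matching ANY of the terms (the set union)."""
    result = set()
    for term in terms:
        result |= index.get(term, set())
    return len(result)

a = count_single(docs_matching, "word_a")             # 5 documents
b = count_single(docs_matching, "word_b")             # 4 documents
either = count_or(docs_matching, "word_a", "word_b")  # 7 documents (2 and 3 overlap)

# A union can never be smaller than either of its parts.
assert either >= max(a, b)
```

Any OR count that comes back lower than a single-word count therefore points to something other than simple set matching, such as estimation, at work.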
In Google’s missing pages: mystery solved?, he surmises:
A possible scenario is that the real index used by Google is considerably smaller than the counts officially announced.
Indeed, it’s not a scenario. It’s reality. We know that Google uses at least two indexes for web searches: the "regular" one and a "supplemental" index (see
Search Engine Size Wars & Google’s Supplemental Results).
Google has never said how many pages are in that supplemental index, when exactly it gets consulted and so on. As a matter of fact, I asked them about it just last night and
still got no further information about what exactly is in there or how it is used.
Véronis uses some testing and mathematical calculations to estimate that the "real" index is about 5 billion pages. I’d translate that into saying that the main index is 5
billion pages, close to the number that Google long reported on its home page. And the page increase it recently announced? It seems likely that was an expansion of the
supplemental index.
Véronis also speculates that the two indexes may be divided into pages that are actually indexed plus pages that Google knows about but hasn’t actually indexed. Perhaps,
though Google has for years had what it calls partially indexed URLs, which aren’t actually indexed at all. As I’ve
written before, "link-only listings" is a better description, since Google knows about these pages only
through link data, not from having indexed anything on them.
In either case, whether these pages sit in a separate index or in the main one, the idea that Google might be estimating how many link-only pages contain particular words
and then extrapolating an overall figure is interesting. Further, if a search is processor-intensive, Google might drop the extrapolation altogether.
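To make that hypothesis concrete, here is a small Python sketch of what "count what you scanned, then scale it up" could look like, including a path that skips the scaling for expensive queries. This is purely illustrative: the function names, the `expensive_query` flag and all of the numbers are hypothetical, and nothing here describes Google's actual method. The 5 billion figure simply echoes Véronis's estimate, and the 8 billion "claimed" size is an arbitrary toy value:

```python
# Hypothetical illustration only: names, flag and numbers are assumptions, not Google's.

def estimated_count(raw_hits: int, pages_scanned: int, claimed_index_size: int) -> int:
    """Scale a raw hit count from the scanned pages up to the claimed index size (rounded down)."""
    return raw_hits * claimed_index_size // pages_scanned

def reported_count(raw_hits: int, pages_scanned: int, claimed_index_size: int,
                   expensive_query: bool) -> int:
    # Assumption: for processor-intensive queries the scaling step is skipped,
    # so the raw count from the scanned pages is reported as-is.
    if expensive_query:
        return raw_hits
    return estimated_count(raw_hits, pages_scanned, claimed_index_size)

scanned = 5_000_000_000   # pages actually searched (toy value)
claimed = 8_000_000_000   # advertised index size (toy value)
hits = 1_000_000          # matches found in the scanned pages (toy value)

print(reported_count(hits, scanned, claimed, expensive_query=False))  # 1600000
print(reported_count(hits, scanned, claimed, expensive_query=True))   # 1000000
```

Under these assumptions, the same query reports a noticeably smaller number whenever the scaling step is bypassed, which is the kind of count drop the rest of this post describes.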
The other specter lurking out there is that Google might be using different algorithms for different types of searches, something that really came up during the big
"Florida" update in November-December 2003. The Search Engine Watch article
Speculation On Google Changes looks at this in depth.
Interestingly, we’re in the midst of the most significant update since then, as my Feeling Like Google Dance
Time post explains. People are speculating about all kinds of things Google might be trying that are causing the changes, but it may well be that a number of things are in
use, depending on the type of query you perform.
Véronis concludes with this simple advice:
In all likelihood, the Google engineers simply forgot to plug the extrapolation routine at the end of the boolean module! Therefore, if you want to know the real index
count for any word, simply type it twice.
Try it yourself, and you’ll see the count drop. But this isn’t new. Tara Calishain talked about it in her excellent
Google Hacks book back in 2003, and she touches on it a bit in this
post as well.
Going back to her book (page 22, if you’ve got it), Tara describes how repeating a word both lowers the count and changes the order of the results.
Google gave her no explanation for this.
I’ll throw out one other thing to contemplate. This week I’ve been looking at getting a TomTom GPS system. Sometimes, by mistake, I spell
it as two words, Tom Tom. In these cases, it’s a good thing that I get different counts.
The results for tom are much different from those for tom tom, with the
latter bringing in a lower count and putting the TomTom site I want at the top, despite my misspelling. Interestingly, search for
tom tom tom, and the count drops significantly again, though I don’t see this when going from two words to three on
So yes, Google is showing oddities. Exactly what they are and why they happen, I can’t say. I’ve asked about some of this before but gotten cagey answers, at best. I’ll ask
again to see if I have more luck, because these kinds of things have an impact on searchers.
Also, going back to my TomTom query, the same thing happens on Yahoo in terms of a rank change, though the count drop isn’t as significant. And you get oddities there as
well: tom tom tom gives 62.5 million matches, but
tom tom tom tom gives 64 million? At MSN, three toms give slightly fewer matches.
At least with Ask Jeeves, the count for tom versus tom tom doesn’t change, but try three or more and you get no web search results at all. Yet searching for ask, repeated
all the way up to ask ask ask ask, worked, and it was nice to see the count stay pretty much rock solid.
For more background on Google counts and oddities, also be sure to see my recent post, Questioning Google’s