Ah, summer. Time to play on the beach, head out on vacation and if you're a search engine, announce to the world that you've got the largest index. -- Search Engine Size Wars & Google's Supplemental Results, Search Engine Watch, Sept. 3, 2003
The quote above is from an article I wrote after Google and AllTheWeb played a game of "who's biggest" in August 2003. They'd done the same thing in August 2002. Now here we are in August 2005, and it's another spat over size once again, this time between Yahoo and Google.
I cannot believe we're going through this again. This is Search Engine Size Wars VI, by my count. It's absurd. It's annoying. It's a friggin' waste of time. Instead of advancing to a commonly accepted relevancy figure, the search engines want to keep us mired in the mud of who's biggest.
Who's biggest really doesn't matter, as I and others have written so, so, so, so, so many times before. Reasons? There are many. How about...
- You need the whole haystack! Here, if I dump it all on your head, can you find the needle now?
- If you have lots of documents but they are all near duplicates of each other, is that good?
- How much of a document have you indexed -- 101K, 500K, 1MB?
Pick your metaphor, your explanation, your qualification (Gary gives you even more here) -- we've been through this all before.
Nothing has changed. Size hasn't suddenly gotten more important overnight. What has happened is that, for the first time, one search engine is strongly disputing the claims of another. Google doesn't believe the figures Yahoo is bandying about, as Gary covered earlier. Yahoo has been steadfast that it's not lying.
Well let's do some testing! Let's come up with some standards! Let's audit the figures! Yeah, let's do that. After all, it's been discussed since 1999, when Northern Light wanted to say definitively that it was biggest. Surely it's time for that to happen, right?
No, it's not. If the search engines are all going to come together to figure out a standard on something, move forward! Move forward! Pull it together and unite to come up with a way to test relevancy! That's what matters, not this squawking and time wasting over size.
In Search Of The Relevancy Figure, which I wrote in 2002, looks at the need for a relevancy figure and how, without it, search engines will continue to use surrogates such as size for relevancy:
A relevancy figure would also free us from search engines playing the "size card" or the "freshness card" to quantify themselves as better than the competition. Yes, having a large index is generally good. Yes, having a fresh index is desirable. However, neither of these stats indicates how relevant a search engine is. Nevertheless, the search engines keep pushing them at us, and in particular at journalists, in an effort to trump their competitors.
Here we are in 2005 and what's happening? Size is pushed again in our faces. Sure, Yahoo didn't do a release on it. But it knew exactly the reaction it would get by announcing via its blog that it was twice as big as Google. And Google? The company has pulled out all the stops in lobbying us at Search Engine Watch along with other analysts to poke hard at the Yahoo numbers, because it doesn't want to be seen as "second best" in any area.
The irony is deep. Google has never provided any proof when it trumped others on the size front. MSN says it's at 5 billion in November? No problem -- Google magically announces on its home page that it's at 8.1 billion. While MSN didn't seriously question that Google was larger than it, plenty of other rumblings went around that the count might not be correct. But since it had trumped everyone else, Google apparently didn't feel the burning concern it now has that size should somehow be verified. Sure, maybe Yahoo isn't at 19 billion. But maybe Google isn't at 8 billion, either.
This game is going to go on and on until someone is brave enough to change the rules. I'm daring either of the leaders, Google or Yahoo, to do just that. Both of them say that size is one of only many factors to consider. Both of them tell you relevancy matters most. SO PROVE IT!
Ideally, I want to see the major search engines come together to develop a unified, accepted way to measure relevancy in various ways: web search, local search, advanced queries, whatever. Establish a research center, a consortium or something and a methodology that all will agree upon. Then test every four to six months and pledge you'll accept the results publicly. Someone wins? Kudos all around! Didn't win? Then do better next time.
That's the challenge. Let's see if someone steps up. As for size -- yes, Gary and I will revisit the various claims and counter-claims in more depth later this week. In the meantime, some past reading on the subject of size and the complications in measuring it:
- New Estimate Puts Web Size At 11.5 Billion Pages & Compares Search Engine Coverage has an estimate of what search engines cover compared to self-reported claims. Despite the Ask Jeeves connection, that service doesn't come out on top in terms of size.
- Search Engine Size Wars V Erupts covers the self-reported figures and battle we had between Google and MSN last November, along with issues such as how much of a page is actually indexed.
- Search Engine Size Wars & Google's Supplemental Results covers more on deconstructing index size claims from 2003.
- Search Engine Sizes has more articles than you can imagine covering size issues over the years.
- How to count URLs is an archived page of what Excite used to do back in 1996 -- 1996! -- to explain how it thought counting should be done. I and others have written about how, in many ways, it feels like we've gone right back in a big circle of portal features being rolled out, land grabs and inflated valuations that make it feel like we're back in the 90s again. The size dispute is just another big spin of that wheel.