Why Don't External Site Popularity Estimates Add Up?

A twofer today on whether you can trust the web metrics that get reported out there -- one an article from BusinessWeek, the other a big study from SEOmoz based on data gathered from a variety of search blogs. More details below, with lots of comments from me along the way.

Web Numbers: What's Real? from BusinessWeek looks at how sites want to prove they're popular but their own internal metrics might not stand up to external ones -- nor do the external services themselves agree with each other.

I love the irony here. In August, I wrote about how BusinessWeek itself declared Digg to be the 24th most popular site in the US based on Alexa data that many marketers are highly suspicious of. Now I've got BusinessWeek telling me:

The dirty little secret of Silicon Valley is that no one knows exactly who is going where on the Web.

Pity that secret wasn't outed before a BusinessWeek cover story leaned so heavily on one of those stats. In fact, BusinessWeek now writes:

Web outfits seem to agree that Alexa is flawed, but they continue to rely on it because the data are so addictive. Since Alexa's numbers are free and available online, they can easily be plugged into a PowerPoint presentation or onto a blog, providing a quick-and-dirty way to get a competitive snapshot. Blogs cite Alexa as gospel, and its graphs are part of nearly every startup's pitch to investors.

Apparently, the stats weren't just gospel for bloggers. BusinessWeek took them as gospel itself.

Meanwhile, the story leaves me cold when it says:

No wonder that a host of newer services, such as Alexa and Hitwise, are highlighting the weaknesses of the older traffic-measuring companies and are muscling onto the scene with alternatives.

Yes, Alexa's only been offering site traffic estimates since at least 1999, so let's call it a newer service. Sorry for the rant, BusinessWeek, but you're not redeeming yourself well with this.

Still, it's a nice update to what's actually an old, old problem: internal metrics might not agree with external estimates. The search engines long ago would yap that comScore or NetRatings said they weren't as popular as they internally believed. Naturally, they stayed quiet if those figures happened to be off in their favor. My past series on stats looks at this in more detail.

Another good point in the article is how it highlights that things like AJAX and widgets might not get counted in traffic figures. Counting popularity on the web has never been easy, and it's just getting more complicated.

Meanwhile, over at SEOmoz, Rand Fishkin's finished a project where he's assembled internal metrics from various search-related blogs and compares them to some external metrics. Website Analytics vs. Competitive Intelligence Metrics is well worth checking out, if only to see how different sites stack up against each other, based on self-reported figures.

Unfortunately, the big visitor table isn't sorted in any particular order. It's mainly showing sites with the most visits in 2006, but there are a few glitches that throw it off. Still, lots of stats to love there.

There's another table that lists metrics from Alexa, Compete, Technorati and other sources for each of the sites. This is even harder to digest. The table seems sorted in order of who was popular based on the internal metrics. It would have been better to sort it by one of the external metrics (say Alexa) and then let you see the rank order compared to the internal metrics.

Lots of slack to Rand, however -- he had his hands full just getting this assembled and still needs time to get his Digg submission crew going to gain some page views for it. Look, Rand's blog gets most of its traffic from Digg -- he's a master. Yep, but hey Rand -- who has the highest percentage of traffic from search engines? That would be my Daggle blog -- eat my dust, Rand! Then again, that might also suggest a lack of other online marketing activities for Daggle -- and that would be right. It's just my play area :)

Back to the internal versus external comparison. With the tables hard to digest, I went straight to the summary:

From our estimates, the top 5 best predictors of traffic, in order, are:

  1. Technorati Rank
  2. Yahoo! Link Count
  3. Technorati Link Count
  4. SEOmoz's Page Strength Score
  5. Alexa Rank

However, none of these are nearly accurate enough to use, even in combination, to help predict a site's level of traffic or its relative popularity, even in a small niche with similar competitors. Unfortunately, it appears that the external metrics available for competitive intelligence on the web today simply do not provide a significant source of value....

Incidentally, I did log in to Hitwise to check their estimations and although I can't publish them (as Hitwise is a paid service and doing so would violate terms of service), I can say that the numbers issued from the competitive intelligence tool were no better than Alexa's in predicting relative popularity or traffic estimation.

The sad conclusion is that right now, no publicly available competitive analysis tool we're aware of provides solid value. Let's hope the next few years provide better data. Please leave comments, questions or feedback in this blog post on the topic.
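Rand doesn't spell out exactly how he judged which external metrics "predict" traffic best, but one plausible way to run that kind of check is rank correlation: do sites line up in the same order on the external metric as they do on internal visits? Here's a minimal sketch of that idea, with entirely made-up numbers -- the figures and the metric names are illustrative, not anything from the SEOmoz data.

```python
# Hypothetical sketch: checking how well an external metric "predicts"
# internal traffic via Spearman rank correlation. All numbers are made up.

def rank(values):
    # Assign ranks, 1 = largest value; ties broken by order (fine for a sketch)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman(xs, ys):
    # Spearman's rho via the rank-difference formula (assumes no ties)
    n = len(xs)
    rx, ry = rank(xs), rank(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Made-up data: five sites' internal monthly visits vs. an external
# metric for the same sites (say, a link count from some service)
internal_visits = [120000, 45000, 30000, 9000, 2500]
external_metric = [850, 400, 410, 120, 60]

print(round(spearman(internal_visits, external_metric), 2))  # prints 0.9
```

A rho near 1.0 would mean the external metric orders the sites almost exactly as the internal numbers do; values much lower -- which is essentially what Rand reports -- mean the metric is a poor stand-in for real traffic, even for ranking similar sites in a niche.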

Go get your Digg traffic for this, Rand -- it's well deserved.