Major Security Flaw With Google Sitemaps Stats

David Naylor points out, as does this WebmasterWorld thread spotted via Threadwatch, a pretty surprising security oversight in Google's new Sitemaps stats system that can allow anyone to view the stats of other web sites, if those sites don't report 404/File Not Found errors correctly. Right now, I'm looking at stats for eBay and AOL, as well as Google's own Orkut!

In order to see stats for a site, you have to verify you own it by installing a special file on your server. Google randomly generates a filename to use, you install this file, then Google checks to see if it exists. If it does, you can view stats for that site.
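To make the flaw concrete, here's a minimal sketch in Python of what a check like this amounts to. The filename pattern and the idea that only the HTTP status code is examined are my assumptions for illustration, not Google's published mechanics:

```python
import urllib.request
import urllib.error

def verify_ownership(site, token):
    """Sketch of a Sitemaps-style check: does the token file exist?"""
    # Illustrative filename pattern; the real scheme may differ.
    url = f"http://{site}/google{token}.html"
    try:
        with urllib.request.urlopen(url) as resp:
            # urlopen silently follows redirects, so a 301 that lands on
            # a 200 page "passes" anyway: the heart of the flaw.
            return resp.status == 200
    except urllib.error.HTTPError:
        # A real 404 (or any error status) raises, so the check fails.
        return False
```

If existence of the file is all that's tested, any server that answers "page found" for every URL passes this check trivially.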

The problem is, some web sites will respond that any page exists, even if it doesn't. Rather than sending out a 404 File Not Found error message, they'll dynamically generate a page with content anyway, or they'll tell the user the file doesn't exist while the HTTP status code sent to the browser says otherwise.

For example, try this:

http://www.ebay.com/djkfjkdjfkjd

You'll see that eBay tells you the page doesn't exist. However, behind the scenes it redirects the request (sending a 301 status code) to another page that returns a 200 OK code. As a result, Dave, Barry, and I are all now looking at eBay's stats, as well as AOL's.
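You can see this for yourself with a few lines of Python's standard library. This sketch issues the request and prints the raw status code without following the redirect (eBay's exact responses may of course have changed since):

```python
import http.client

# Ask for the nonsense path and inspect the raw status code,
# without following any redirect.
conn = http.client.HTTPConnection("www.ebay.com")
conn.request("GET", "/djkfjkdjfkjd")
resp = conn.getresponse()
print(resp.status, resp.reason)       # 301 Moved Permanently (at the time)
print(resp.getheader("Location"))     # the page the redirect points to

# Fetching that Location URL in turn returns 200 OK, which is
# what fools the existence check.
```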

How could all three of us get access? Because both eBay and AOL will turn any request into a "page found" code -- and remember, we were each given unique verification files to install. As far as Google is concerned, we have all correctly installed those files.

That's another security issue. You'd think the system would be smart enough that once one person had verified ownership of a site, no one else could do the same. Not so, not at the moment.

Want to ensure you are protected? Be sure you are sending out proper 404 error codes for pages that don't exist. Rex Swain's HTTP Viewer is an excellent place to check this.
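If you'd rather script the check than use a web tool, something like this sketch will do. It requests a path that almost certainly doesn't exist and reports the raw status code; a 200 or a 301 here means the flaw could apply to you:

```python
import http.client
import uuid

def sends_proper_404(host):
    """Request a path that can't exist and see what status comes back."""
    bogus = "/" + uuid.uuid4().hex    # effectively guaranteed not to exist
    conn = http.client.HTTPConnection(host)
    conn.request("GET", bogus)
    status = conn.getresponse().status
    print(f"{host}{bogus} -> {status}")
    return status == 404              # anything else and you may be exposed

sends_proper_404("www.example.com")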

When the stats system came out, I asked Google why they didn't go with the more common verification method of putting special code on an existing page. That would have been safer, plus easier for people whose content management systems don't let them easily create files with particular names. I never got a reply.

Another solution would be for special code to be installed within a robots.txt file as a way of verifying a site with Google.
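One advantage of that approach, at least as I imagine it, is that Google could check for a specific secret string rather than for a file's mere existence, which the soft-404 behavior above can't fake. A purely hypothetical sketch (the "# google-verify" comment line is entirely made up):

```python
import urllib.request

def robots_txt_verified(site, token):
    """Hypothetical: look for a secret comment line inside robots.txt."""
    with urllib.request.urlopen(f"http://{site}/robots.txt") as resp:
        body = resp.read().decode("utf-8", errors="replace")
    # Checking for the exact secret string sidesteps the soft-404 trap:
    # a dynamically generated "page found" response won't contain it.
    return f"# google-verify: {token}" in body
```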

Want to discuss or comment? Visit our forum thread, Google Loses Trust with Sitemaps.

Postscript: It should be stressed that top query data isn't particularly private. Anyone with enough money can buy more extensive data through companies like Hitwise or comScore. The real seriousness is that what was supposed to be a secure verification system failed. Consider especially Google's own words on the system:

8. What is being done to protect my privacy?

We use the verification process to keep unauthorized users from seeing detailed statistics about your site. Only you can see these details, and only once we verify you own the site. We don't use the verification file we ask you to create for any purpose other than to make sure you can upload files to the site. You can read more about our commitment to privacy here.

Postscript 2: Google has sent this statement:

This morning we learned of an issue with the Google Sitemaps tool that may have temporarily enabled users to view statistics about sites they do not own. We acted quickly and fixed the issue. To ensure the security of all sites using the Google Sitemaps tool, we will re-verify all sites added in the last 48 hours.