Major Security Flaw With Google Sitemaps Stats

David Naylor points out, as does this WebmasterWorld thread spotted via Threadwatch, a pretty surprising security oversight with Google’s new Sitemaps stats system that can allow anyone access to the stats of other web sites, if those web sites don’t report 404/File Not Found errors correctly. Right now, I’m looking at stats for eBay and AOL, as well as Google’s own Orkut!

In order to see stats for a site, you have to verify you own it by installing
a special file on your server. Google randomly generates a filename to use, you
install this file, then Google checks to see if it exists. If it does, you can
view stats for that site.
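The verification scheme as described boils down to a single check: does fetching the randomly named file return success? A minimal Python sketch of that logic (all names here are hypothetical illustrations, not Google’s actual code) shows why a server that answers 200 for everything breaks it:

```python
import secrets

def generate_token() -> str:
    # Randomly generated verification filename (the format is illustrative)
    return f"GOOGLE{secrets.token_hex(8)}.html"

def is_verified(fetch_status, site: str, token: str) -> bool:
    # The site counts as verified if fetching the token file succeeds.
    # fetch_status is a callable returning the HTTP status code for a URL.
    return fetch_status(f"http://{site}/{token}") == 200

my_token = generate_token()

# A well-behaved server: only the one real file returns 200.
strict = lambda url: 200 if url.endswith(my_token) else 404
# A misconfigured server: every path returns 200.
lenient = lambda url: 200

print(is_verified(strict, "example.com", "GOOGLEsomeoneelse.html"))   # False
print(is_verified(lenient, "example.com", "GOOGLEsomeoneelse.html"))  # True -- anyone "verifies"
```

On the strict server, a stranger’s token fails; on the lenient one, every token anyone is ever issued passes the check.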

The problem is, some web sites will respond that any page exists, even if it
doesn’t. Rather than sending a 404 File Not Found status code, they’ll
dynamically generate a page with content anyway, or they’ll tell the user the
file doesn’t exist while the status code sent to the browser says otherwise.
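This “soft 404” behavior is easy to reproduce with Python’s standard library. The tiny local server below is a stand-in for such a misconfigured site (not eBay’s or AOL’s actual setup): it answers 200 OK with a friendly “not found” message for every path.

```python
import http.server
import threading
import urllib.request

class SoftNotFoundHandler(http.server.BaseHTTPRequestHandler):
    """Simulates a misconfigured site: every path answers 200 OK."""
    def do_GET(self):
        body = b"Sorry, we couldn't find that page."
        self.send_response(200)  # should be 404 for unknown paths
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), SoftNotFoundHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Ask for a page that plainly doesn't exist:
url = f"http://127.0.0.1:{server.server_port}/djkfjkdjfkjd"
with urllib.request.urlopen(url) as resp:
    status = resp.status

print(status)  # 200 -- the "missing" page looks perfectly real to a crawler
server.shutdown()
```

A human sees an error message; any automated check that only looks at the status code sees a page that exists.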

For example, try this:

http://www.ebay.com/djkfjkdjfkjd

You’ll see that eBay responds that the page doesn’t exist. However, behind
the scenes it redirects the request (sending a 301 status code) to another page
that returns a 200 OK code. As a result, like Dave and
Barry, I’m now looking at eBay’s stats, as well as AOL’s.
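The redirect variant can be simulated the same way. The sketch below (a local stand-in, not eBay’s real infrastructure) 301-redirects every unknown path to an error page that itself answers 200 OK, so any client that follows redirects, as browsers and crawlers do, sees only success:

```python
import http.server
import threading
import urllib.request

class RedirectingHandler(http.server.BaseHTTPRequestHandler):
    """Unknown paths get a 301 redirect to an error page that answers 200 OK."""
    def do_GET(self):
        if self.path == "/error":
            body = b"Sorry, that page could not be found."
            self.send_response(200)  # the final page is a 200
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(301)  # ...but every other request is bounced here
            self.send_header("Location", "/error")
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, *args):
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), RedirectingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# urllib follows the 301 automatically, just as a browser would:
with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/djkfjkdjfkjd") as resp:
    final_status = resp.status

print(final_status)  # 200 -- the 301 hop is invisible at this level
server.shutdown()
```

Unless a checker deliberately refuses to follow redirects, the nonexistent page and the verification file are indistinguishable from real ones.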

How could all three of us get access? Because both eBay and AOL turn
any request into a page found response, and remember, we were each given unique
file URLs to install. As far as Google is concerned, we all have correctly
installed these files.

That’s another security issue. You’d think the system was smart enough that
if one person verified ownership, no one else could. Not so, not at the moment.

Want to ensure you are protected? Be sure you are sending out proper 404
error codes for pages that don’t exist.
Rex Swain’s HTTP Viewer is
an excellent place to check this.
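If you’d rather script the check than inspect headers by hand, the sketch below probes a server with a random path and refuses to follow redirects, so a 301-to-a-200 error page registers as a failure rather than a pass. The function and class names are my own, not part of any standard tool:

```python
import secrets
import urllib.error
import urllib.request

def returns_proper_404(base_url: str) -> bool:
    """Request a random, surely-nonexistent path and report whether the
    server answers with a genuine 404. Redirects are NOT followed, so a
    301 to a 200 error page counts as a failure."""
    probe = f"{base_url}/{secrets.token_hex(12)}"

    class NoRedirect(urllib.request.HTTPRedirectHandler):
        def redirect_request(self, *args, **kwargs):
            return None  # decline to follow any 3xx response

    opener = urllib.request.build_opener(NoRedirect)
    try:
        opener.open(probe)
        return False  # a 2xx for a nonexistent path: soft 404
    except urllib.error.HTTPError as e:
        return e.code == 404  # only a real 404 passes

# Example (hypothetical URL): returns_proper_404("http://www.example.com")
```

A `True` result means unknown paths on your site really do come back as 404, which is what keeps the verification file check honest.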

When the stats system came out, I did ask Google why they didn’t go with the
more common verification approach of putting special code on an existing page.
That would have been safer, plus easier for people whose content management
systems don’t let them easily create files with a particular name. I never got
a reply to that.

Another solution would be for special code to have been installed within a
robots.txt file as a way of verifying a site with Google.

Want to discuss or comment? Visit our forum thread, Google Loses Trust with Sitemaps.

Postscript: It should be stressed that top query data isn’t
particularly private. Anyone with enough money can buy more extensive data
through companies like Hitwise or
comScore. The seriousness is really that what was
supposed to be a secure verification system failed. Especially
consider Google’s words on the system:

8. What is being done to protect my privacy?

We use the verification process to keep unauthorized users from
seeing detailed statistics about your site. Only you can see these details,
and only once we verify you own the site. We don’t use the verification file
we ask you to create for any purpose other than to make sure you can upload
files to the site. You can read more about our commitment to privacy
here.

Postscript 2: Google has sent this statement:

This morning we learned of an issue with the Google Sitemaps tool that may
have temporarily enabled users to view statistics about sites they do not own.
We acted quickly and fixed the issue. To ensure the security of all sites
using the Google Sitemaps tool, we will re-verify all sites added in the last
48 hours.
