Google Regains Its Hijacked Listing; This Was A Big Deal, Folks!

Two days after the hijacking appeared, Google has finally managed to restore its own listing for queries on adsense and google adsense. Two days! And this to correct a problem it has been told about for over a year, a problem it largely dismissed as not being a big issue?

It looks bad, coming days after the recent song-and-dance at the Google Factory Tour about how much energy is supposedly expended on core search and ads. Here's a personalized home page, but don't worry, we're not a portal, Google said.

Funny, this type of inattention is exactly what turned people off the portals of the past, when they lost focus on search quality. Yahoo seems to have fixed this redirect hijacking problem, but Google is still struggling with it?

The JenSense blog has a comment from the person whose page usurped Google, saying he never meant to do it. His page with a meta redirect to the Google AdSense site was part of a general system of linking to sites that may change addresses.

It's actually pretty reasonable. I still have lots of links leading to GoTo.com, direct links I can only fix if I go through and change them one by one. Instead, this person says he always links to pages in his site that use meta refresh to push you elsewhere. So to link to GoTo, he'd link to his GoTo page. Then if the address of GoTo changed (as it did to Overture.com), then he only needs to change one page.
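To make that concrete, here's a rough sketch of how such a system might generate its pages. It's only an illustration: the `DESTINATIONS` table, the `build_redirect_page` function and the Overture URL are my own placeholders, not anything taken from the site in question.

```python
from html import escape

# Hypothetical table: one local page name per outside site.
# When a site moves, only this entry needs to change.
DESTINATIONS = {
    "goto": "http://www.overture.com/",
}


def build_redirect_page(name: str, delay_seconds: int = 0) -> str:
    """Return HTML for a local page that meta-refreshes to the real site."""
    target = escape(DESTINATIONS[name], quote=True)
    return (
        "<html><head>"
        f'<meta http-equiv="refresh" content="{delay_seconds}; url={target}">'
        "</head><body>"
        # Plain link as a fallback for browsers that ignore the refresh.
        f'<a href="{target}">Continue</a>'
        "</body></html>"
    )


page = build_redirect_page("goto")
```

Every link on the site would then point at the local "goto" page rather than at the outside address itself.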

There are more elegant ways to do this, of course. In fact, many directories will run all links through redirect scripts, for easier management or tracking reasons.
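A redirect script of that kind might look something like the sketch below. The `LINKS` table, the `CLICKS` counter and the `redirect` function are hypothetical names of my own, and I'm not suggesting any particular directory does it this way. The point is just that the script looks up an ID, optionally counts the click, and answers with a redirect status and a Location header instead of a page.

```python
from typing import Dict, List, Tuple

# Hypothetical link table and click counter for a small directory.
LINKS: Dict[str, str] = {"42": "http://www.example.com/"}
CLICKS: Dict[str, int] = {}


def redirect(link_id: str, permanent: bool = False) -> Tuple[int, List[Tuple[str, str]]]:
    """Return (status code, headers) for a request to the redirect script."""
    target = LINKS.get(link_id)
    if target is None:
        return 404, []  # unknown link ID, nothing to redirect to
    # Tracking: count the click before sending the visitor on.
    CLICKS[link_id] = CLICKS.get(link_id, 0) + 1
    # 302 marks the redirect as temporary, 301 as permanent.
    status = 301 if permanent else 302
    return status, [("Location", target)]
```

The 302 response here is the same type of redirect at the center of the hijacking debate.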

In fact, this was the crux of a Threadwatch discussion back in March: Millions of Pages Google Hijacked via Open Directory feed, which in turn led to a Slashdot thread about it. That in turn brought Google out in the form of GoogleGuy to strongly dispute the claim.

To be fair, GoogleGuy was correct in part, as I emailed Nick Wilson from Threadwatch as we discussed the story:

The fact that this site's taken the ODP and 302 redirected doesn't mean that all or even millions of those sites listed in the ODP have been hijacked. Heck, you and I both wouldn't still show up in Google, if that were the case. Everything I've seen so far seems to indicate this really may only happen to sites with a smaller PR than the page pointing at them, and even then not necessarily in the case.

But that didn't mean it was a non-issue, not at all, as I also wrote:

Having said all that, I've no doubt hundreds, maybe thousands of pages might have been grabbed in this way. And as cornwall says, hard to understand why this site thinks a 302 is necessary other than to hijack this way. There's no doubt some people are being harmed, it is an issue, and it's amazing that Google still isn't solving it.

That same day, I went back to Google about getting an official line on the situation, but we never got a discussion set up. I've done the same this week, but with the holiday weekend coming up in the US, the folks who know are gone. The key question I still have outstanding: if Yahoo solved the situation, why is Google taking so long?

It's also well worth revisiting the hijack situation raised in March to see the spin Google's put on it, and how that's come back to knock them upside the head. GoogleGuy posted this on the Slashdot thread (I've inserted a link to more material):

Here's the skinny on "302 hijacking" from my point of view, and why you pretty much only hear about it on search engine optimizer sites and webmaster forums. When you see two copies of a url or site (or you see redirects from one site to another), you have to choose a canonical url. There are lots of ways to make that choice, but it often boils down to wanting to choose the url with the most reputation. PageRank is a pretty good proxy for reputation, and incorporating PageRank into the decision for the canonical url helps to choose the right url.

A lot of sites that try to spam search engine indices get caught, and their PageRank goes lower and lower as their reputation suffers. We do a very good job of picking canonical urls for normal sites; sites with their PageRank going toward zero are more likely to have a different canonical url picked, though, and to a webmaster I understand that it can look like "hijacking" even though the base cause is usually your reputation declining. For a long time, it was hard to get anyone to report canonicalization problems, because the site that got "hijacked" would be free-cheap-texas-holdem-plus-viagra-and-payday-loans-as-well.com type sites. In fact, I had to offer to ignore the spamminess of any reported sites in order to get people to send in any real data.

But even though I suspected that this issue affected very few sites, we still wanted to collect feedback to see how big of a problem it was, and to see if we could improve our url canonicalization. So starting a while ago, we offered a way to report "302 hijacking" to Google; I mentioned the method on several webmaster forums. You contact user support and use the keyword "canonicalpage" in your report. Then I created a little mailing list with some engineers on it, and user support passes on emails that meet the criteria to the mailing list.

So how many reports has all this work (including posting multiple times on lots of webmaster boards to request data) gotten me? The last time I checked, it was under 30. Not a million pages. Not even a hundred reports. Under 30. Don't get me wrong, we're still looking at how we can do better: one engineer proposed a way that might help these sites, and he's got a testset of sites that would be affected by changes in how we canonicalized urls. A few of us have been looking through it to see if we can improve things, but please know that this is not a wildfire issue that will result in the web melting down.

In short, it's not that big of a deal. It impacts relatively few sites. And I agree, that seemed to be the case to me when I started hearing about this last year. It was coming up on SEO forums, but I wasn't hearing it from my own readership in any significant way. It really didn't seem to impact a ton of people.

Nevertheless, it was a problem. And as the situation with Google has now demonstrated, it wasn't a problem that only hits spammy sites or webmasters you might think "deserved" it anyway. It hit Google.

Say it again. It hit Google. Google got its own listing hijacked. I thought I'd seen huge irony in March when WordPress spammed Google after pledging right on the Google Blog to help fight spam, or when Google banned one of its own pages for cloaking. But this takes the cake. Google's redirect bug bites Google itself.

That's search quality? That's a class product? That's a laser-like focus on search, to be aware of a problem for over a year, then let it run and run and run until it hits your own site? And then take two days to solve it? And the fix almost certainly isn't one that's been applied across the index as a whole? No, this was a major, hugely embarrassing failure.

For more on this recent hijacking issue, see our past post, Google's Own Listing Gets Hijacked. To understand the situation more, see Page Hijack: The 302 Exploit, Redirects and Google from Claus Schmidt, who deserves major kudos as someone who sat down to explain the situation to everyone earlier this year, not to mention his posts on forums before that.

Want to discuss? Please drop by our Google AdSense Page Highjacked forum thread.