More On Google & Blocking Privacy Proxies

Yesterday I wrote about how several proxy servers used by those wishing to
search and surf anonymously had apparently been blocked by Google, including
the popular Tor service. Google’s since explained why these were blocked and
how human users can get around the barrier.

Google told me that someone or something was using the Tor system to hit them
with an extremely large number of queries, which caused the block on the network
to come online.

Couldn’t Google have done this in a way that lets the humans through but
blocks the spiders? Cory Doctorow, who wrote the Boing Boing post on the
subject, especially felt Google was being too heavy-handed. In an email
exchange we had, he wrote me:

Google has a lot of engineering talent, but it approached this problem with
a fireax, not a scalpel.

Actually, Google is using both a fireax and a scalpel. From what I can tell,
it’s just that some Tor users might not see the scalpel if they have cookies
disabled.

A human user, with a browser that accepts cookies, would get a slightly
different block page. This one would allow them to prove they weren’t a
spider via a CAPTCHA code.

In other words, look at this image from The Chunk, which sparked yesterday’s
Boing Boing post. Now look at the image of a very similar page that you’ll see
here.

Notice how the second example has a part that says:

 To continue searching, please type the characters you see below

After this is a code, a CAPTCHA, a system to filter out robots that can’t
read the text in the image.
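
As a rough sketch of the behavior described above, the choice between the two block pages seems to come down to whether the browser sent a cookie. The function and page names here are made up for illustration; this is not Google's actual code, just the logic as it appears from the outside.

```python
def choose_block_page(sent_cookie: bool) -> str:
    """Pick a block page, following the behavior observed from the outside.

    Hypothetical names throughout: "captcha_challenge" and "plain_block"
    are labels for the two pages shown in the screenshots, nothing more.
    """
    if sent_cookie:
        # Browser accepts cookies: offer the CAPTCHA so a human can continue.
        return "captcha_challenge"
    # No cookie came back: the visitor only sees the plain block page.
    return "plain_block"


page_with_cookies = choose_block_page(True)
page_without_cookies = choose_block_page(False)
```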

Anyone set to accept cookies will see the CAPTCHA challenge, be able to fill
it out and continue searching. But isn’t accepting cookies defeating the purpose
of using a system like Tor designed to keep you anonymous?

Not necessarily. For example, in Firefox, you could choose to have cookies
cleared every time you close the browser. That means for your searching session,
Google will only know that someone from an anonymous IP (it can’t be traced back
to you, remember) did a series of searches for a particular session of time.

Close your browser, come back to Google, and you’d get a new cookie (along
with an entirely new IP address). There would be no way to associate your
searches over a long period of time. That kind of long-term association is
potentially how one person was identified in the recent AOL data release
case, and it would only matter here assuming somehow, someway, someone got to
all of Google’s data over time.
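
To make the point concrete, here is a toy sketch of what a search log might look like when the only stable identifier is the cookie. The log format, cookie IDs and queries are all invented for illustration. Searches made under one cookie can be grouped together, but nothing ties one cookie's group to the next.

```python
from collections import defaultdict

# Hypothetical log entries: (cookie_id, query). All values are made up.
log = [
    ("cookie-A", "tor setup"),
    ("cookie-A", "privacy proxies"),
    ("cookie-B", "local weather"),  # browser was closed and reopened: new cookie
]


def group_by_cookie(entries):
    """Group queries by cookie ID, the only stable identifier in this toy log."""
    sessions = defaultdict(list)
    for cookie_id, query in entries:
        sessions[cookie_id].append(query)
    return dict(sessions)


sessions = group_by_cookie(log)
# Two separate sessions come out, with no field linking one to the other.
```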

It’s unlikely — though still possible — that you could do enough searching
within one session to give yourself away just based on your queries. For those
still concerned about this, I suppose you could do a search, then clear your
cookie and search again. Alternatively, don’t search for anything that you think
could potentially reveal who you are.

For more on protecting your search privacy, see my past posts
Which Search
Engines Log IP Addresses & Cookies — And Why Care?
and
Protecting Your Search Privacy: A Flowchart To Tracks You Leave Behind.

Could Google do things better? Absolutely. Since many people using services
like Tor might not be allowing cookies, Google should change the page that comes
up for "robots" to say something like "if you’re a human, please allow cookies,
and then you’ll get a code to let you in." Google could even take the further
step of detailing how to set up cookies and clear them in popular browsers to
better guide those concerned about privacy. And to be fair, all the search
engines could do more on that front.

That page can definitely be more helpful in other ways. When I’ve heard of
this happening in the past,
it was typically because someone from a particular ISP or shared IP address was
doing a lot of rank checking. That might cause the entire IP range to get
closed.
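
How Google actually decides to block an address is not public, but a minimal sketch of a per-IP query limit shows why one heavy user can lock out everyone behind the same address. The class name, the limit of three queries and the 60-second window are all assumptions chosen for the example.

```python
from collections import defaultdict, deque


class IpRateLimiter:
    """Toy sliding-window limiter keyed by IP address (illustrative only)."""

    def __init__(self, max_queries: int, window_seconds: float):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent queries

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_queries:
            return False  # over the limit: this IP is blocked for now
        hits.append(now)
        return True


limiter = IpRateLimiter(max_queries=3, window_seconds=60)
shared_ip = "203.0.113.7"  # documentation-range address, made up

# A rank checker behind the shared IP burns through the allowance...
rank_checker_results = [limiter.allow(shared_ip, now=t) for t in (0, 1, 2)]
# ...so an innocent user behind the same IP is refused a moment later.
innocent_user_allowed = limiter.allow(shared_ip, now=3)
```

The limiter only sees the address, so it has no way to tell the rank checker from the person who just wanted to run one search.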

Unfortunately, Google’s current warning page doesn’t give innocent users much
guidance that things outside their control might be to blame. Instead, it
leaves them thinking that maybe they’ve got a virus or spyware. I can see that
this has caused at least one person to waste time checking how to "fix" a
problem they didn’t have.

It would also be nice to see more help pages on Google about this in general.
All these things are ideas Google said it will consider.

Postscript: Cory emailed me this:

Danny, I believe that they could solve this problem without requiring cookies; for example, they could embed a RESTful, expiring GUID in the URL-line on the successful solution of a CAPTCHA:

http://www.google.com/search?q=boing&CAPTCHA=KJASJFSE
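
A minimal sketch of how Cory's suggestion might work, under stated assumptions: the token is issued once a CAPTCHA is solved, carried in the URL and rejected after it expires. The function names, the in-memory store and the 10-minute lifetime are all hypothetical; the email specifies none of them. Keeping a server-side table is also a simplification, since a strictly stateless design would need something like a signed token instead.

```python
import time
import uuid
from urllib.parse import urlencode

TOKEN_LIFETIME_SECONDS = 600  # assumed lifetime; the email gives no number
_issued_tokens = {}  # token -> expiry timestamp (in-memory, illustration only)


def issue_token(now=None):
    """Create a GUID-style token after a CAPTCHA has been solved."""
    now = time.time() if now is None else now
    token = uuid.uuid4().hex
    _issued_tokens[token] = now + TOKEN_LIFETIME_SECONDS
    return token


def is_token_valid(token, now=None):
    """Accept a token only if it was issued here and has not yet expired."""
    now = time.time() if now is None else now
    expiry = _issued_tokens.get(token)
    if expiry is None:
        return False
    if now >= expiry:
        del _issued_tokens[token]  # forget expired tokens
        return False
    return True


def search_url(query, token):
    """Build a search URL carrying the token, in the shape of the example above."""
    return "http://www.google.com/search?" + urlencode({"q": query, "CAPTCHA": token})


token = issue_token(now=1000.0)
url = search_url("boing", token)
```

Because the token travels in the URL rather than in a cookie, a browser with cookies disabled could still carry it from one search to the next until it expires.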
