Private Searches Versus Personally Identifiable Searches

I’ve written that no private information was given by any of the major search
engines that did respond to the Department Of Justice subpoena or request for
search data. However, as people are
discussing and debating the case more, they’re realizing that there is some
private information contained within searches themselves. And this is true. Even
"anonymous" or "aggregate" search data has some private information.

Indeed, it can be argued that all queries are effectively private. Having
said this, there’s an important difference between private information and
private information that can be actually linked to an individual with
confidence. In this piece, I’ll explain some of the concerns and differences.

Let me start with the suggestion that ALL queries made to a search engine are
private, at least in the minds of those making them. I think it’s reasonable to
assume that most people doing searches assume they are having some type of
confidential conversation with the search engines they use. They don’t expect
that what they enter into a search box is going to be broadcast to the world.

Those more educated about search engines will know this is a false
assumption. Here’s a list of the many ways search engines have broadcast what
people are searching for to the world. Heck, just last month we were awash in
press stories about the top searches of 2005, as each major search engine
released popular query lists.

Most of those lists are sanitized, so you never see things such as the porn
queries that do happen. Moreover, the number of "live" displays, where you can
see in real time what people seek, has diminished over the years. However,
plenty of press accounts about Google still include the almost mandatory
description of how visitors to Google offices around the world are entertained
by seeing "live" queries displayed on the walls (see pictures here, here and an
excellent one here).

Given all this, how can I say queries are private? Because, to return to my
main point, most people are unaware that search queries are broadcast this
way. And because of this, they’ll reveal information to a search engine that
they may not want the rest of the world to know: private information.

How about an example? Britney Spears remains a popular search topic, so a
query like this:

britney spears

while private, isn’t going to cause privacy concerns for most people if it is
publicized. What could be "wrong" or "incriminating" about looking for
Britney? Heck, Yahoo’s search term suggestion tool for advertisers tells me
that 2,230,646 searches for her happened on the Yahoo network of advertising
sites in December 2003.

Now how about this query:

britney spears nude

That’s probably embarrassing to most people who did it. It’s also the second
most popular "britney" query that happened last month, with 92,255 searches.
Despite that popularity, I’d wager that most people don’t want the world to know
they were looking for the pop star without her clothes on.

Those who did so can breathe a sigh of relief over the current fracas over
search data being released to the Department Of Justice. That’s because while
this private query has been released, no personally identifiable information
has been released with it. There’s no way to link the query back to the person
who did it.

In particular, you can imagine that for every query at Google or another
search engine, you leave behind a record that’s something like this:

www-az3.proxy.aol.com – 25/Dec/2005 10:16:22 –
http://www.google.com/search?q=britney%20spears%20nude
740674ce213e9d9
lexluthor340@aol.com

My Search Privacy At Google & Other Search Engines article from 2003 explains
more about what’s in these records, but I’ll do a short breakdown here of the
bolded portions.

  1. Internet Address: The first part of that record
    (www-az3.proxy.aol.com) shows your internet address (in this case, it tells
    that someone connected through AOL). It doesn’t reveal your name, address or
    anything personal about you. However, if someone had the Google records and
    AOL checked its own records for that same period of time, that person might
    be able to link you to the query.
     
  2. Query Terms: The second part contains the words you searched for.
     
  3. Cookie: The third part (740674ce213e9d9) is your cookie. That tells
    Google that the request came from a particular web browser that it has
    interacted with before. Again, it doesn’t reveal your name or anything about
    you in particular. It’s just a random set of numbers assigned to identify your
    browser.
     
  4. User Account: The fourth part (lexluthor340@aol.com) represents the
    Google Account name
    you created, assuming you did create one and logged into Google to make some
    use of account-based services. It may link your email address to this query.
    If you make use of any transactional services with Google, then that might
    help link your actual physical location to the query.
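
To make the breakdown above concrete, here is a minimal Python sketch that splits the illustrative record into its four parts. This is not how any search engine actually stores or parses its logs. The layout simply mirrors the made-up record shown earlier, and the names SearchRecord and parse_record are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlparse, parse_qs

# The illustrative record from the article. "\u2013" is the en dash that
# separates the internet address from the date and time on the first line.
RAW = """www-az3.proxy.aol.com \u2013 25/Dec/2005 10:16:22 \u2013
http://www.google.com/search?q=britney%20spears%20nude
740674ce213e9d9
lexluthor340@aol.com"""


@dataclass
class SearchRecord:
    """Hypothetical container for the four parts described in the list."""
    internet_address: str
    timestamp: str
    query_terms: str
    cookie: str
    account: str


def parse_record(raw: str) -> SearchRecord:
    """Split the four-line illustrative layout into named fields."""
    first, url, cookie, account = [
        line.strip() for line in raw.strip().splitlines()
    ]
    # The first line holds the address and the timestamp, dash-separated.
    host, timestamp = [
        part.strip() for part in first.split("\u2013") if part.strip()
    ]
    # The search words travel in the "q" parameter of the request URL.
    query = parse_qs(urlparse(url).query).get("q", [""])[0]
    return SearchRecord(host, timestamp, query, cookie, account)


record = parse_record(RAW)
print(record.query_terms)
```

Only the query_terms field is free of anything that points at a browser, a connection or an account. The other fields are the ones that could help tie a query to a person.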

That type of information is logged in various ways by ALL the major search
engines. However, NONE of the search engines that complied with the Department
Of Justice request gave out any of the personally identifiable portions. Your
internet address, cookie and user accounts were all removed, as were dates and
times. Instead of that long line above, this is all that was handed over,
according to what the search engines have said:

britney spears nude
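
As a rough illustration of that stripping step, the Python sketch below reduces a few made-up log entries to bare queries. The field names, the extra sample entries and the functions queries_only and query_counts are all assumptions for the sake of the example. The search engines have not published how they actually prepared the data.

```python
from collections import Counter

# Hypothetical log entries. Only the first mirrors the article's example;
# the other two are invented so there is something to count.
LOG = [
    {"address": "www-az3.proxy.aol.com", "time": "25/Dec/2005 10:16:22",
     "query": "britney spears nude", "cookie": "740674ce213e9d9",
     "account": "lexluthor340@aol.com"},
    {"address": "host-a.example.net", "time": "25/Dec/2005 10:17:05",
     "query": "britney spears", "cookie": "0123456789abcde",
     "account": ""},
    {"address": "host-b.example.org", "time": "25/Dec/2005 10:19:41",
     "query": "britney spears", "cookie": "fedcba987654321",
     "account": ""},
]


def queries_only(log):
    """Keep the query text and drop address, time, cookie and account."""
    return [entry["query"] for entry in log]


def query_counts(log):
    """Aggregate view: how often each query appears, with nothing else."""
    return Counter(queries_only(log))


for query in queries_only(LOG):
    print(query)
```

Once the other fields are dropped, two people who typed the same words become indistinguishable, which is the sense in which such data is "anonymous" or "aggregate".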

So yes, a private query you and others made was passed along to the
government. But they could have easily seen the same by doing a search through
tools listed here. Heck, they could have sat in Google’s lobby while waiting to
talk with Google lawyers about the request and written down the queries
scrolling up the wall. But none of these methods would allow query data to be
linked back to you personally.

Ah, but what if you entered a query that does seem to have personally
identifiable data? For example, some people search for their social security
numbers or telephone numbers, to see if that information is available online.

  • Yes, that’s a private query, as all queries are likely considered
    private to searchers.
     
  • Yes, that’s private information, information probably not available
    to the general public and that you didn’t intend to have revealed to anyone.
     
  • No, that’s not personally identifiable information. Even though you
    entered your phone number or your social security number, there’s no way to
    know that you YOURSELF actually did it. In addition, there’s no way to know
    whether the information is valid at all.

In other words, anyone might enter this information. Someone could even do
this:

britney spears phone number 213-555-1212

and it doesn’t mean that Britney did the search or that the phone number is
correct.

How about a step beyond? Perhaps you know personally that someone is gay but
that person hasn’t come out to friends and family about it. You’re wondering if
anyone else might have said anything about this on the web, so you enter:

jenna bush lesbian

Now you’ve just outed one of President George W. Bush’s daughters as a
lesbian to a search engine. And since that search engine has handed over search
data to the Department Of Justice, your private information just potentially
became public in a big way.

Relax. Just because someone enters such a query doesn’t mean it’s true (and
for the record, I have no idea if Jenna Bush is or isn’t, not that it makes a
difference to me. I’m just making up an example of what could happen).

It is possible that somehow, some way, some of the search data could contain
some private information that is personally identifiable. I can’t rule that out
entirely. I simply think it’s a very unlikely case. Still, it’s enough of a
possibility that it formed one reason Google objected to the request from the
Department Of Justice:

Moreover, Google’s acceding to the Request would suggest that it is willing
to reveal information about those who use its services. This is not a perception
that Google can accept. And one can envision scenarios where queries
alone could reveal identifying information about a specific Google user, which
is another outcome that Google cannot accept.

Yes, it is possible, though it remains extremely unlikely. The better reason
not to hand over data is covered in the first sentence, that handing over
information would give the wrong impression overall to its users.

Of course, Google opens up another can of worms with its second sentence. If
there could be identifying information in queries, then why does Google give
advertisers access to keyword research tools such as this? Entering "714"
gave me back a number of phone numbers that someone has searched for in that US
area code. As I explained above, that doesn’t tell me the phone numbers are
correct. They certainly aren’t personally identifiable. I can definitely get
phone numbers in much easier ways. But it’s private information that any Google
advertiser can access by depositing $5 to open an AdWords account.

One of the best things about the Department Of Justice action is that it’s
prompting a fresh examination of issues like these. Should search query logs be
made less accessible to advertisers? Speaking from a marketer’s perspective, I
hope not, especially since it really is extremely unlikely that any personally
identifying information would be revealed. At the very least, it may cause
people to think more carefully about what they put into search boxes in the
first place.

One last thing. Whether queries themselves have personally identifiable
information is becoming an especially hot topic in the comments at the MSN
Search Weblog, where examples of a name followed by things like "nazi" or
"aids" have been given. Privacy infringements? No, in the sense that we don’t
know in such examples if any of the information is true or not, as I wrote
above.

Having said this, I frankly don’t know whether the government itself is smart
enough to realize this. I already have written about how dumb I think they are
in the request they’ve made. Then I read this from Newsweek:

What if certain search terms indicated that people were contemplating
terrorist actions or other criminal activities? Says the DOJ’s Miller, "I’m
assuming that if something raised alarms, we would hand it over to the proper
[authorities]."

So the request for data that supposedly was just being done to measure
whether children might encounter porn through search results now might be used
for other things? Kind of scary, though scary mainly because of how dumb the
Department Of Justice again appears to be.

What are they going to hand over? That a year ago, there was a search for
something they think might be terrorist related, but that they don’t know who
did it, whether it was true or even worth the time to investigate at all?

Want to comment or discuss? Please visit our Search Engine Watch Forums
thread, Bush Administration Demands Search Records.

Postscript from Gary:
I realize that this is not an apples to apples comparison since screening for terrorists on airplanes is a very very very serious issue.

With that out of the way, reading all of the postings and news coverage about the “search subpoena” this weekend reminded me of several instances reported in 2004 where major airlines handed over traveller information to the government to assist them in building and testing a passenger screening database.


These stories got some press attention and some significant outcry from privacy groups, but nothing close to what we’re seeing today. Airlines including Jet Blue, Northwest, American, and Delta handed over records that in some cases contained personally identifiable data (credit card info, telephone numbers, etc.). I have links to several news stories here and a page from EPIC that summarizes the Northwest Airlines portion of the story along with related links. Again, let me stress that this is not a direct comparison to the current story but one that might be of interest to some of you.