I've written that no private information was given by any of the major search engines that did respond to the Department Of Justice subpoena or request for search data. However, as people are discussing and debating the case more, they're realizing that there is some private information contained within searches themselves. And this is true. Even "anonymous" or "aggregate" search data has some private information.
Indeed, it can be argued that all queries are effectively private. Having said this, there's an important difference between private information and private information that can actually be linked to an individual with confidence. In this piece, I'll explain some of the concerns and differences.
Let me start with the suggestion that ALL queries made to a search engine are private, at least in the minds of those making them. I think it's reasonable to assume that most people doing searches assume they are having some type of confidential conversation with the search engines they use. They don't expect that what they enter into a search box is going to be broadcast to the world.
Those more educated about search engines will know this is a false assumption. Here's a list of the many ways search engines have broadcast what people are searching for to the world. Heck, just last month we were awash in press stories about the top searches of 2005, as each major search engine released popular query lists.
Most of those lists are sanitized, so that you never see things such as porn queries that happen. Moreover, the number of "live" displays, where you can see in real time what people seek, has diminished over the years. However, plenty of press accounts about Google still include the almost mandatory description of how visitors to Google offices around the world are entertained by seeing "live" queries displayed on the walls (see pictures here, here and an excellent one here).
Given all this, how can I say queries are private? Because, back to my main point, most people are unaware that search queries are broadcast this way. And because of this, they'll reveal information to a search engine that they may not want the rest of the world to know: private information.
How about an example? Britney Spears remains a popular search topic, so a query like this:
britney spears
while private, isn't going to cause privacy concerns for most people if it is publicized. What could be "wrong" or "incriminating" about looking for Britney? Heck, Yahoo's search term suggestion tool for advertisers tells me that 2,230,646 searches for her happened on the Yahoo network of advertising sites in December 2003.
Now how about this query:
britney spears nude
That's probably embarrassing to most people who did it. It's also the second most popular "britney" query that happened last month, with 92,255 searches. Despite that popularity, I'd wager that most people don't want the world to know they were looking for the pop star without her clothes on.
Those who did so can breathe a sigh of relief over the current fracas over search data being released to the Department Of Justice. That's because while this private query has been released, no personally identifiable information has been released with it. There's no way to link the query back to the person who did it.
In particular, you can imagine that for every query at Google or another search engine, you leave behind a record that's something like this:
www-az3.proxy.aol.com - 25/Dec/2005 10:16:22 - http://www.google.com/search?q=britney%20spears%20nude - 740674ce213e9d9 - firstname.lastname@example.org
My Search Privacy At Google & Other Search Engines article from 2003 explains more about what's in these records, but I'll do a short breakdown here of the bolded portions.
- Internet Address: The first part of that record (www-az3.proxy.aol.com) shows your internet address (in this case, it tells us that someone connected through AOL). It doesn't reveal your name, address or anything personal about you. However, if someone were to contact AOL, and AOL checked its own records for that period of time, and that person also had the Google records, then they might be able to link you to the query.
- Query Terms: The second part has the words you searched for.
- Cookie: The third part (740674ce213e9d9) is your cookie. That tells Google that the request came from a particular web browser that it has interacted with before. Again, it doesn't reveal your name or anything about you in particular. It's just a random set of numbers assigned to identify your browser.
- User Account: The fourth part (firstname.lastname@example.org) represents the Google Account name you created, assuming you did create one and logged into Google to make some use of account-based services. It may link your email address to this query. If you make use of any transactional services with Google, then that might help link your actual physical location to the query.
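To make the breakdown above concrete, here is a minimal Python sketch that splits the sample record into its parts. The " - " separator and the field order are simply taken from the illustrative line in this article; the `SearchLogRecord` and `parse_record` names are made up for the example, and real search engine logs almost certainly look different.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urlsplit


@dataclass
class SearchLogRecord:
    internet_address: str  # e.g. an ISP proxy host, not a person's name
    timestamp: str
    url: str
    query_terms: str       # decoded from the q= parameter of the URL
    cookie: str
    user_account: str


def parse_record(line: str) -> SearchLogRecord:
    """Split one illustrative log line into named fields.

    Assumes exactly five fields separated by " - ", as in the sample
    record shown in this article (a hypothetical layout).
    """
    address, timestamp, url, cookie, account = line.split(" - ")
    # parse_qs also decodes %20 back into spaces.
    query_terms = parse_qs(urlsplit(url).query).get("q", [""])[0]
    return SearchLogRecord(address, timestamp, url, query_terms, cookie, account)
```

Calling `parse_record` on the sample line gives a record where only `internet_address`, `cookie` and `user_account` could help tie the query to a person, and even then only with someone else's records in hand.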
That type of information in various ways is logged by ALL the major search engines. However, NONE of the search engines that complied with the Department Of Justice request gave out any of the personally identifiable portions. Your internet address, cookie and user accounts were all removed, as were dates and times. Instead of that long line above, this is all that was handed over, according to what the search engines have said:
britney spears nude
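As a rough illustration of that stripping step, the Python sketch below throws away every field except the search words. It assumes the same made-up " - " separated layout as the sample record in this article, and the function names are hypothetical. It is not a description of how any search engine actually prepared its data.

```python
from urllib.parse import parse_qs, urlsplit


def strip_to_query(line: str) -> str:
    """Keep only the decoded search terms from one illustrative record."""
    # Assumed layout: address - timestamp - url - cookie - account.
    # Everything except the URL is discarded on purpose.
    _address, _timestamp, url, _cookie, _account = line.split(" - ")
    return parse_qs(urlsplit(url).query).get("q", [""])[0]


def release_queries(lines):
    """Return the bare queries, with no address, time, cookie or account."""
    return [strip_to_query(line) for line in lines]
```

Nothing in the returned list says who searched or when, which is the whole point. Anything the searcher typed into the query itself still comes through untouched.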
So yes, a private query you and others made was passed along to the government. But they could have easily seen the same by doing a search through tools listed here. Heck, they could have sat in Google's lobby while waiting to talk with Google lawyers about the request and written down the queries scrolling up the wall. But none of these methods would allow query data to be linked back to you personally.
Ah, but what if you entered a query that does seem to have personally identifiable data? For example, some people search for their social security numbers or telephone numbers, to see if that information is available online.
- Yes, that's a private query, as all queries are likely considered private to searchers.
- Yes, that's private information, information probably not available to the general public and that you didn't intend to have revealed to anyone.
- No, that's not personally identifiable information. Even though you entered your phone number or your social security number, there's no way to know that you YOURSELF actually did it. In addition, there's no way to know whether the information is valid at all.
In other words, anyone might enter this information. Someone could even do this:
britney spears phone number 213-555-1212
and it doesn't mean that Britney did the search or that the phone number is correct.
How about a step beyond. Perhaps you know personally that someone is gay but that person hasn't come out to friends and family about it. You're wondering if anyone else might have said anything about this on the web, so you enter:
jenna bush lesbian
Now you've just outed one of President George W. Bush's daughters as a lesbian to a search engine. And since that search engine has handed over search data to the Department Of Justice, your private information just potentially became public in a big way.
Relax. Just because someone enters such a query doesn't mean it's true (and for the record, I have no idea if Jenna Bush is or isn't, not that it makes a difference to me. I'm just making up an example of what could happen).
It is possible that somehow, someway some of the search data could contain some private information that is personally identifiable. I can't rule that out entirely. I simply think it's a very unlikely case. Still, it's enough of a possibility that it formed one reason Google objected to the request from the Department Of Justice:
Moreover, Google's acceding to the Request would suggest that it is willing to reveal information about those who use its services. This is not a perception that Google can accept. And one can envision scenarios where queries alone could reveal identifying information about a specific Google user, which is another outcome that Google cannot accept.
Yes, it is possible, though it remains extremely unlikely. The better reason not to hand over data is covered in the first sentence, that handing over information would give the wrong impression overall to its users.
Of course, Google opens up another can of worms with its second sentence. If there could be identifying information in queries, then why does Google give advertisers access to keyword research tools such as this? Entering "714" gave me back a number of phone numbers that someone has searched for in that US area code. As I explained above, that doesn't tell me the phone numbers are correct. They certainly aren't personally identifiable. I can definitely get phone numbers in much easier ways. But it's private information that any Google advertiser can access by depositing $5 to open an AdWords account.
One of the best things about the Department Of Justice action is that it's raising a new examination of issues like these. Should search query logs be made less accessible to advertisers? Speaking from a marketer's perspective, I hope not, especially in that it really is an extremely unlikely case that any personally identifying information would be revealed. At the very least, it may cause people to think more carefully about what they put into search boxes in the first place.
One last thing. Whether queries themselves have personally identifiable information is becoming an especially hot topic in the comments at the MSN Search Weblog, where examples of a name followed by things like "nazi" or "aids" have been given. Privacy infringements? No, in the sense that we don't know in such examples if any of the information is true or not, as I wrote above.
Having said this, I frankly don't know whether the government itself is smart enough to realize this. I already have written about how dumb I think they are in the request they've made. Then I read of this from Newsweek:
What if certain search terms indicated that people were contemplating terrorist actions or other criminal activities? Says the DOJ's Miller, "I'm assuming that if something raised alarms, we would hand it over to the proper [authorities]."
So the request for data that supposedly was just being done to measure whether children might encounter porn through search results now might be used for other things? Kind of scary, though scary, again, because of how dumb the Department Of Justice appears to be.
What are they going to hand over? That a year ago, there was a search for something they think might be terrorist related, but that they don't know who did it, whether it was true or even worth the time to investigate at all?
Want to comment or discuss? Please visit our Search Engine Watch Forums thread, Bush Administration Demands Search Records.
Postscript from Gary: I realize that this is not an apples to apples comparison since screening for terrorists on airplanes is a very very very serious issue.
With that out of the way, reading all of the postings and news coverage about the "search subpoena" this weekend reminded me of several instances reported in 2004 where major airlines handed over traveller information to the government to assist them in building and testing a passenger screening database.
These stories got some press attention and some significant outcry from privacy groups, but nothing close to what we're seeing today. Airlines including Jet Blue, Northwest, American, and Delta handed over records that in some cases contained personally identifiable data (credit card info, telephone numbers, etc.). I have links to several news stories here and a page from EPIC that summarizes the Northwest Airlines portion of the story along with related links. Again, let me stress that this is not a direct comparison to the current story but one that might be of interest to some of you.