Bush Administration Demands Search Data; Google Says No; AOL, MSN & Yahoo Said Yes

NOTE: We're continuing to update this news through postscripts below the original story.

Via John Battelle and Good Morning Silicon Valley, the San Jose Mercury News article "Feds want Google search records" covers the Bush administration demanding last year that Google and other search engines turn over aggregate search information to help revive a child protection law. Google has refused to comply with the subpoena. A motion has been filed this week by the US Department Of Justice to force Google to hand over the data.

In particular, the Bush administration wanted one million random web addresses and records of all Google searches for a one week period. The government apparently wants to estimate how much pornography shows up in the searches that children do.

Here's a thought. If you want to measure how much porn is showing up in searches, try searching for it yourself rather than issuing privacy alarm sounding subpoenas. It would certainly be more accurate.

Getting a list of all searches in one week definitely would let the US federal government dig deep into the long tail of porn searches. But then again, the sheer amount of data would be overwhelming. Do you know every variation of a term someone might use, that you're going to dig out of the hundreds of millions of searches you'd get? Oh, and be sure you filter out all the automated queries coming in from rank checking tools, while you're at it. They won't skew the data at all, nope.

Moreover, since the data is divorced from user info, you have no idea what searches are being done by children or not. In the end, you've asked for a lot of data that's not really going to help you estimate anything at all.

Far better would be to do some searches that you think children and teens are actually doing, such as by doing a survey of them. Then just go start searching on Google and the other search engines yourselves. See what actually comes up, especially when the filtering protection each service offers is enabled. That would give you plenty of data, plus it would be useful for everyone to have someone rigorously test the filtering systems that are offered. Serving subpoenas to get the data isn't necessary.

It's important to note that from what I read, the requests do not involve user data at all. Shutting off your cookies or purging your personalized search data wouldn't protect you with this request, because the request wasn't going after personal data. To stress again:

  • According to the report, they wanted a list of one million web addresses. Not who went to the web pages and when, just a list of URLs picked randomly.
     
  • They wanted searches for one week. I haven't seen the court documents, but I'm guessing Google could have handed over a list of searches that were entirely unassociated with IP addresses, times, cookies and registration information. Nothing suggests that they wanted to know who did the searches in any way.

Having said this, such a move absolutely should breed some paranoia. They didn't ask for personal data this time, but next time, they might. Of course, it bears reminding that this type of data is easily obtainable from ISPs. So even if the search engines refuse to comply, your own ISP could be giving up your data -- or selling it.

Overall, I say kudos to Google for declaring the request overreaching and refusing to comply. I'm checking with the other major search engines to see if they handed over data.

I've spoken and written a bit about the idea that the search engines need to consider creating a clear "Search Privacy Bill Of Rights," spelling out clearly what protections they'll pledge you'll always have with your data and exactly how it will be used, destroyed and so on. I want to move ahead with more explorations of this -- and perhaps we need a similar one enacted by governments to spell out what they will and will not do with our highly private search data.

Moving Past Google Privacy Fears & Toward An Industry Solution from me last year gives you a lot of background on search privacy issues from over the years. There's an extensive reading list at the bottom.

After I put that out, I also created a thread at our Search Engine Watch Forums, How Should Search Engines Protect Privacy?. Unfortunately, that thread -- while it got lots of discussion -- never generated as many concrete ideas and suggestions about what should go in a Search Privacy Bill Of Rights as I hoped for. So I'm trying again. Got thoughts, comments, suggestions? Please visit our new thread, A Search Privacy Bill Of Rights.

Meanwhile, want to talk about this particular move by the Bush Administration? I have a different thread for that, Bush Administration Demands Search Records.

Postscript 1: I have queries out to AOL, Ask Jeeves, MSN and Yahoo to find out if they provided data. I'll note answers here or in a new post.

Postscript 2: I said above that a more accurate way for the government to assess how often children might encounter porn through search engines would be to conduct their own research. Indeed, they have. Government Report Says MSN Search Adult Filter Most Effective from the SEW Blog back in June covers this report (PDF format) that the US Government Accountability Office did back in June. From what I can see, it measured how often children might encounter porn through image search. To do the assessment, no subpoenas were required. From what I posted in our active Bush Administration Demands Search Records discussion at the Search Engine Watch Forums on today's news:

FYI, back to the idea of child filters on search engines, the US government has tested this, as Government Report Says MSN Search Adult Filter Most Effective covers. Note that to do this, they said:

We performed unfiltered 5-minute searches for six keywords: three keywords known to be associated with pornography and three innocuous terms that juveniles would likely use (a popular teenage singer/actress, a popular cartoon, and a popular movie character).

They (the US Government Accountability Office) managed to do this assessment without issuing a subpoena to anyone. Moreover, it has stats they say they want already produced and ready to go. Pages 48 and 67 have details. The caveat is that this seems to have been a test of image search results (Yahoo was 92 percent non porn, MSN 76 percent, Google 64 percent). But you could do the same thing to measure web search.

Postscript 3: Here's the official Google statement from Nicole Wong, associate general counsel with Google. It's what they already told the San Jose Mercury News and are telling other publications:

Google is not a party to this lawsuit and their demand for information overreaches. We had lengthy discussions with them to try to resolve this, but were not able to and we intend to resist their motion vigorously.

Postscript 4: MSN statement is below. It doesn't really answer the question, which was if they complied with a subpoena to hand over data similar to what Google's being sued over. Since it's not a denial, I'm reading this as a tentative yes, that they got a request and passed the data along. I've asked for clarification. The statement:

MSN works closely with law enforcement officials worldwide to assist them when requested. Microsoft fully complies with the Electronic Communications Privacy Act and United States Law as well as Microsoft's terms of use and privacy policies in working with law enforcement. It is our policy to respond to legal requests in a very responsive and timely manner in full compliance with applicable law. MSN takes the safety of its customers very seriously and is committed to providing a safe experience for consumers. As stated in MSN's Terms of Use and Subscription Agreements, Microsoft will comply with applicable law to edit, refuse to post, or to remove any information or materials, in whole or in part, in Microsoft's sole discretion.

Postscript 5: It's important to note this case is not about stopping child porn. It's about trying to get a law passed that would help the government shut down sites that allow children themselves to access porn. To prove a need for the law, the US government wants to show how much porn children might encounter through searches. It's easy to confuse these two completely different things. I did originally, then corrected the first draft of my story, but I still had a section stressing the child porn angle. I've removed that from the story above. Here's what I pulled out, for those who care about such edits:

Getting a list of all searches in one week definitely would let the US federal government dig deep into the long tail of porn searches. But then again, the sheer amount of data would be overwhelming. Do you know every variation of a term someone might use, that you're going to dig out of the hundreds of millions of searches you'd get? Oh, and be sure you filter out all the automated queries coming in from rank checking tools, while you're at it. They won't skew the data at all, nope.

If you do, from talking with the head of a child porn fighting group in the UK, my understanding is that many euphemisms and code words are used that won't immediately register as child porn terms.

I can assume the Bush administration probably has investigators smart enough to know the euphemisms and other terms that those after child porn might seek. If you've got that list, just go start searching on Google and the other search engines yourselves. See what actually comes up, especially when the filtering protection each service offers is enabled. That would give you plenty of data, plus it would be useful for everyone to have someone rigorously test the filtering systems that are offered.

There are plenty of other ways to get samplings of non-porn searches that are done, to measure whether porn is showing up in response to these. Serving subpoenas to get the data isn't necessary.

Postscript 6: Ask Jeeves did not provide data, as they were not asked. Statement:

Ask Jeeves has not received requests for search data from the Department of Justice in this matter.

Postscript 7: Yahoo got a request, and I'm guessing complied. Guessing? The statement is below. At first, you'd think they didn't give any information. But that's not what it says. It says they gave no "personal information." That's easy enough, since as I noted above, the government didn't request any personal information. The aggregate data they wanted wasn't personal. Therefore, Yahoo may have handed that over. I'm following up. Statement from spokesperson Mary Osako:

We are rigorous defenders of our users' privacy. We did not provide any personal information in response to the Department of Justice's subpoena. In our opinion, this is not a privacy issue.

Postscript 8: New statement came in about a minute after I posted above, making it clear Yahoo did comply:

We are rigorous defenders of our users' privacy. We did not provide any personal information in response to the Department of Justice's subpoena. In our opinion, this is not a privacy issue. We complied on a limited basis and did not provide any personally identifiable information.

Want to comment or discuss? Visit our SEW Forums thread, Bush Administration Demands Search Records.

Postscript 9: In fairness to Yahoo, which handed over information -- and MSN which likely did the same -- it is important to note that it is not just spin that no privacy issues were involved with this particular data. As I explained in the story, the information is completely divorced from any personally identifiable data.

Let me especially stress this. Want 1 million random web sites? There's no privacy issue in that. The government didn't ask for the "bad" sites or sites that were linked with any particular activity. They just wanted a list of sites, probably so they could do a survey.

It's a stupid request, of course. It's sort of like the government asking a major car dealership, rather than the Department Of Motor Vehicles, to give it a list of random license plate numbers. Surely the government can generate its own list without forcing a private company to do this.

How about those search requests? They are a list of searches with no user data associated with them. If that's a user privacy issue, then live displays such as listed here are a long-standing one.

Here's a better example. Infospace -- which owns the Dogpile meta search engine -- has sold raw search data to Wordtracker for years. I have never heard of anyone concerned about the privacy implications in that. This is because there aren't any. You can't see who did a search, IP addresses, cookies, etc. It's just a big long list of words.

To hammer home the point, look at this:

[Image: 060119-search.gif, a screenshot of the live search display]

That's the live (and warning, unfiltered) search display from Dogpile as I wrote this postscript. See anything linking any individuals to those searches? No, and that's all the US government would have gotten, a raw list of millions of searches.

So why the hoopla? Why not give in? Two reasons:

  • Competitive: Why give out even raw search data that might fall into the hands of competitors? Even then, the lists from each major search engine will be pretty similar, so it's not that much of a worry.
     
  • Trust: The data, as I've written, isn't going to help the government at all in what they say it will do. Heck, if they really need that list, they could buy the data from Wordtracker. But by handing it over, the search engine loses the perception of trust with its users. They may not understand that it is not personal. They will understand the government made a wide-ranging request for information and that the search company didn't push back. That type of trust is worth defending in the face of an ill-advised, useless government action.

Postscript 10: MSN says they aren't providing more specifics beyond the statement they gave above. Since that statement does NOT deny that they provided information, I can only assume that they did. Unfair assumption? Well:

  • If they didn't get a request, as with Ask Jeeves, they'd say so (and probably breathe a sigh of relief that they didn't get one).
     
  • If they did get a request and refused to comply, I'd expect we'd have seen a court case by now, as we are with Google.

That only leaves that they got a request, and that they replied. If I'm wrong, I'll happily post a correction and new statement, if MSN provides one.

Postscript 11: Seth Finkelstein sent me a link to his Free porn, Google, spam, Internet censorship, and the Supreme Court post, which highlights something Gary and I have written about for ages. You can't trust search engine counts to prove anything. While counts themselves haven't been shown to be an issue in this case, Seth's post shows that they might be something the Department Of Justice is considering. From the Boston Globe article he points at:

Ordinarily, US Solicitor General Theodore B. Olson prepares for an appearance before the Supreme Court by acting out his argument before a pretend court. This time, for a case about the Internet, he added a new twist: searching online for free porn.

At his home last weekend, Olson told the justices yesterday, he typed in those two words in a search engine, and found that "there were 6,230,000 sites available."

The top lawyer who represents the Bush administration before the Supreme Court said the search's results illustrate how pornography on websites "is increasing enormously every day," a central point in his argument for saving an antipornography law that was enacted six years ago but has yet to go into effect.

Hmm. Six million porn sites available? OK, let me do it now on Google. Now I get a figure of 26,900,000. How porn has grown. Ah, but how many pages (the count is for pages, not web sites) do we have in all? Google doesn't report a figure. But if I search for -kfdjkkdjdkfjdkjdk9d09d09d0jdkfdkjkf, a word that doesn't exist, I get a count of 9.7 billion pages. I know that the count is much higher than this (read this to understand more), but let's swing with that figure:

26.9 million / 9.7 billion = 0.28% of the web equals free porn
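For anyone who wants to rerun that back-of-the-envelope math with their own counts, here is a minimal sketch. It assumes the 26,900,000 count quoted above for the free porn search and the 9.7 billion count from the nonsense-word search; the function and variable names are made up for illustration, and the counts themselves are no more trustworthy in code than they are in court.

```python
def share_of_index(matching_pages: int, total_pages: int) -> float:
    """Return matching_pages as a percentage of total_pages."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return 100.0 * matching_pages / total_pages


# Counts as reported in this postscript (search engine estimates, not exact figures).
free_porn_count = 26_900_000     # reported matches for the two-word search
index_estimate = 9_700_000_000   # reported count for the nonsense-word exclusion search

pct = share_of_index(free_porn_count, index_estimate)
print(f"{pct:.2f}% of indexed pages contain both words")
```

Swap in whatever counts you see today and the percentage will move, which is rather the point.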

You want to take that figure to court to show there's a lot of porn? Please. But that figure still doesn't mean anything. A search for free porn at Google only shows you pages that have those two words on them. They could be pages writing about the evils of free porn, how to avoid free porn, why free porn should be banned. Consider this:

That's a heck of a lot of pages with "no free porn" on them!

Fox News & Danger Of Citing Search Counts over at our Search Engine Watch Forums is another example of the fallacy of citing search counts to prove points. For more deconstructing of the Olson proof, be sure to read Seth's send-up.

Postscript 12: Court documents we've obtained so far are now up. Gary's also working very hard to summarize what's in them. See them over at his Court Documents & Summary Of United States Versus Google Over Search Data post.

Postscript 13: AOL appears to have been asked and complied, at least according to the ACLU. I'm still waiting to hear back from AOL. Via Google Blogoscoped, Feds take porn fight to Google from News.com summarizes the court documents. The ACLU challenged the law the US government seeks to revive, the Child Online Protection Act. An ACLU attorney told News.com that Microsoft, Yahoo and AOL all chose to comply.

AOL disputes what the ACLU says -- but from what I read, that dispute is the same as Yahoo's original statement that they didn't give any personal information (Postscript 7 versus Postscript 8, above). Since the government didn't ask for any personal data, of course AOL didn't hand any over. But AOL says it did hand over search queries from a roughly one day period.

Postscript 14: Xeni Jardin over at Boing Boing has confirmation that AOL, MSN and Yahoo all received requests from the Department Of Justice along with Google. Google did not comply, hence the legal action.

Postscript 15: AOL sends a statement now saying they didn't comply, though it still looks like they did in part, as I explained in Postscript 13. To say they handed over no personal data is a non-issue. The Department Of Justice demanded no personal data. It did demand a list of search terms, and AOL appears to have given some amount of these to the DOJ. The statement:

We did not -- and would not -- comply with such a subpoena. We gave the DOJ a generic list of aggregate and anonymous search terms. This did not include search results, nor any personally-identifiable information, and therefore there were absolutely no privacy implications.

Postscript 16: MSN sends a statement today (Friday, Jan. 20) saying they complied with the subpoena:

Microsoft typically does not comment on specific government inquiries. That said, as you may have heard from the DOJ they did contact us in this case. We take the privacy of our customers very seriously. We did comply with their request for data in this case in a way that ensured we also protected the privacy of our customers. We were able to share aggregated query data (not search results) that did not include any personally identifiable information.

Postscript 17: Xeni Jardin over at Boing Boing has AOL saying they did not comply with the subpoena. It's hair splitting time on which way to go on this. As I explained in Postscript 15, the argument that AOL gave no personal data is a non-issue. No personal data was requested. They did give a list of aggregate and anonymous search terms. That's exactly what the subpoena requested. The amount they gave is uncertain. Google was asked to give search queries for all of July 2005, which was later negotiated down to a request for a week's worth of data. AOL probably gave less than originally requested but still likely a big chunk of information. No mention of whether any URLs were handed over. I still see this as complying, but I'll follow up more with AOL about it.

Postscript 18: See also The Day After: Points In The Search Trust Sweepstakes from me. It reflects back on some of the bigger issue points raised from the situation.

