Bush Administration Demands Search Data; Google Says No; AOL, MSN & Yahoo Said Yes

NOTE: We’re continuing to update this news through
postscripts below the original story.

Via John Battelle and Good Morning Silicon Valley, the San Jose Mercury News
article “Feds want Google search records” covers the Bush administration
demanding last year that Google and other search engines turn over aggregate
search information to help revive a child protection law. Google has refused to
comply with the subpoena. A motion has been filed this week by the US
Department Of Justice to force Google to hand over the data.

In particular, the Bush administration wanted one million random web
addresses and records of all Google searches for a one week period. The
government apparently wants to estimate how much pornography shows up in the
searches that children do.

Here’s a thought. If you want to measure how much porn is showing up in
searches, try searching for it yourself rather than issuing privacy alarm
sounding subpoenas. It would certainly be more accurate.

Getting a list of all searches in one week definitely would let the US federal
government dig deep into the long tail of porn searches. But then again, the
sheer amount of data would be overwhelming. Do you know every variation of a
term someone might use, that you’re going to dig out of the hundreds of millions
of searches you’d get? Oh, and be sure you filter out all the automated queries
coming in from rank checking tools, while you’re at it. They won’t skew the
data at all, nope.

Moreover, since the data is divorced from user info, you have no idea what
searches are being done by children or not. In the end, you’ve asked for a lot
of data that’s not really going to help you estimate anything at all.

Far better would be to do some searches that you think children and teens are
actually doing, such as by doing a survey of them. Then just go start searching
on Google and the other search engines yourselves. See what actually comes up,
especially when the filtering protection each service offers is enabled. That
would give you plenty of data, plus it would be useful for everyone to have
someone rigorously test the filtering systems that are offered. Serving
subpoenas to get the data isn’t necessary.

It’s important to note that from what I read, the requests do not involve
user data at all. Shutting off your cookies or purging your personalized search
data wouldn’t protect you with this request, because the request wasn’t going
after personal data. To stress again:

  • According to the report, they wanted a list of one million web addresses.
    Not who went to the web pages and when, just a list of URLs picked randomly.
  • They wanted searches for one week. I haven’t seen the court documents, but
    I’m guessing Google could have handed over a list of searches that were
    entirely unassociated with IP addresses, times, cookies and registration
    information. Nothing suggests that they wanted to know who did the searches in
    any way.

Having said this, such a move absolutely should breed some paranoia. They
didn’t ask for personal data this time, but next time, they might. Of course, it
bears reminding that this type of data is easily obtainable from ISPs. So even
if the search engines refuse to comply, your own ISP could be giving up your
data, or selling it.

Overall, I say kudos to Google for declaring the request overreaching and
refusing to comply. I’m checking with the other major search engines to see if
they handed over data.

I’ve spoken and written a bit about the idea that the search engines need to
consider creating a clear “Search Privacy Bill Of Rights,” spelling out clearly
what protections they’ll pledge you’ll always have with your data and exactly
how it will be used, destroyed and so on. I want to move ahead with more
explorations of this — and perhaps we need a similar one enacted by governments
to spell out what they will and will not do with our highly private search data.

Moving Past Google Privacy Fears & Toward An Industry Solution from me last
year gives you a lot of background on search privacy issues from over the years.
There’s an extensive reading list at the bottom.

After I put that out, I also created a thread at our Search Engine Watch
Forums, How Should Search Engines Protect Privacy? Unfortunately, that thread,
while it got lots of discussion, never generated as many concrete ideas and
suggestions about what should go in a Search Privacy Bill Of Rights as I hoped
for. So I’m trying again. Got thoughts, comments, suggestions? Please visit our
new thread, A Search Privacy Bill Of Rights.

Meanwhile, want to talk about this particular move by the Bush
Administration? I have a different thread for that, Bush Administration Demands
Search Records.

Postscript 1: I have queries out to AOL, Ask Jeeves, MSN and Yahoo to find out
if they provided data. I’ll note answers here or in a new post.

Postscript 2: I said above that a more accurate
way for the government to assess how often children might encounter porn through
search engines would be to conduct their own research. Indeed, they have.

Government Report Says MSN Search Adult Filter Most Effective from the SEW Blog
back in June covers this report (PDF format) that the US Government
Accountability Office did back in June. From what I can see, it measured how
often children might encounter porn through image search. To do the assessment,
no subpoenas were required. From what I posted in our active Bush Administration
Demands Search Records discussion at the Search Engine Watch Forums on today’s
news:

FYI, back to the idea of child filters on search engines, the US government
has tested this, as Government Report Says MSN Search Adult Filter Most
Effective covers. Note that to do this, they said:

We performed unfiltered 5-minute searches for six keywords: three
keywords known to be associated with pornography and three innocuous terms
that juveniles would likely use (a popular teenage singer/actress, a popular
cartoon, and a popular movie character).

The US Government Accountability Office managed to do this assessment without
issuing a subpoena to anyone. Moreover, the report has the stats they say they
want, already produced and ready to go. Pages 48 and 67 have details. The caveat
is that this seems to have been a test of image search results (Yahoo was 92
percent non-porn, MSN 76 percent, Google 64 percent). But you could do the same
thing to measure web search.

Postscript 3: Here’s the official Google statement from Nicole Wong, associate
general counsel with Google. It’s what they already told the San Jose Mercury
News and are telling other publications:

Google is not a party to this lawsuit and their demand for information
overreaches. We had lengthy discussions with them to try to resolve this, but
were not able to and we intend to resist their motion vigorously.

Postscript 4: MSN statement is below.
It doesn’t really answer the question, which was if they complied with a
subpoena to hand over data similar to what Google’s being sued over. Since it’s
not a denial, I’m reading this as a tentative yes, that they got a request and
passed the data along. I’ve asked for clarification. The statement:

MSN works closely with law enforcement officials worldwide to assist them
when requested. Microsoft fully complies with the Electronic Communications
Privacy Act and United States Law as well as Microsoft’s terms of use and
privacy policies in working with law enforcement. It is our policy to respond
to legal requests in a very responsive and timely manner in full compliance
with applicable law. MSN takes the safety of its customers very seriously and
is committed to providing a safe experience for consumers. As stated in MSN’s
Terms of Use and Subscription Agreements, Microsoft will comply with
applicable law to edit, refuse to post, or to remove any information or
materials, in whole or in part, in Microsoft’s sole discretion.

Postscript 5: It’s important to note this case
is not about stopping child porn. It’s about trying to get a law passed that
would help the government shut down sites that allow children themselves to
access porn.
To prove a need for the law, the US government wants to show
how much porn children might encounter through searches. It’s easy to confuse
these two completely different things. I did originally, corrected the first
draft of my story, but I still had a section stressing the child porn angle.
I’ve removed that from the story above. Here’s what I pulled out, for those who
care about such edits:

Getting a list of all searches in one week definitely would let the US federal
government dig deep into the long tail of porn searches. But then again, the
sheer amount of data would be overwhelming. Do you know every variation of a
term someone might use, that you’re going to dig out of the hundreds of millions
of searches you’d get? Oh, and be sure you filter out all the automated queries
coming in from rank checking tools, while you’re at it. They won’t skew the
data at all, nope.

If you do, from talking with the head of a child porn fighting group in the
UK, my understanding is that many euphemisms and code words are used that won’t
immediately register as child porn terms.

I can assume the Bush administration probably has investigators smart enough
to know the euphemisms and other terms that those after child porn might seek.
If you’ve got that list, just go start searching on Google and the other search
engines yourselves. See what actually comes up, especially when the filtering
protection each service offers is enabled. That would give you plenty of data,
plus it would be useful for everyone to have someone rigorously test the
filtering systems that are offered.

There are plenty of other ways to get samplings of non-porn searches that are
done, to measure whether porn is showing up in response to these. Serving
subpoenas to get the data isn’t necessary.

Postscript 6: Ask Jeeves did not provide data, as they were not asked.
Statement:

Ask Jeeves has not received requests for search data from the Department of
Justice in this matter.

Postscript 7: Yahoo got a request, and I’m guessing complied. Guessing? The
statement is below. At first, you’d
think they didn’t give any information. But that’s not what it says. It says
they gave no “personal information.” That’s easy enough, since as I noted above,
the government didn’t request any personal information. The aggregate data they
wanted wasn’t personal. Therefore, Yahoo may have handed that over. I’m
following up. Statement from spokesperson Mary Osako:

We are rigorous defenders of our users’ privacy. We did not provide any
personal information in response to the Department of Justice’s subpoena. In our
opinion, this is not a privacy issue.

Postscript 8: New statement came in about a
minute after I posted above, making it clear Yahoo did comply:

We are rigorous defenders of our users’ privacy. We did not provide any
personal information in response to the Department of Justice’s subpoena. In our
opinion, this is not a privacy issue. We complied on a limited basis and did not
provide any personally identifiable information.

Want to comment or discuss? Visit our SEW Forums thread, Bush Administration
Demands Search Records.

Postscript 9: In fairness to Yahoo, which handed over information —
and MSN which likely did the same — it is important to note that it is not just
spin that no privacy issues were involved with this particular data. As I
explained in the story, the information is completely divorced from any
personally identifiable data.

Let me especially stress this. Want 1 million random web sites? There’s no
privacy issue in that. The government didn’t ask for the “bad” sites or sites
that were linked with any particular activity. They just wanted a list of sites,
probably so they could do a survey.

It’s a stupid request, of course. It’s sort of like the government asking a
major car dealership for a list of random license plate numbers rather than
asking the Department Of Motor Vehicles. Surely the government can generate its
own list without forcing a private company to do this.

How about those search requests? They are a list of searches with no user
data associated with them. If that’s a user privacy issue, then live displays
such as those listed here are a long-standing one.

Here’s a better example. Infospace, which owns the Dogpile meta search engine,
has sold raw search data to Wordtracker for years. I have never heard of anyone
concerned about the privacy implications in that. This is because there aren’t
any. You can’t see who did a search, IP addresses, cookies, etc. It’s just a big
long list of words.

To hammer home the point, look at this:

060119-search.gif

That’s the live (and warning, unfiltered) search display from Dogpile as I
wrote this postscript. See anything linking any individuals to those searches?
No, and that’s all the US government would have gotten, a raw list of millions
of searches.

So why the hoopla? Why not give in? Two reasons:

  • Competitive: Why give out even raw search data that possibly might
    fall into the hands of competitors? Even then, the lists from each major
    search engine will be pretty similar, so not that much of a worry.
  • Trust: The data, as I’ve written, isn’t going to help the
    government at all in what they say it will do. Heck, if they really need that
    list, they could buy the data from Wordtracker. But by handing it over, the
    search engine loses the perception of trust with its users. They may not
    understand that it is not personal. They will understand the government made a
    wide-ranging request for information and that the search company didn’t push
    back. That type of trust is worth defending in the face of an ill-advised,
    useless government action.

Postscript 10: MSN says they aren’t
providing more specifics beyond the statement they gave above. Since that
statement does NOT deny that they provided information, I can only assume that
they did. Unfair assumption? Well:

  • If they didn’t get a request, as with Ask Jeeves, they’d say so (and
    probably breathe a sigh of relief that they didn’t get one).
  • If they did get a request and refused to comply, I’d expect we’d have seen
    a court case by now, as we are with Google.

That only leaves that they got a request, and that they replied. If I’m
wrong, I’ll happily post a correction and new statement, if MSN provides one.

Postscript 11: Seth Finkelstein sent me a link to his Free porn, Google, spam,
Internet censorship, and the Supreme Court post, which highlights something Gary
and I have written about for ages. You can’t trust search engine counts to prove
anything. While counts themselves haven’t been shown to be an issue in this
case, Seth’s post shows that they might be something the Department Of Justice
is considering. From the Boston Globe article he points at:

Ordinarily, US Solicitor General Theodore B. Olson prepares for an
appearance before the Supreme Court by acting out his argument before a
pretend court. This time, for a case about the Internet, he added a new twist:
searching online for free porn.

At his home last weekend, Olson told the justices yesterday, he typed in
those two words in a search engine, and found that “there were 6,230,000 sites
available.”

The top lawyer who represents the Bush administration before the Supreme
Court said the search’s results illustrate how pornography on websites “is
increasing enormously every day,” a central point in his argument for saving
an antipornography law that was enacted six years ago but has yet to go into
effect.

Hmm. Six million porn sites available? OK, let me do it now on Google. Now I
get a figure of 26,900,000. How porn has grown. Ah, but how many pages (the
count is for pages, not web sites) do we have in all? Google doesn’t report a
figure. But if I search for -kfdjkkdjdkfjdkjdk9d09d09d0jdkfdkjkf, a word that
doesn’t exist, I get a count of 9.7 billion pages. I know that the count is much
higher than this (read this to understand more), but let’s swing with that
figure:

26.9 million / 9.7 billion = 0.28% of the web equals free porn

You want to take that figure to court to show there’s a lot of porn? Please.
But that figure still doesn’t mean anything. A search for free porn at Google
only shows you pages that have those two words on them. They could be pages
writing about the evils of free porn, how to avoid free porn, why free porn
should be banned. Consider this:

That’s a heck of a lot of pages with “no free porn” on them!


Fox News & Danger Of Citing Search Counts over at our Search Engine Watch
Forums is another example of the fallacy of citing search counts to prove
points. For more deconstructing of the Olson proof, be sure to read
Seth’s send-up.

Postscript 12: Court documents we’ve obtained so far are now up. Gary’s also
working very hard to summarize what’s in them. See them over at his Court
Documents & Summary Of United States Versus Google Over Search Data post.

Postscript 13: AOL appears to have been asked
and complied, at least according to the ACLU. I’m still waiting to hear back
from AOL.

Via Google Blogoscoped, Feds take porn fight to Google from News.com summarizes
the court documents. The ACLU challenged the law the US government seeks to
revive, the Child Online Protection Act. An ACLU attorney told News.com that
Microsoft, Yahoo and AOL all chose to comply.

AOL disputes what the ACLU says — but from what I read, that dispute is the
same as Yahoo’s original statement that they didn’t give any personal
information (Postscript 7 versus Postscript 8, above). Since the government
didn’t ask for any personal data, of course AOL didn’t hand any over. But AOL
says it did hand over search queries from a roughly one day period.

Postscript 14: Xeni Jardin over at Boing Boing has confirmation that AOL, MSN
and Yahoo all received requests from the Department Of Justice along with
Google. Google did not comply, hence the legal action.

Postscript 15: AOL sends a statement now
saying they didn’t comply, though it still looks like they did in part, as I
explained in Postscript 13.
To say they handed over no personal data is a
non-issue. The Department Of Justice demanded no personal data. It did demand a
list of search terms, and AOL appears to have given some amount of these to the
DOJ. The statement:

We did not — and would not — comply with such a subpoena. We gave the DOJ a
generic list of aggregate and anonymous search terms. This did not include
search results, nor any personally-identifiable information, and therefore there
were absolutely no privacy implications.

Postscript 16: MSN sends a statement today
(Friday, Jan. 20) saying they complied with the subpoena:

Microsoft typically does not comment on specific government inquiries. That
said, as you may have heard from the DOJ they did contact us in this case. We
take the privacy of our customers very seriously. We did comply with their
request for data in this case in a way that ensured we also protected the
privacy of our customers. We were able to share aggregated query data (not
search results) that did not include any personally identifiable information.

Postscript 17: Xeni Jardin over at Boing Boing has AOL saying they did not
comply with the subpoena. It’s hair splitting time on which way to go on this.
As I explained in Postscript 15, the argument that AOL gave no personal data is
a non-issue. No personal data was requested. They did give a list of aggregate
and anonymous search terms. That’s exactly what the subpoena requested. The
amount they gave is uncertain. Google was asked to give search queries for all
of July 2005, which was later negotiated down to a request for a week’s worth of
data. AOL probably gave less than originally requested but still likely a big
chunk of information. No mention of whether any URLs were handed over. I still
see this as complying, but I’ll follow up more with AOL about it.

Postscript 18: See also The Day After: Points In The Search Trust Sweepstakes
from me. It reflects back on some of the bigger issue points raised from the
situation.

