Which Search Engines Log IP Addresses & Cookies — And Why Care?

Last week I wrote about how John Battelle followed up with Google to find out if
it can link search data to IP addresses or cookies. Google said yes. I wrote
that this wasn’t surprising. Back in 2003, I covered how this is standard
information any web server is likely to log, including servers at the major
search engines. I also wrote last week that if Google is doing this, it was fair
to assume all the major search engines are.

Rather than assume, News.com did an actual survey of this.

Verbatim: Search firms surveyed on privacy has the rundown of AOL, Google, MSN
and Yahoo (Ask Jeeves unfortunately was not included). Yes, they all log this
information. AOL says in one instance that it doesn’t, but I’ll debunk that
later.

First, let’s go back to the bigger question of why people are suddenly asking
about IP addresses and cookies.

Every time you go to a web site, you leave behind an IP address. This is like
your internet telephone number, and it’s possible (especially with the help of
your ISP) to trace activity back to you. That 2003 article of mine, Search
Privacy At Google & Other Search Engines, explains this in more detail.

Often, a web site will also assign you a cookie. This is a small piece of data
the site stores in your browser, which your browser sends back on later visits,
letting the site know you’ve been there before (not you personally, as in your
name and address, but you as in a particular copy of web browser software such
as Internet Explorer or Firefox).
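Here is a minimal sketch of that cookie exchange, using Python’s standard `http.cookies` module. The cookie name `visitor_id` and the `handle_request` helper are hypothetical; real sites pick their own names, expiry dates and ID formats. The point it illustrates is that the ID is random: it identifies a browser, not a person.

```python
import secrets
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical name chosen for this example

def read_visitor_id(cookie_header):
    """Return the ID from a returning browser's Cookie header, or None on a first visit."""
    if not cookie_header:
        return None
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(COOKIE_NAME)
    return morsel.value if morsel is not None else None

def handle_request(cookie_header):
    """Return (visitor_id, set_cookie_value); the second item is None if no cookie needs setting."""
    visitor_id = read_visitor_id(cookie_header)
    if visitor_id is not None:
        return visitor_id, None  # same browser as before: reuse its ID
    visitor_id = secrets.token_urlsafe(12)  # random, so it says nothing about who you are
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = visitor_id
    cookie[COOKIE_NAME]["path"] = "/"
    # Value for the Set-Cookie response header, e.g. "visitor_id=...; Path=/"
    return visitor_id, cookie[COOKIE_NAME].OutputString()

first_id, set_cookie = handle_request(None)                # first visit: a cookie is issued
again_id, _ = handle_request(f"{COOKIE_NAME}={first_id}")  # later visit: browser sends it back
print(first_id == again_id)  # True
```

Delete the cookie, and the next request looks like a first visit, which is why the cookie only "stays the same" as long as you keep it.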

Cookies are better than IP addresses for tracking purposes, because
your IP address will often change from one internet session to the next. Your
cookie stays the same, as long as you use the same browser on the same computer
and don’t delete it.

John’s reader wanted to know if search queries at Google could be linked to an IP address or a
cookie. Huh? What? Why care?

OK, let’s say the government of BigBrother wants to
know how many people are looking for something illegal, such as Widagra.
Let’s say Widagra is a drug that’s legal in some countries but that BigBrother deems evil.
If you are even remotely interested in this drug, BigBrother considers you a
bad, bad person.

BigBrother wants to know all the people who might be looking for this drug
via search engines, assuming that will lead them to the evildoers. So it tells
the search engines to hand over a list of all IP addresses that are shown to
have done a search for Widagra. The search engines hand over a list like this:

  • 195.93.21.100
  • 86.133.102.174
  • 144.132.1.30
    …and so on
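To make that request concrete, here is a rough sketch of what "hand over every IP address that searched for Widagra" amounts to, assuming (hypothetically) that a search log can be reduced to records of IP address plus query text. The sample entries are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SearchLogEntry:
    """One logged search: where it came from and what was asked (hypothetical layout)."""
    ip: str
    query: str

def ips_that_searched(log, term):
    """Return the distinct IP addresses whose logged query contains term, in first-seen order."""
    term = term.lower()
    matches = (entry.ip for entry in log if term in entry.query.lower())
    return list(dict.fromkeys(matches))  # dict.fromkeys de-duplicates while keeping order

log = [
    SearchLogEntry("195.93.21.100", "widagra"),
    SearchLogEntry("86.133.102.174", "Widagra freedom campaign"),
    SearchLogEntry("144.132.1.30", "email valentine's day cards"),
    SearchLogEntry("195.93.21.100", "widagra side effects"),
]
print(ips_that_searched(log, "widagra"))  # ['195.93.21.100', '86.133.102.174']
```

Note that the output is only a list of numbers; nothing in it names anyone.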

OK, now the government of BigBrother knows all the people searching for
Widagra. Well, not really. It knows a bunch of numbers, but it has to "resolve"
these numbers, tracing them back to the host names assigned by the various
internet service providers. It does this, making the list look like this:

  • cache-los-ad04.proxy.aol.com
  • host66-133-102-174.range82-123.btcentralplus.com
  • CPE-144-132-1-30.vic.bigpond.net.au
    …and so on
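The "resolve" step is a reverse DNS lookup. Below is a minimal sketch that uses Python’s standard `socket.gethostbyaddr` as the default resolver; a stub can be passed in instead, since real answers need a network connection and depend on what each ISP publishes (many addresses have no reverse entry at all). The host names in the list above are the article’s examples, not output from this code.

```python
import ipaddress
import socket

def ptr_name(ip):
    """The DNS name a reverse lookup actually queries, e.g. 100.21.93.195.in-addr.arpa."""
    return ipaddress.ip_address(ip).reverse_pointer

def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the host name that reverse DNS reports for ip, or None if there isn't one.

    The default resolver, socket.gethostbyaddr, returns (hostname, aliases, addresses);
    any callable with the same shape can be swapped in, e.g. for offline testing.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:  # socket.herror and socket.gaierror are both OSError subclasses
        return None
    return hostname

print(ptr_name("195.93.21.100"))  # 100.21.93.195.in-addr.arpa
```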

Now it has to figure out which internet service providers own these addresses
using network registration records (such as WHOIS data). That works out like this:

  • AOL
  • British Telecom
  • Telstra
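Matching an address to the network that owns it is essentially a longest-prefix match against registry allocations. The sketch below assumes a small, made-up allocation table built from the RFC 5737 documentation ranges; real lookups would rely on registry (WHOIS) data, and the owner names are placeholders, not real ISPs.

```python
import ipaddress

# Made-up allocation table using RFC 5737 documentation ranges; not real ISP data.
ALLOCATIONS = {
    "192.0.2.0/24": "ExampleNet A",
    "198.51.100.0/24": "ExampleNet B",
    "198.51.100.128/25": "ExampleNet B (business sub-allocation)",
    "203.0.113.0/24": "ExampleNet C",
}

def owner_of(ip, allocations=ALLOCATIONS):
    """Return the owner of the most specific listed block containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    best_net, best_owner = None, None
    for cidr, owner in allocations.items():
        net = ipaddress.ip_network(cidr)
        # Longest-prefix match: a smaller block carved out of a larger one wins.
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_owner = net, owner
    return best_owner

print(owner_of("198.51.100.200"))  # ExampleNet B (business sub-allocation)
```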

Now it has to ask each provider to tell it who was on the internet from a
particular IP address at a particular time. In other words, take that AOL address
(cache-los-ad04.proxy.aol.com). That will be recycled among various AOL users at
different times of the day. In some cases, people have "static" IP addresses that
don’t change. But most people using the web, to my knowledge, are assigned
different addresses at different times when they access the web.

So, you can get a list of all those who did a search for a particular term
using IP addresses IF:

  • A search engine provides the data
  • An ISP also provides the record of who used an IP address at a given time
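Those two conditions amount to a join between data sets held by two different companies. Here is a sketch under hypothetical assumptions: the search engine’s log supplies an (IP, time) pair, and the ISP’s records are modeled as simple address leases with start and end times. Real ISP record-keeping will differ, and the subscriber labels are invented.

```python
from datetime import datetime

# Hypothetical ISP lease records: (ip, subscriber, lease_start, lease_end), end exclusive.
LEASES = [
    ("203.0.113.7", "subscriber-1041", datetime(2006, 2, 1, 8, 0), datetime(2006, 2, 1, 20, 0)),
    ("203.0.113.7", "subscriber-2210", datetime(2006, 2, 1, 20, 0), datetime(2006, 2, 2, 8, 0)),
]

def subscriber_at(ip, when, leases):
    """Return whichever account held ip at time when, according to the ISP's records."""
    for lease_ip, subscriber, start, end in leases:
        if lease_ip == ip and start <= when < end:
            return subscriber
    return None

def identify_searcher(search_hit, leases):
    """search_hit is (ip, time) from the search engine; leases come from the ISP.

    If either side withholds its data (None), no account can be identified.
    """
    if search_hit is None or leases is None:
        return None
    ip, when = search_hit
    return subscriber_at(ip, when, leases)

# Same address, two different accounts, depending on the time of the search.
print(identify_searcher(("203.0.113.7", datetime(2006, 2, 1, 9, 0)), LEASES))    # subscriber-1041
print(identify_searcher(("203.0.113.7", datetime(2006, 2, 1, 21, 30)), LEASES))  # subscriber-2210
```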

If you don’t get both of these things, you don’t know who did the search. And
if the IP address traces back to a shared computer, such as one at a workplace,
a school or a library, you still don’t know exactly who was on the computer.

What about cookies? They just make it easier to see that the same browser
software may have done something, regardless of IP address. For example, say you
log in using AOL on your laptop computer, then use a wireless connection when
traveling. You might leave two different IP addresses, like this:

  • cache-los-ad04.proxy.aol.com
  • host66-133-102-174.range82-123.btcentralplus.com

You’re the same person, on the same computer, but you leave behind two
completely different IP addresses. Someone just looking at IP addresses in a
search engine’s log records would think you are two completely different people.

Using cookies, each address would also have your browser’s unique cookie
identifier associated with it, as shown here:

  • cache-los-ad04.proxy.aol.com e43UBsS4fNZzmDgj
  • host66-133-102-174.range82-123.btcentralplus.com e43UBsS4fNZzmDgj

Now even though the IP addresses are different, the cookies are the same —
so you know the same browser software made these requests.
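Linking different addresses through a shared cookie is just a grouping operation. A minimal sketch, assuming a log reduced to (host, cookie ID) pairs; the second cookie value is made up to show a different browser behind the same AOL proxy address.

```python
from collections import defaultdict

def hosts_by_cookie(log):
    """Group logged (host, cookie_id) pairs so each cookie lists every address it came from."""
    grouped = defaultdict(list)
    for host, cookie_id in log:
        if host not in grouped[cookie_id]:
            grouped[cookie_id].append(host)
    return dict(grouped)

log = [
    ("cache-los-ad04.proxy.aol.com", "e43UBsS4fNZzmDgj"),
    ("host66-133-102-174.range82-123.btcentralplus.com", "e43UBsS4fNZzmDgj"),
    ("cache-los-ad04.proxy.aol.com", "Xq7hypothetical2"),  # another browser, same AOL proxy
]

for cookie_id, hosts in hosts_by_cookie(log).items():
    print(cookie_id, hosts)
```

Grouped by IP instead, the AOL proxy address would lump two different browsers together; grouped by cookie, the two addresses used by one browser come together.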

Why’s that useful? Back to BigBrother, say they scan the list of those
searching for "widagra" and decide they’d like to profile individuals on that
list further. They could ask to see all the searches done from a particular IP
address. However, as I mentioned, since many IP addresses are reused, you aren’t
really seeing what one particular individual may have done.

Instead, they turn to cookies. They see that the cookied browser of
"e43UBsS4fNZzmDgj" looked for "widagra," so they order up a list of all the terms
that browser searched for. They get back:

  • widagra
  • movement to overthrow BigBrother web site
  • widagra freedom campaign
  • how can we stop evil widagra users
  • i love president bigbrother
  • email valentine’s day cards
    …and so on
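Ordering up "every term that browser searched for" is a filter on the cookie field. The sketch assumes hypothetical log records of (timestamp, cookie ID, query); the entries are illustrative only.

```python
from datetime import datetime

def searches_for_cookie(log, cookie_id):
    """Return every query logged for one cookie ID, oldest first."""
    hits = [(when, query) for when, cid, query in log if cid == cookie_id]
    return [query for _when, query in sorted(hits)]

log = [
    (datetime(2006, 2, 3, 9, 15), "e43UBsS4fNZzmDgj", "widagra freedom campaign"),
    (datetime(2006, 2, 1, 22, 5), "e43UBsS4fNZzmDgj", "widagra"),
    (datetime(2006, 2, 2, 11, 0), "Xq7hypothetical2", "weather"),
    (datetime(2006, 2, 13, 18, 40), "e43UBsS4fNZzmDgj", "email valentine's day cards"),
]
print(searches_for_cookie(log, "e43UBsS4fNZzmDgj"))
# ['widagra', 'widagra freedom campaign', "email valentine's day cards"]
```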

Some of those searches might help BigBrother decide this particular person is
an evildoer. But then again, maybe not. Maybe they were researching the evils of
widagra. Maybe the browser software was in a library, where different people
used it.

Now that the basics of IP addresses and cookies are covered, we can come back
to the survey that News.com did. John’s reader — and then News.com — asked two
key questions:

  • Given a list of search terms, can you produce a list of people who
    searched for that term, identified by IP address and/or cookie value?
     
  • Given an IP address or cookie value, can you produce a list of the terms
    searched by the user of that IP address or cookie value?

In other words:

  • If someone gave you a "bad" search term, could you tell all the IP
    addresses or cookies associated with that search?
     
  • If someone gave you a particular IP address or cookie, could you build a
    profile of search activity associated with it?

Note that the original questions say "people," which is NOT correct. No
search engine can tell you the "people" who did a search from only the IP
address or cookies they have. That data has no name, address or other
personally identifying information associated with it. As I explained earlier,
you’d really only be able to do that if, along with the search records, you also
got the ISPs to give up information.

The big exception is if REGISTERED USERS are involved. By registered users, I
mean that you filled out a form and then logged into My Yahoo, Gmail or some
other service where you personally make yourself known to a search engine. In
these cases, they now have a much better idea that a person is involved and
probably who that person is.

The answer to both questions is that all the major search engines surveyed log
IP addresses and cookies along with search data. OK, AOL did answer one of the
questions by saying that it didn’t keep that info:

[News.com]: Given a list of search terms, can you produce a list of people
who searched for that term, identified by IP address and/or cookie value?

[AOL]: No. Our systems are not configured to track individuals or groups of
users who may have searched for a specific term or terms, and we would not
comply with such a request.

Despite the response, I’m 99 percent certain AOL does indeed log IP addresses
and cookies along with search data. Searching on AOL creates a
page request with the search terms embedded in the page’s URL. That request will
be logged. If it’s logged, it can be analyzed. In fact, AOL later says it can
give you a list of searches that were done by a particular IP address or cookied
browser. If you can go from an IP address or cookie to its searches, you can also
go the opposite way, from a search term back to the IP addresses and cookies.
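To illustrate why a search that arrives as a URL is a search that gets logged, here is a sketch that pulls the IP address, cookie and search terms out of one web server log line. It assumes a hypothetical layout (the Apache "combined" log format plus a trailing quoted cookie field) and a search parameter named `q`; neither is a claim about how AOL formats its logs. Once lines parse into (IP, cookie, query) records, they can be searched in either direction.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical layout: Apache "combined" format plus a trailing quoted cookie field.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "GET (?P<url>\S+) [^"]*" \d{3} \S+'
    r'(?: "[^"]*" "[^"]*" "(?P<cookie>[^"]*)")?'
)

def parse_search(line, param="q"):
    """Return (ip, cookie, query) for a logged search request, or None for other lines."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    # parse_qs decodes URL escaping, turning "widagra+freedom" into "widagra freedom"
    values = parse_qs(urlsplit(match.group("url")).query).get(param)
    if not values:
        return None
    return match.group("ip"), match.group("cookie"), values[0]

line = ('195.93.21.100 - - [10/Feb/2006:13:55:36 -0800] '
        '"GET /search?q=widagra+freedom HTTP/1.1" 200 5120 '
        '"-" "Mozilla/4.0 (compatible; MSIE 6.0)" "e43UBsS4fNZzmDgj"')
print(parse_search(line))  # ('195.93.21.100', 'e43UBsS4fNZzmDgj', 'widagra freedom')
```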

By the way, it’s worth remembering that it’s not just search engines that keep
IP and cookie data associated with searches. News.com almost certainly logs IP
addresses when you do a search there. John’s blog almost certainly does the
same when you search at his blog. We log IPs when you search on our blog.
Heck, I’d be surprised if the EFF itself
didn’t have standard log data recording what people are searching on there.

How long data is kept is another issue. Privacy groups feel that if data is
destroyed, it can’t be abused. I’ve written earlier that I don’t really want
data destroyed, since what we search for is useful historical information, and
knowing that searches were done from a particular browser or an IP address is
helpful in filtering and mining data. However, as I also wrote, it could be that
IP addresses and cookies get replaced in a way that they retain some unique value
while being rendered completely untraceable back to an ISP.
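One way that replacement might work (my illustration, not something any of the search engines described) is keyed hashing: run each IP address or cookie through an HMAC with a secret key. The same input always yields the same pseudonym, so searches from one browser can still be grouped, but without the key nobody can recompute the mapping. A plain unkeyed hash would not be enough, since there are only about 4.3 billion IPv4 addresses to try. The function name and the 16-character truncation are arbitrary choices for this sketch.

```python
import hashlib
import hmac
import secrets

def make_pseudonymizer(key=None):
    """Return a function that maps an IP address or cookie ID to a stable pseudonym.

    With no key supplied, a random 32-byte key is generated; once that key is
    discarded, pseudonyms can no longer be recomputed from raw values.
    """
    if key is None:
        key = secrets.token_bytes(32)

    def pseudonym(value):
        digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        return digest[:16]  # 64 bits, truncated for readability; keep more if collisions matter

    return pseudonym

pseudonym = make_pseudonymizer()
print(pseudonym("195.93.21.100") == pseudonym("195.93.21.100"))  # True
```

The trade-off: destroying the key makes old pseudonyms permanently unlinkable, but new log entries made under a fresh key can no longer be matched to old ones.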

None of that replacement is happening now, to my knowledge, so data is building
up. How long does each of the search engines keep it?

  • AOL: Personal search histories expire after 30 days, and backups
    are not kept. How long log data (IP, cookied info) is maintained is not
    covered.
     
  • Google: No particular period for anything is given, which I read as
    nothing being destroyed.
     
  • MSN: Data is deleted, but no specifics are provided.
     
  • Yahoo: No particular period for anything is given, which I read as
    nothing being destroyed.
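As a rough illustration of what a stated retention window like AOL’s 30 days means for stored records, here is a sketch of a purge step. The (timestamp, data) record layout is hypothetical, and nothing here reflects how any of these companies actually expires data.

```python
from datetime import datetime, timedelta

def purge_expired(records, now, max_age_days=30):
    """Drop (timestamp, data) records older than max_age_days; keep the rest in order."""
    cutoff = now - timedelta(days=max_age_days)
    return [record for record in records if record[0] >= cutoff]

records = [
    (datetime(2006, 1, 15), "widagra"),
    (datetime(2006, 2, 20), "widagra freedom campaign"),
]
print(purge_expired(records, now=datetime(2006, 3, 1)))
```

With no purge step at all, which is how I read Google’s and Yahoo’s answers, every record simply stays.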

Overall, I don’t learn that much more from this survey. Google and Yahoo had
already said they kept data. MSN’s deleting some, but I suspect log data is
backed up and kept somewhere with no destruction policy in place. The same goes
for AOL.

News.com also asked whether any of the companies had handed over search data.
Responses:

  • AOL: No comment
     
  • Google: No comment (Gmail requests have been received)
     
  • MSN: It has never had any criminal or civil requests for search
    history data
     
  • Yahoo: No comment

MSN has learned a lesson from its failure to disclose properly last month in
the Department Of Justice case. It was the only search engine that didn’t dive
for the cover of no comment and gave a clear and reassuring answer. The answer
is probably the same for the other search engines, so why not just say so?

FAQ: When Google is not your friend from News.com is that publication’s look at
what this survey means, which I came across after writing up my own thoughts on
the survey. You’ll see that it covers issues such as IP addresses changing, how
cookies are used and how a US law might (or might not) apply to protect search
privacy.

Want to comment on things discussed in this article? We have several Search
Engine Watch Forum threads where everyone is welcome: