Last week I wrote about how John Battelle followed up with Google to find out if it can link search data to IP addresses or cookies. Google said yes. I wrote that this wasn't surprising. I covered back in 2003 how this is standard information any web server is likely to log, including servers at the major search engines. I also wrote last week that if Google is doing this, it was fair to assume all the major search engines are.
Rather than assume, News.com did an actual survey of this. Verbatim: Search firms surveyed on privacy has the rundown of AOL, Google, MSN and Yahoo (Ask Jeeves unfortunately was not included). Yes, they all log this information. AOL says they don't in one instance, but I'll debunk that later. First, let's go back to the bigger question of why suddenly people are asking about IP addresses and cookies.
Every time you go to a web site, you leave behind an IP address. This is like your internet telephone number, and it's possible (especially with the help of your ISP) to trace activity back to you. That 2003 article of mine, Search Privacy At Google & Other Search Engines, explains this in more detail.
Often, a web site will also assign you a cookie. This is simply a way for your browser to communicate to the web site that you've been there before (not you personally -- such as your name and address -- but you as in a particular web browser software like Internet Explorer or Firefox).
Cookies are better than IP addresses for tracking purposes, because your IP address will often change from internet surfing session to session. Your cookie stays the same, as long as you use the same browser on the same computer and don't delete it.
John's reader wanted to know if search queries at Google could be linked to an IP address or a cookie. Huh? What? Why care?
OK, let's say the government of BigBrother wants to know how many people are looking for something illegal, such as Widagra. Let's say Widagra is a drug legal in some countries but which BigBrother deems evil. If you are even remotely interested in this drug, BigBrother considers you a bad, bad person.
BigBrother wants to know all the people who might be looking for this drug via search engines, assuming that will lead them to the evildoers. So it tells the search engines to hand over a list of all IP addresses that are shown to have done a search for Widagra. The search engines hand over a list like this:
...and so on
OK, now the government of BigBrother knows all the people searching for Widagra. Well, not really. It knows a bunch of numbers, but it has to "resolve" or trace these numbers back to addresses from the various internet service providers. It does this, making the list look like this:
...and so on
Now it has to figure out which internet service providers own these addresses using network records. That works out like this:
- British Telecom
Now it has to ask each provider to tell it who was on the internet from a particular IP address at a particular time. For example, take that AOL address (cache-los-ad04.proxy.aol.com). That will be recycled among various AOL users at different times of the day. In some cases, people have "static" IP addresses that don't change. But most people using the web, to my knowledge, will have different addresses assigned at the different times they access the web.
So, you can get a list of all those who did a search for a particular term using IP addresses IF:
- A search engine provides the data
- An ISP also provides the record of who used an IP address at a given time
If you don't get both of these things, you don't know who did the search. And if the IP address traces back to a public computer -- one at a workplace, in a school, a library, you still don't know exactly who was on the computer.
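To make the "both things" point concrete, here's a minimal sketch in Python of the matching BigBrother would have to do. Everything in it is an assumption for illustration: the record layouts, the dates, the subscriber labels and the `who_searched` function are made up, and real search engine or ISP records won't look exactly like this.

```python
from datetime import datetime

# Hypothetical search engine log rows: (host, time of search, query).
search_log = [
    ("cache-los-ad04.proxy.aol.com", datetime(2006, 2, 1, 9, 15), "widagra"),
    ("cache-los-ad04.proxy.aol.com", datetime(2006, 2, 1, 21, 40), "widagra"),
]

# Hypothetical ISP records: (host, lease start, lease end, subscriber).
# The same address is handed to different subscribers at different times.
isp_leases = [
    ("cache-los-ad04.proxy.aol.com",
     datetime(2006, 2, 1, 9, 0), datetime(2006, 2, 1, 10, 0), "subscriber-A"),
    ("cache-los-ad04.proxy.aol.com",
     datetime(2006, 2, 1, 21, 0), datetime(2006, 2, 1, 22, 0), "subscriber-B"),
]


def who_searched(term, search_log, isp_leases):
    """Return (subscriber, host, time) for each logged search for `term`
    that falls inside a lease the ISP has on record for that host."""
    matches = []
    for host, when, query in search_log:
        if query != term:
            continue
        for lease_host, start, end, subscriber in isp_leases:
            if lease_host == host and start <= when < end:
                matches.append((subscriber, host, when))
    return matches


for subscriber, host, when in who_searched("widagra", search_log, isp_leases):
    print(subscriber, host, when)
```

Pass an empty list in place of `isp_leases` and the function returns nothing, which is the point: the search log alone gives you addresses and times, not subscribers. Even with both, the "subscriber" may be a library or workplace rather than one person.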
What about cookies? They just make it easier to see that the same browser software may have done something regardless of IP address. For example, say you log in using AOL on your laptop computer, then use a wireless connection when traveling. You might leave two different IP addresses, like this:
You're the same person, on the same computer, but you leave behind two completely different IP addresses. Someone just looking at IP addresses in a search engine's log records would think you are two completely different people.
Using cookies, each address would also have your browser's unique cookie identifier associated with it, as shown after each address below:
- cache-los-ad04.proxy.aol.com e43UBsS4fNZzmDgj
- host66-133-102-174.range82-123.btcentralplus.com e43UBsS4fNZzmDgj
Now even though the IP addresses are different, the cookies are the same -- so you know the same browser software made these requests.
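As a rough sketch of that idea in Python, you can group log rows by cookie rather than by address. The two rows are the example above; the row layout and the `hosts_by_cookie` function are my own assumptions, not how any search engine actually stores its logs.

```python
from collections import defaultdict

# (host, cookie) pairs, as in the two example entries above.
log = [
    ("cache-los-ad04.proxy.aol.com", "e43UBsS4fNZzmDgj"),
    ("host66-133-102-174.range82-123.btcentralplus.com", "e43UBsS4fNZzmDgj"),
]


def hosts_by_cookie(rows):
    """Map each cookie value to the set of hosts it was seen from."""
    grouped = defaultdict(set)
    for host, cookie in rows:
        grouped[cookie].add(host)
    return dict(grouped)


browsers = hosts_by_cookie(log)
for cookie, hosts in browsers.items():
    print(cookie, "seen from", len(hosts), "different hosts")
```

Counting by host, the log shows two visitors. Counting by cookie, it shows one browser.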
Why's that useful? Back to BigBrother, say they scan the list of those searching for "widagra" and decide they'd like to profile individuals on that list further. They could ask to see all the searches done from a particular IP address. However, as I mentioned, since many IP addresses are reused, you aren't really seeing what one particular individual may have done.
Instead, they turn to cookies. They see that the cookied browser of "e43UBsS4fNZzmDgj" looked for "widagra," so they order up a list of all the terms that browser searched for. They get back:
- movement to overthrow BigBrother web site
- widagra freedom campaign
- how can we stop evil widagra users
- i love president bigbrother
- email valentine's day cards
...and so on
Some of those searches might help BigBrother decide this particular person is an evildoer. But then again, maybe not. Maybe they were researching the evils of widagra. Maybe the browser software was in a library, where different people used it.
Now that the basics of IP addresses and cookies are covered, we can come back to the survey that News.com did. John's reader -- and then News.com -- asked two key questions:
- Given a list of search terms, can you produce a list of people who searched for that term, identified by IP address and/or cookie value?
- Given an IP address or cookie value, can you produce a list of the terms searched by the user of that IP address or cookie value?
In other words:
- If someone gave you a "bad" search term, could you tell all the IP addresses or cookies associated with that search?
- If someone gave you a particular IP address or cookie, could you build a profile of search activity associated with it?
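Both questions are lookups against the same log, just from opposite directions. Here's a minimal Python sketch, assuming a log reduced to (identifier, search term) rows. The second cookie value and the `build_indexes` function are hypothetical, purely for illustration.

```python
from collections import defaultdict

# (IP address or cookie, search term) rows from a hypothetical log.
rows = [
    ("e43UBsS4fNZzmDgj", "widagra"),
    ("e43UBsS4fNZzmDgj", "email valentine's day cards"),
    ("hypothetical-cookie-2", "widagra"),
]


def build_indexes(rows):
    """Build both lookups in one pass over the log."""
    by_term = defaultdict(set)   # question 1: term -> identifiers
    by_id = defaultdict(list)    # question 2: identifier -> terms, in log order
    for identifier, term in rows:
        by_term[term].add(identifier)
        by_id[identifier].append(term)
    return by_term, by_id


by_term, by_id = build_indexes(rows)
print(sorted(by_term["widagra"]))   # every identifier that searched the term
print(by_id["e43UBsS4fNZzmDgj"])    # every term one identifier searched
```

Note that what comes back in both cases is identifiers and terms, not names.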
Note that the original questions say "people," which is NOT correct. No search engine can tell you the "people" who did a search from only the IP address or cookies they have. That information does not contain someone's name, address or other personally identifying information associated with it. As I explained earlier, you'd really only be able to do that if along with search records, you also got the ISPs to give up information.
The big exception is if REGISTERED USERS are involved. By registered users, I mean that you filled out a form and then logged into My Yahoo, Gmail or some other service where you personally make yourself known to a search engine. In these cases, they now have a much better idea that a person is involved and probably who that person is.
The answer to both questions is yes: all the major search engines surveyed log IP addresses and cookies along with search data. OK, AOL said in response to one of the questions that it didn't keep that info:
[News.com]: Given a list of search terms, can you produce a list of people who searched for that term, identified by IP address and/or cookie value?
[AOL]: No. Our systems are not configured to track individuals or groups of users who may have searched for a specific term or terms, and we would not comply with such a request.
Despite the response, I'm 99 percent certain AOL does indeed log IP addresses and cookies along with search data. Searching on AOL creates a page request with the search terms embedded in the page's URL. That request will be logged. If it's logged, it can be analyzed. In fact, AOL later says it can give you a list of searches that were done by a particular IP address or cookied browser. If you have that information, you can also produce the opposite: the list of IP addresses or cookies behind a particular search term.
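To show why a logged page request is enough, here's a small Python sketch that pulls the search terms back out of a request URL. The log line layout, the `/search` path and the `query` parameter name are all assumptions for illustration. I don't know what AOL's actual URLs or log format look like.

```python
from urllib.parse import urlsplit, parse_qs


def parse_log_line(line, param="query"):
    """Split a made-up log line of the form: host "METHOD url PROTOCOL"
    and return (host, search terms), or (host, None) if none are found."""
    host, _, rest = line.partition(" ")
    request = rest.strip().strip('"')
    parts = request.split()
    if len(parts) < 2:
        return host, None
    # parse_qs decodes the query string, turning "+" back into spaces.
    values = parse_qs(urlsplit(parts[1]).query).get(param)
    return host, (values[0] if values else None)


line = ('cache-los-ad04.proxy.aol.com '
        '"GET /search?query=widagra+freedom+campaign HTTP/1.1"')
host, terms = parse_log_line(line)
print(host, "searched for:", terms)
```

Any server that records request URLs next to the requesting address has what it needs to answer either question, whether or not its systems are "configured" to do so today.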
By the way, it's worth remembering that it's not just search engines that keep IP and cookie data associated with searches. News.com almost certainly logs IP addresses when you do a search there. John's blog almost certainly does the same when you search there. We log IPs when you search on our blog. Heck, I'd be surprised if the EFF itself didn't have standard log data recording what people are searching for there.
How long data is kept is another issue. Privacy groups feel that if data is destroyed, it can't be abused. I've written earlier that I don't really want data destroyed, since what we search for is useful historical information -- and knowing that searches were done from a particular browser or an IP address is helpful in filtering and mining data. However, as I also wrote, it could be that IP addresses and cookies get replaced in a way that they retain some unique value while rendering them completely untraceable back to an ISP.
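One way that kind of replacement might work, and this is my own sketch rather than anything a search engine has said it does, is to swap each IP address or cookie for a keyed hash. The same input always maps to the same stand-in, so the data can still be grouped and mined, but the stand-in isn't an address an ISP could look up. The `pseudonymize` function, the key handling and the 16-character truncation are all assumptions.

```python
import hashlib
import hmac
import secrets


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace an IP address or cookie with a keyed hash (HMAC-SHA256).
    Same identifier + same key -> same token; the token is not an address."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Assumption: a random secret key held by whoever does the replacement.
key = secrets.token_bytes(32)

a = pseudonymize("cache-los-ad04.proxy.aol.com", key)
b = pseudonymize("cache-los-ad04.proxy.aol.com", key)
c = pseudonymize("host66-133-102-174.range82-123.btcentralplus.com", key)
print(a == b, a == c)
```

How untraceable this really is depends on what happens to the key, since anyone holding it can recompute the token for a known address. The search terms themselves can also give a person away, which this does nothing about.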
None of that replacement is happening now to my knowledge, so data is building up. How long does each of the search engines keep it?
- AOL: Personal search histories expire after 30 days, and backups are not kept. How long log data (IP, cookied info) is maintained is not
- Google: No particular period for anything is given, which I read as nothing being destroyed.
- MSN: Data is deleted, but no specifics are provided.
- Yahoo: No particular period for anything is given, which I read as nothing being destroyed.
Overall, I don't know that much more from this survey. Google and Yahoo had already said they kept data. MSN's deleting some, but I suspect log data is backed up and kept somewhere with no destruction policy in place. Same too, for AOL.
News.com also asked whether any of the companies have handed over search data. Responses:
- AOL: No comment
- Google: No comment (Gmail requests have been received)
- MSN: It has never had any criminal or civil requests for search
- Yahoo: No comment
MSN has learned a lesson from its failure to disclose properly last month in the Department Of Justice case. It was the only search engine that didn't dive for the cover of no comment and gave a clear and reassuring answer. The answer is probably the same for the other search engines, so why not just say so?
FAQ: When Google is not your friend from News.com is that publication's look at what this survey means, which I came to after writing up my own thoughts on the survey. You'll see that it covers issues such as IP addresses changing, how cookies are used and how a US law might -- or might not -- apply to protect search privacy.
For more on search privacy issues from us, see these articles:
- Protecting Your Search Privacy: A Flowchart To Tracks You Leave Behind
- Private Searches Versus Personally Identifiable Searches
- Search Privacy At Google & Other Search Engines
- Google And The Big Brother Nomination
For more on the entire current fight between Google and the Department Of Justice, see these articles:
- Bush Administration Demands Search Data; Google Says No, Yahoo & MSN Said Yes
- Court Documents & Summary Of United States Versus Google Over Search Data
- The Day After: Points In The Search Trust Sweepstakes
- Privacy Groups, Government Officials Comment on Privacy and Web Search from Reuters
- FAQ: What does the Google subpoena mean? from News.com
- Judge Sets Hearing Date in Google Subpoena Case
- How The US Department Of Justice May Analyze Search Data & Freedom Of Information Act Request For Disclosure
Want to comment on things discussed in this article? We have several Search Engine Watch Forum threads where everyone is welcome:
- Administration Demands Search Records: For general comments about the Department Of Justice action.
- Search Privacy Bill Of Rights: This is the place to comment on what types of changes you'd like to see search engines put into place, but you can also propose laws, as well.
- On Protecting Your Search Privacy: Have I missed some great tool or technique above on protecting search privacy? This is the place to contribute.
- News.com Surveys Search Engines On Privacy Issues: A thread covering just the News.com survey.