I’ve got some follow-up items about yesterday’s story where AOL released user
query records, including how anyone can now easily look at the data.
First, after Barry did a recap of the news, I added a postscript to the story
with more of my thoughts. In case you missed it, here are the key parts below:
from John Battelle and
AOL apologizes for release of user search data from News.com have AOL
apologizing for the release, now said to be data involving about 658,000
individuals from March through May of this year. AOL says the release of the
data wasn’t properly vetted for privacy issues and that the release intentions
were innocent.
I believe that. Make no mistake, this was a big screw up. The researchers
providing the data didn’t think hard enough about how making it possible to
build a profile of individuals, even if they were given anonymous names, might
then make it possible to determine who those people are if they revealed enough
information in their searches.
In addition, it’s going to be very difficult for some law enforcement agency
not to want to subpoena AOL for actual user names when they read about things
that suggest a murder is being planned or may have happened, as covered above.
I’m not saying they’ll get it, but I think it’s almost inevitable that someone
will try. That will set off further privacy fireworks.
But yes, the original intention was innocent. I got an email about the
research site last week (and with my traveling all last week, simply did not
have a chance to check it out). Here’s what a researcher involved with it wrote:
Over the last few years I have witnessed a divide developing within
Information Retrieval research – between the haves and have-nots. The ‘haves’
are the companies like Google, Yahoo, MSN, and ourselves, with lots of
resources and data. The ‘have-nots’ are people without those resources such as
academic researchers and smart guys at small companies. We want to be able to
help anyone work on great ideas by giving them the data and infrastructure
So we started building data sets and made them available for everyone to
test their ideas with. Each data set features a dynamic view, which allows you
to inspect the data without having to download it. We also built some APIs for
news, video, audio and podcasts, which will save people time from having to do
that themselves. We have tried to stay away from interfaces like web search as
those are already around.
There’s nothing evil in that. In fact, there’s much to appreciate.
We all use search engines so much, and they are so important in our daily
lives, yet they remain one of the most poorly researched media venues out there.
Yes, we’re getting new labs like
the one from
Yahoo at UC Berkeley. But most search behavior studies outside of the search
engines have depended on ancient search logs from places like Excite from back in 2001 or
so. Newer studies, if the search engines are doing them, simply don’t come out
often. So the intention to promote learning with this release was innocent, if
not honorable. The execution was poor and inexcusable.
This is the second major milestone in raising awareness of search privacy
issues this year. The first was the
Department of Justice action, which rightly focused on whether we need more safeguards
over what governments can request. Today’s upset highlights the protections that
are needed against corporate releases of data.
The good news is that perhaps it will spur better protections even more.
& Others Call For Unified Federal Privacy Protection covers how the major
search engines recently asked for better legal protections from the government.
But perhaps the search industry itself will move forward to develop better
privacy standards. I’ve hoped recently for some type of
Privacy Bill Of Rights. Since I doubt the government will act quickly,
perhaps the industry will go faster before a third incident causes searchers to
completely lose faith in them.
AOL’s Jason Calacanis, who runs Netscape, is proposing that AOL
not keep search records at all. That might sound like a nice idea, but it’s
not practical. Not keeping records raises issues with click fraud, plus with
the internal tracking used to determine how to improve the way a search engine
handles and responds to queries. Putting better limits on how long data is kept might
help, as might developing ways to somehow remove personally identifiable
information that might get into search records.
Then again, Ixquick
recently tried a PR push on how it doesn’t keep records. Perhaps that’s
going to be a way for some players to win new users. Just make sure you also use
some tool like
Anonymizer to keep your ISP from logging your actions. Otherwise, your data
is still out there and being recorded in another way.
The postscript then goes on with a long list of links to stories about search
privacy issues, so check it out if you want to read more background about the issue.
Via TechCrunch, the AOL Search
Database is a new site that has taken all the data and allows anyone to
search through it. The site’s up and down due to demand, so be forewarned. It
also lacks documentation, but here’s a very quick guide to what I’ve played with:
User ID: To see the searches done by a particular person, you enter
the anonymous user number they’ve been given. The main problem is that I have no
idea where the numbering sequence starts. For example, enter 1 into the box, and
you get nothing. Enter 1083349, and that brings up the records for that user
(well, it should — when I tried, I got a database error because of a behind the
Search Keywords: Enter a term here, and you’ll see all the people who
searched for that word. For example, entering [murder] gave me a list of
everyone who looked for that word or phrases that include it (such as murder.com).
I haven’t tested to see if there’s a way to do an exact match yet. This is also
an easy way to obtain user numbers, if you want to then check out particular users.
Date Of Search: I haven’t tried it yet, but I assume this will give
you all searches done on a particular day.
Website Results: Again, I didn’t have a chance to play with this, but
I assume if you enter a URL (say playboy.com), you’d see all the people who did
a search, got that site listed and perhaps clicked through.
When you are done exploring, you can enter your findings into Valleywag’s
Find the scariest AOL user search record contest. So far, this isn’t scary
Scariest search records: AOL saves crew of Oceanic flight 815. Over at
AOL User 231392 Illuminated is a little more scary.
Prefer to roll through the data on your own, or perhaps build a better
interface for searching it? This
mirror site offers the data that AOL pulled yesterday.
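If you do want to build your own interface, here is a rough Python sketch of the four lookups described above (user ID, keyword, date and website). It is only an illustration, not how the AOL Search Database site works. I am assuming the files are tab-separated text with a header row and five columns (AnonID, Query, QueryTime, ItemRank, ClickURL), so check that against the files you actually download. The names read_records, search and SearchRecord are my own, and the sample rows are invented for the example rather than taken from the real data.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple, Optional


class SearchRecord(NamedTuple):
    """One row of the log. The column layout is an assumption."""
    user_id: str
    query: str
    query_time: str
    item_rank: str
    click_url: str


def read_records(lines: Iterable[str]) -> Iterator[SearchRecord]:
    """Parse tab-separated lines, skipping blanks and the assumed header row."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0] == "AnonID":
            continue
        # Pad short rows so searches without a click still parse.
        row = (row + [""] * 5)[:5]
        yield SearchRecord(*row)


def search(
    records: Iterable[SearchRecord],
    user_id: Optional[object] = None,
    keyword: Optional[str] = None,
    exact: bool = False,
    date: Optional[str] = None,
    site: Optional[str] = None,
) -> Iterator[SearchRecord]:
    """Yield records matching every filter that was supplied."""
    for rec in records:
        if user_id is not None and rec.user_id != str(user_id):
            continue
        if keyword is not None:
            query, wanted = rec.query.lower(), keyword.lower()
            # exact=True needs the whole query to match; otherwise substring.
            if (query != wanted) if exact else (wanted not in query):
                continue
        # date is a "YYYY-MM-DD" prefix of the assumed timestamp format.
        if date is not None and not rec.query_time.startswith(date):
            continue
        if site is not None and site.lower() not in rec.click_url.lower():
            continue
        yield rec


# Invented sample rows for illustration only; not taken from the AOL files.
SAMPLE = (
    "AnonID\tQuery\tQueryTime\tItemRank\tClickURL\n"
    "1001\tmurder mystery books\t2006-03-01 10:15:00\t1\thttp://www.example.com\n"
    "1001\tmurder.com\t2006-03-02 11:00:00\t\t\n"
    "1002\tcheap flights\t2006-03-01 09:00:00\t2\thttp://www.example.org\n"
)

if __name__ == "__main__":
    records = list(read_records(io.StringIO(SAMPLE)))
    for rec in search(records, keyword="murder"):
        print(rec.user_id, rec.query, rec.query_time)
```

To run it against a real file, pass an open file handle to read_records in place of the StringIO sample. Passing exact=True restricts a keyword search to queries that match the whole term, which covers the exact-match case I had not tested on the site.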