
October 10, 2009

Wake Up! - The FTC is screwing with bloggers

If you are a blogger in the US your life is about to change big-time.

You have just entered the Twilight Zone...

New FTC guidelines (read full version) described in the official press release state:

1 - "the post of a blogger who receives cash or in-kind payment to review a product is considered an endorsement. Thus, bloggers who make an endorsement must disclose the material connections they share with the seller of the product or service."

and

2 - "the revised Guides reflect Commission case law and clearly state that both advertisers and endorsers may be liable for false or unsubstantiated claims made in an endorsement - or for failure to disclose material connections between the advertiser and endorsers. The revised Guides also make it clear that celebrities have a duty to disclose their relationships with advertisers when making endorsements outside the context of traditional ads, such as on talk shows or in social media."

My reading of this is very disturbing.

Here is a possible scenario:

  1. You - a "social media" "celebrity" "blogger" (this is anyone who has more than a few followers on Twitter or some number of subscribers to their blog RSS feed) - review a book, product, or service, making it an "endorsement"
  2. You got a copy of the book to review, or got a free trial of the product, or a free trial of the service
  3. You did not mention the freebie in your blog post
  4. If someone does not like your blog posting you can be sued

To try to regulate bloggers as if they were professional journalists or compensated endorsers is asinine (incidentally, these guidelines do not apply to professional journalists!). The FTC is trying a land-grab into Internet regulation so they can extend their bureaucratic tentacles and justify their continued existence and funding. All of this is being done under their official tagline, "Protecting America's Consumers". This of course raises the question: "from whom?"

This is a screwy world we live in, but the whole premise of blogging on the Internet is predicated on the notion that anyone can have frank and open discussions about any topic of their choosing. Most bloggers do not get paid and do not make any money directly or indirectly from their blogging efforts. They try to build their reputation and disseminate information that their followers may find useful. They never claim to be "objective" and often hold very strong, peculiar, and very personal opinions.

It has always been "buyer beware" on the Internet. I don't think anyone needs to be reminded that we should carefully consider the source and reputation of any information that we encounter online. We certainly don't need a chilling effect on the whole online conversation from a huge government agency.

It is ironic that this is happening under the direction of a man who was elected with the strong support of the Internet community and specifically active social media leaders. Unfortunately typical liberal-leaning tendencies are also to regulate people's lives via the government in order to protect them against unscrupulous big-business practices.

Don't get me wrong - frankly I don't care if the assault on individual liberties comes from the left or right (the four FTC commissioners who voted unanimously for the new guideline were all appointed by Bush). But I do care when Big Brother injects itself into normal Internet discourse this heavy-handedly.

Fight this unconstitutional over-reach - these are simply regulations from unelected bureaucrats within the executive branch.

Let's make our voices heard and protect the First Amendment and our ability to have unfettered discourse without fear of lawsuits online.

BTW - no one paid me to "endorse" this position on the new FTC regulations - I guess that my butt is now legally covered (at least for this blog post).

Posted by Tim Ash at 7:12 PM | Permalink | Comments (12)

August 28, 2009

Facebook to Update Privacy Practices in Response to Privacy Commissioner of Canada

Facebook has been working with the Office of the Privacy Commissioner of Canada to come up with solutions to concerns that the office has. The updates will take up to 12 months to implement and involve three types of adjustments.

New Notifications

Facebook will work to encourage users to review their privacy settings. The goal is to help members make sure that they're aware of the default settings and to change the settings to reflect their own preferences.

Additions to Facebook's Privacy Policy

Facebook's Privacy Policy will be updated to provide descriptions of a number of privacy practices. Included will be reasons for date of birth data collection, account memorialization for deceased users, the difference between deactivation and deletion and how Facebook's advertising programs work.

These updates will be subject to a notice and comment period by Facebook members.

Technical changes for third-party data collection

A new permissions model will require third-party applications to inform users about which types of information they want to access. It will also require third parties to get consent before data is shared. Users will have to approve access to their friends' information. However, friend data would still be protected by their individual privacy settings.
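To make the permissions model concrete, here is a minimal sketch in Python of the two checks described above: the user must consent to each type of information, and friend data is still limited by the friend's own privacy settings. This is only an illustration under assumptions of mine. The function name `app_can_read` and the field names are hypothetical, and nothing here reflects Facebook's actual API.

```python
from typing import Optional, Set


def app_can_read(field: str,
                 user_consented: Set[str],
                 friend_shares: Optional[Set[str]] = None) -> bool:
    """Decide whether a third-party app may read one type of information.

    user_consented: field types the user approved for this app.
    friend_shares:  None when the data belongs to the user; otherwise the
                    field types the friend's own privacy settings expose.
    """
    # The app must have asked for, and been granted, this type of data.
    if field not in user_consented:
        return False
    # Friend data stays protected by the friend's individual settings.
    if friend_shares is not None and field not in friend_shares:
        return False
    return True


# Hypothetical example: the user approved "birthday", but the friend
# only exposes "name", so the friend's birthday stays off limits.
print(app_can_read("birthday", {"birthday"}, friend_shares={"name"}))
```

The point of the second check is that a user's approval alone never overrides a friend's settings.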

What do you think of Facebook's plans for privacy updates? Let us know what you think in the comments section below.

Posted by Nathania Johnson at 2:39 PM | Permalink | Comments (0)

July 2, 2009

Trade Groups Outline New Behavioral Advertising Standards

In January, four trade groups announced that they would be developing behavioral advertising standards. The groups are The American Association of Advertising Agencies (AAAA), The Association of National Advertisers (ANA), The Direct Marketing Association (DMA), and The Interactive Advertising Bureau (IAB).

The standards have now been released and are as follows:

The Education Principle calls for participation in efforts to inform individuals and businesses about online behavioral advertising. The industry intends to run a major educational campaign involving over 500 million ad impressions over the next 18 months.

The Transparency Principle calls for clearer and easily accessible disclosures about data collection and use practices. The result will be a new notice on the page where data is collected and will occur via links embedded in or around advertisements, or on the Web page itself.

The Consumer Control Principle expands the consumer's ability to opt-out of data collection. The opt-out will occur via a link on the page where data is collected. This principle also requires service providers such as Internet access providers and desktop application software companies to obtain consent of users before engaging in online behavioral advertising.

The Data Security Principle calls for reasonable security and limited retention of data.

The Material Changes Principle calls for obtaining consent before any material change to data collection and use policies and practices is applied to data collected prior to the change.

The Sensitive Data Principle requires parental consent for consumers known to be under 13 on child-directed Web sites. This Principle also calls for heightened protections to certain health and financial data when attributable to a specific individual.

The Accountability Principle calls for the development of programs to monitor and report uncorrected non-compliance to appropriate government agencies. The Council of Better Business Bureaus (CBBB) and DMA will work cooperatively to establish accountability mechanisms under the Principles.

Posted by Nathania Johnson at 1:05 PM | Permalink | Comments (0)

June 24, 2009

Your Data or Your Money: Is a Proposed Opt-In Privacy Bill Really Good for Consumers?

Congress is preparing an opt-in privacy bill for online advertisers, according to Peter Kafka. The effort is led by Rep. Rick Boucher of Virginia.

This means publishers couldn't serve up behavioral ads without first getting permission from the consumer. Right now, most consumers can opt out, though most probably don't give it a thought as they browse the web.

Though details on the bill are vague, Kafka rightly points out that most advertisers and/or publishers could work around the new regulations by offering incentives to those who opt in.

Another option would be offering an ad-free version of a site for a premium. It's no secret that many in the media world are hoping to push online publishing in that direction. From charging for online newspaper access to charging for Hulu, media execs are looking for non-advertising ways to fund their sites and networks.

They could sweeten the deal by making the opt-in process completely miserable. You'll probably have to hand over your email address and then get tons of junk email in order to access content for free. I already experience this for one local newspaper, but imagine if there were new regulations as an excuse!

If you think the web is a bit messy right now, just wait until you have to opt-in all the time. The intentions of the bill may be to protect the consumer, but it more likely will create ultimatums: agree to advertising, pay for content or miss out altogether.

That's just my opinion. What's yours? Sound off in the comments.

Posted by Nathania Johnson at 11:19 AM | Permalink | Comments (0)

June 18, 2009

Privacy Roundup: Google Street View, French Law, and Congressional Hearings

Privacy concerns are ever-present, but the past week has generated significant buzz on the matter.

First up, Google has been asked by the European Article 29 Working Party to keep "unblurred" photos for Street View for as little time as possible. The conundrum is this. Google sends its cars out to photograph countries. They use technology to blur things like license plates. However, sometimes the tech goes awry and blurs things that don't need blurring. When Google finds out about it, they use the original, unblurred photo to correct the image and then add it back into the system. Clearly, that raises a privacy issue. Google says it's working with the Article 29 Working Party to determine the amount of time they should keep the photos, but no solid timeframe has been given yet.

Speaking of Europe, France is putting the pressure on social networking sites like Facebook when it comes to privacy issues. The matter at hand is the trend towards being "open" so that third-party developers can build applications using APIs. French politicians are concerned about these third parties gaining access to private information. There are two things the politicians should consider. One is that most APIs are restrictive. In other words, you don't get access to all of a social network's functionality just because there's an API. Secondly, most of the networks provide an option for users to opt-out of their information being shared.

Last but not least, the United States Congress is having yet another round of hearings on web advertising and privacy. Yahoo! Vice President of Policy and Head of Privacy Anne Toth today testified at the House Energy & Commerce Committee subcommittee's "Behavioral Advertising: Industry Practices and Consumer Expectations." Toth explained the benefits of relevant advertising but also touted the Yahoo! Privacy Center.

Google Deputy General Counsel Nicole Wong also appeared at the hearing. Wong spoke about Google's recent launch of interest-based advertising. The benefits of ad relevancy were a talking point for Wong as well.

Posted by Nathania Johnson at 3:59 PM | Permalink | Comments (0)

July 3, 2008

Judge Protects Google Source Code, But Not YouTube Users

Remember when Google and Viacom were friends? Ah, those were the days. But not anymore. Over a year ago, Viacom filed suit against Google for the copyright infringement found on YouTube videos. In the latest plot point in the ongoing saga, U.S. District Judge Louis Stanton has ruled that Google can keep its source code secret, but must hand over user logs for the popular video sharing site.

Viacom says it wanted the code to prove that Google could use it to "purposely" find the content in question. Nice try, Viacom. Google's code, of course, is a trade secret. But it's almost a wonder the judge protected the code, because he ruled that Viacom can have access to the user logs. Data to be released includes user names, IP addresses, and videos watched.

Google has often defended its data collection, saying it's not a threat to privacy. It appears the argument worked a little too well on Judge Stanton.

For a history of the Google-Viacom battle, check out these links:

* Google Fights Back in Viacom/YouTube Copyright Suit
* Others Join YouTube, Google Copyright Lawsuit
* Viacom Would Rather Not Sue, Chief Counsel Claims
* Google to Viacom: Don't Turn YouTube into SueTube

Posted by Nathania Johnson at 10:52 AM | Permalink | Comments (0)

April 10, 2008

Majority of U.S. Adults Uncomfortable About Search Engine Data Collection

A majority of U.S. adults are uncomfortable about search engine data collection practices, according to a survey conducted by Harris Interactive. 59% are uneasy about the ads that are based on search behavior.

Search engines maintain that the targeted ads help them keep services free, and introducing that concept to survey participants did seem to alter the majority opinion. In light of that information, a 55% majority said it was ok after all to have those ads based on collected user data.

But that doesn't mean searchers don't retain some reservation. Only 9% are very comfortable with the ads knowing that they help produce free products, an increase from 7% without that knowledge.

Related Reading:

* Google Responds to FTC's Self-Regulatory Principles
* European Group Wants to Cut Search Engine Data Storage
* Election Year Brings New Efforts to Regulate Search Engine Data Collection

Posted by Nathania Johnson at 11:07 AM | Permalink

November 21, 2007

Privacy Group Takes on Facebook

The latest advertiser to use Facebook's new Social Ads platform may not have been the kind of marketer the company was hoping for. Privacy watchdog MoveOn.org has posted an online petition demanding that Facebook respect the privacy of users by making it easier to opt out of the "Beacon Ads" program, where actions Facebook users take on partner sites are added to their Facebook news feed. The feed is delivered to the user's friends, along with a related ad and the user's profile picture.

The full petition text reads: "Facebook must respect my privacy. They should not tell my friends what I buy on other sites – or let companies use my name to endorse their products– without my explicit permission."

MoveOn.org has also launched a group on Facebook, "Petition: Facebook, stop invading my privacy!" There are currently more than 4,900 members.

Facebook does not offer users an option to opt out of the program, other than eliminating their feed altogether. Users can opt out on a case by case basis with each advertiser.

When a user buys something from a participating advertiser, a pop-up box notifies the user that it will send that information to Facebook, with an option for the user to click "No thanks." If the user doesn't, a message will pop up at their next Facebook visit asking for permission to share that data with the user's friends.

MoveOn.org thinks that's not enough. In the Facebook group, they write: "Facebook says its users can 'opt out' of having their private purchases reported to all their friends. But that option is easily missed. And even if you do 'opt out' for purchases on one site, it doesn't apply to purchases on another site – you have to keep opting out over and over again. The obvious solution is to switch to an 'opt in' policy, like most other applications on Facebook."

More coverage on Techmeme.

Posted by Kevin Newcomb at 9:33 AM | Permalink

November 3, 2007

Your Visitors Appreciate Targeted Ads, Really

Despite all the privacy concerns of late, online shoppers appreciate the benefits of personalized and targeted promotions.

According to an Avenue A | Razorfish study, some 72% of online shoppers find personalized recommendations helpful. In the chart below, only 34% were concerned with privacy -- and that means 66% don't have these concerns when receiving recommendations.

We do live in a capitalist society, and see ads all day long. Also we are all shoppers, whether literally shopping or not at that moment. So it doesn't surprise me that so many people appreciate promotions related to their interests or prior purchases.

Publishers and advertisers should gain comfort from these findings. While it would be good to ask these same questions about ads versus recommendations, I think it's fair to say that targeted ads won't be a problem for most people. They might really see them as useful instead.

This week, AOL said they will allow individuals to "opt out" of their ad targeting services, which are based on surfing patterns alone. If you extrapolate from this study, then very few people will actively opt out. We'll wait and see.

Posted by at 5:07 PM | Permalink

October 28, 2007

Physical Addresses To Aid Online Targeting?

As online marketers, we have the luxury of reaching our targets based on what they search, browse or click on. That tells us a lot about expressed interests. By contrast, physical addresses provide information about the probability of interests.

Recently, Acxiom announced new services (WSJ article, paid access) which actively connect addresses to online ads. When their customers collect addresses online, Acxiom maps them to lifestyle codes and enables ad targeting using these codes.

Where you live speaks volumes, especially to off-line marketers without other insights. You and your neighbors share demographics, media interests and consumption patterns. For example, affluent Texan neighbors may buy parkas for their ski vacations while most citizens never think about them. (Check out your own zip code at Claritas.)

Admittedly these lifestyle code refinements can help *a little* online, but privacy risks may quickly erase the benefits. Any kind of secondary use of addresses is likely to raise concerns from end-users and privacy advocates. I believe this is a case of "we can connect the dots" but at what cost?

October 31st Update:

Regarding privacy, it's my contention that most consumers don't really pay much attention to how their cookies are used. Still we should expect people to step forward and identify risks, which happens whenever new marketing data's introduced. In a world where even search engines age out cookies, we are simply in a heightened state of alert.

Today, Acxiom reached us about how they protect the privacy of consumers and their Personally Identifiable Information (PII). These details are worth passing along:

* When a consumer registers on a partner site, Acxiom uses his/her address to assign a specific segmentation cluster code.
* This code contains no PII, and consumers are notified that a third-party cookie will be set.
* The cookie that is set is completely anonymous and contains the segmentation cluster code.
* There is no way for either Acxiom or advertisers to access consumers' PII through the cookies.

Also, Acxiom pointed out that they don't redistribute addresses in any way. My “secondary use of addresses” was misleading, as I meant the segmentation cluster codes – not additional use of household information. I hope this clarifies for SEW readers.
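For readers who think in code, here is a rough Python sketch of the flow Acxiom describes: an address is mapped to a segmentation cluster code, and only that code goes into the cookie. Everything specific in it is an assumption of mine for illustration. The ZIP-to-segment table, the segment labels, and the function names are made up, and Acxiom's real mapping is not public.

```python
# Hypothetical lookup table: ZIP code -> segmentation cluster code.
# The real segmentation data is proprietary; these values are invented.
SEGMENT_BY_ZIP = {
    "75205": "SEG-07",
    "10001": "SEG-21",
}
DEFAULT_SEGMENT = "SEG-00"  # fallback when the ZIP is unknown


def segment_for_address(address: dict) -> str:
    """Map a postal address to a coarse segmentation cluster code."""
    return SEGMENT_BY_ZIP.get(address.get("zip", ""), DEFAULT_SEGMENT)


def build_cookie(address: dict) -> dict:
    """Build the cookie payload: the segment code only, no address fields."""
    return {"segment": segment_for_address(address)}


# The name and street are used (at most) for the lookup and then dropped.
cookie = build_cookie({"name": "Pat", "street": "1 Main St", "zip": "75205"})
print(cookie)
```

The design choice that matters is that the cookie carries a code shared by many households, so the cookie alone cannot be traced back to one person's address.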

Of course, I look forward to seeing how Acxiom and others will aid online targeting, as the posting title suggests!

Posted by at 2:09 PM | Permalink

July 9, 2007

Yahoo, Microsoft To Change Privacy Policies

Pressure from the European Union has Yahoo and Microsoft changing their privacy policies, according to the Financial Times.

"The Article 29 Working Party, a group of national officials that advises the European Union on privacy policy, last month said it wanted to investigate how long companies such as Yahoo and Microsoft keep data on individuals who use their search engines," FT.com reported.

“We are talking to customers, to the industry and government officials about this, and intend to provide an update in the near future which will more directly give the time frame,” said Brendon Lynch, privacy expert at Microsoft.

Posted by Frank Watson at 11:23 AM | Permalink

November 30, 2006

Google Ordered By Another North Carolina Court To Remove Pages

Apparently, North Carolina is going to start a trend of people who get court orders to remove material Google has spidered when left out in public view. This week, Google was ordered to remove material by a court in that state. It follows a similar court order in a different case earlier this year.

North Carolina County Gets Restraining Order Against Google from the Associated Press covers how social security numbers, cell phone numbers and other personal information were left online by Johnston County, which means Google (and likely other search engines) spidered the material.

When the county realized this, they sought to have it removed. However, they were told it might take up to five days to remove, prompting the county to go the legal route:

Fearing the possibility of identity theft, Johnston County officials asked Google on Monday to remove the information. It was first posted on the county's Web site by accident six weeks ago and discovered Friday. Mountain View, Calif.-based Google responded that removal could take up to about five days, said county attorney Mark Payne.

"It surprised me that Google didn't immediately recognize that this was something that posed a real danger of real damage to our citizens," Payne said.

Hey, it surprised me that Johnston County didn't immediately recognize that the information shouldn't have been put on the public web in the first place. However, that appears to have happened because of a third party contractor.

What about the automatic URL removal system? I seem to recall that as getting pages out in 48 hours or less (but I might be remembering incorrectly). Checking today, officially it is longer (unofficially, I hear it goes faster):

You may process your URL for removal from Google's search results. URLs will be removed after we've verified your request. Bear in mind that verification can take several days or longer and all pages submitted via the automatic URL removal system will be removed from the Google index temporarily for six months.

Google Blamed For Indexing Student Test Scores & Social Security Numbers and Follow-Up: School Couldn't Reach Google Until Injunction Filed cover how a school authority in North Carolina went to the courts to remove pages from Google in June.

Posted by Danny Sullivan at 1:06 PM | Permalink

October 30, 2006

Google Appeals Federal Judge's Orkut Ruling

The International Herald Tribune reports that Google has appealed a federal judge's order to hand over IP address information to Brazilian authorities. Google claims the "federal civil court did not have the proper authority" for such information. But Google spokesperson Debbie Frost said Google will help Brazilian authorities identify individuals accused of illegal activities on Google's social networking platform, Orkut. The history goes way back; just start from here and keep clicking those links back to the previous stories. It amazes me that this has been going on since early this year.

Posted by Barry Schwartz at 9:13 AM | Permalink

October 13, 2006

Stanford University Keeps Yahoo's $1M After Strong Criticism

The Mercury News reports that Stanford University will be keeping the $1 million grant, even though Stanford itself criticized Yahoo for helping China and even though the plan to accept the $1M, which Danny pointed out, drew criticism of its own. The director of the fellowship program said they are "considering holding a forum to engage Yahoo and other media companies about operating in repressive countries."

Posted by Barry Schwartz at 10:36 AM | Permalink

October 2, 2006

Reading Other People's Gmail Via Bloglines

Using Bloglines to snoop on people's private Gmail from Martin Belam looks at how he accidentally stumbled upon email feeds that individuals are posting to Bloglines. To be fair, it's an issue that could happen to any "private" feed that someone unknowingly shares to the public.

Gmail allows people to get a feed of their email, as covered in these help pages. That lets you see the subject of your emails along with short descriptions. But even this small amount of information might be too embarrassing for some people to have made public.

How would those summaries get made public at all? In the case Martin looks at, people are adding their Gmail feeds to Bloglines but leaving those feeds public for others to view. That's how he stumbled upon them.

Google does warn about this, but he thinks the warning could be more visible. Perhaps -- but it's also worth keeping in mind that using an online news reader means you need to carefully consider ANY feed you take and whether those settings are public or not.

Postscript From Bloglines:

Bloglines is committed to online privacy and we take our role in this effort seriously. I'd like to help correct some of the misconceptions and explain how Bloglines privacy works in regards to both search and feeds as well as how to use Bloglines properly to generate secure feeds.

The main issue at hand is the appearance of Gmail accounts in Bloglines and a user's ability to subscribe to these feeds (or search for posts from these feeds).

The examples displayed were actually Gmail accounts registered through a third party (Feedburner) and then subscribed to within Bloglines.

Bloglines actually provides HTTP authentication for secure feeds. When this method is used, Bloglines secures the feed so that it can not be searched on or subscribed to except by the owner of the feed.

However, when the user generates their feed through a third party like Feedburner, the authentication portion has been removed from Bloglines' control and we have no way to identify and secure the feed. As a result the feed and its previously secure data become public. Clearly this is a problem and we are in contact with Feedburner and other third parties to help them better inform and protect their users.

The other issue is the definition and understanding of "private" feeds within Bloglines. Marking a feed as private in Bloglines only hides the feed from your public blogroll and your identity from the feed's list of subscribers. We try to make this clear to Bloglines users by prominently displaying the following note during the feed subscription process:

"Private subscriptions don't show up in blogrolls and you will not be listed as a public subscriber. However, the feed and all its posts will remain available to the public via Bloglines and Ask.com Blog & Feed Search. Exceptions are Bloglines email subscriptions and feeds that require http authentication. In both cases, the feed and its posts will not be included in search results."

This issue has reminded us that there is still some confusion about privacy in the world of feeds. We recognize that a better system of limiting access to feeds is needed as more content becomes syndicated or syndicatable. We have been leading the effort to build new safeguards into syndication standards and are hopeful that some type of Feed Access Standard will provide further security for users and their feeds.
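Since the difference between "private" and "secure" is easy to miss, here is a small Python sketch of the rules as the Bloglines note states them. It is my own reading of that note, not Bloglines code, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    marked_private: bool = False         # the "private" checkbox
    requires_http_auth: bool = False     # feed protected by HTTP authentication
    is_email_subscription: bool = False  # Bloglines email subscription


def shown_in_blogroll(sub: Subscription) -> bool:
    """Private subscriptions are hidden from the public blogroll."""
    return not sub.marked_private


def included_in_search(sub: Subscription) -> bool:
    """Only HTTP-auth feeds and email subscriptions are kept out of search."""
    return not (sub.requires_http_auth or sub.is_email_subscription)


# A Gmail feed routed through a third party loses its authentication,
# so marking it private hides the subscription but not the posts.
gmail_via_third_party = Subscription(marked_private=True)
print(shown_in_blogroll(gmail_via_third_party))   # hidden from the blogroll
print(included_in_search(gmail_via_third_party))  # still searchable
```

Note that the two checks are independent: the "private" flag never affects search, which is exactly the confusion this episode exposed.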

Posted by Danny Sullivan at 8:36 AM | Permalink

September 28, 2006

Google Not To Deliver Orkut Data To Brazil Authorities

Google won't hand data to Brazil judge from the Associated Press reports that Google will not be meeting the deadline to provide Brazilian authorities with the data they requested on specific Orkut users. This comes after Google said it would hand over the data to Brazil. So the question is, will Google be fined $23,000 per day by the Brazilian judge until they comply? Google has promised to issue a court explanation as to why they cannot provide the data Brazil requested. The AP article also quotes Debbie Frost of Google saying, "We have and will continue to provide Brazilian authorities with information on users who abuse the Orkut service, if their requests are reasonable and follow an appropriate legal process."

Posted by Barry Schwartz at 8:55 AM | Permalink

September 26, 2006

Yahoo Fellowships For Repressed Journalists, While Chinese Journalist Might Sue Them

Earlier this month, I dinged Google over hypocrisy for getting behind Banned Books Week given its support of censorship in China. Now, a similar ding for Yahoo. Yahoo funds $1M Stanford journalism fellowship from the San Jose Business Journal covers how Yahoo -- under fire for allegedly harming journalists in China -- is going to fund fellowships for journalists in countries with press restrictions.

From the article:

The new Yahoo International Fellowship will be aimed at journalists from countries where there are restrictions on freedom of the press, either by governmental agencies or other forces, said James Bettinger, director of the Knight Fellowships.

The first Yahoo International Fellow will be Imtiaz Ali, a reporter for the BBC Pashto Service in Pakistan.

Meanwhile, Jailed Chinese journalist to file US suit versus Yahoo from IDG News Service covers how a Chinese journalist jailed after Yahoo is said to have handed over incriminating email to the Chinese authorities plans to file suit against the company in the US.

Posted by Danny Sullivan at 8:47 AM | Permalink

Class Action Lawsuit Filed Against AOL Over Search Data Release

TechCrunch reports in Suit filed against AOL; seeks to block search history storage that a class action lawsuit has been filed against AOL seeking $1,000 in damages for each person whose search records were released last month.

The release involved 658,000 individuals, so that's potentially a $658 million bill, if the case succeeds. It's even more if some of those people are California-based, since the case seeks $4,000 per California individual, according to TechCrunch.
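Here is that arithmetic as a quick Python sketch. Two things in it are assumptions of mine, not facts from the suit: that the $4,000 figure replaces (rather than adds to) the $1,000 for California residents, and the 10% California share used in the example, which is purely hypothetical.

```python
PER_PERSON = 1_000       # damages sought per person, per TechCrunch
PER_CALIFORNIAN = 4_000  # damages sought per California individual
TOTAL_PEOPLE = 658_000   # individuals whose search records were released


def potential_damages(california_share: float) -> int:
    """Total bill if the given fraction of the people live in California.

    Assumes the California amount replaces the base amount.
    """
    californians = round(TOTAL_PEOPLE * california_share)
    others = TOTAL_PEOPLE - californians
    return others * PER_PERSON + californians * PER_CALIFORNIAN


print(f"${potential_damages(0.0):,}")  # nobody in California
print(f"${potential_damages(0.1):,}")  # hypothetical 10% in California
```

With nobody in California the total is the $658 million above; with a hypothetical 10% share it climbs to about $855 million.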

Of course, not all of these people actually can be identified. To date, exactly one person was positively identified. The New York Times guessed at her identity, and she herself confirmed it. No doubt, others can also be determined, but not every one of the people involved will be. So when the suit says:

The search queries themselves contain information that identify AOL members who made each search.

That's only correct for a subset of the total users. Similarly, the lawsuit states:

The Member Search Data holds sensitive financial information about the AOL members, including but not limited to names, street addresses, phone numbers, credit card number, social security numbers, financial account numbers, passwords and usernames.

True, in some cases. Not all of them. In fact, probably not true for the majority of them.

It will be interesting to see if the court case ultimately finds that everyone should receive payment, given the potential harm they suffered, or if it will only pay to those who prove in some way they actually were personally identifiable or had personal information released in some way because of AOL's actions. Perhaps there will be a compromise between the two, if the case succeeds.

You'll find the text of the lawsuit here. Three AOL users are named, though no evidence I see in the suit suggests that any of them were actually personally identified in some way by the release. That might come in future filings or existing filings also submitted to the court, of course. A release about the suit from the law firm filing it is here. The Associated Press also has coverage in 3 AOL Subscribers Sue Over Data Release.

Aside from cash payouts, the case also wants AOL to enforce a license prohibiting commercial and non-research use of the data, plus wants the material removed from internet search engines (which means, really, getting it off the internet itself). It also seeks to prevent AOL from storing any type of web search data and to destroy any already in its possession.

Posted by Danny Sullivan at 7:57 AM | Permalink

September 8, 2006

More On Google & Blocking Privacy Proxies

Yesterday I wrote about how several proxy servers used by those wishing to search and surf anonymously had apparently been blocked by Google, including the popular Tor service. Google's since explained why these were blocked and how human users can get around the barrier.

Google told me that someone or something was using the Tor system to hit them with an extremely large number of queries, which caused the block on the network to come online.

Couldn't Google have done this in a way that lets the humans through but blocks the spiders? Cory Doctorow, who wrote the Boing Boing post on the subject, especially felt Google was being too heavy handed. In an email exchange we had, he wrote me:

Google has a lot of engineering talent, but it approached this problem with a fireax, not a scalpel.

Actually, Google is using both a fireax and a scalpel. It's just that some Tor users might not see the scalpel, if they have cookies disabled, from what I can tell.

A human user, with a browser that accepts cookies, would get a slightly different block page. This one would allow them to prove they weren't a spider via a CAPTCHA code.

In other words, look at this image from The Chunk, which sparked yesterday's Boing Boing post. Now look at the image of a very similar page that you'll see here.

Notice how the second example has a part that says:

 To continue searching, please type the characters you see below

After this is a code, a CAPTCHA, a system to filter out robots that can't read the text in the image.

Anyone set to accept cookies will see the CAPTCHA challenge, be able to fill it out and continue searching. But isn't accepting cookies defeating the purpose of using a system like Tor designed to keep you anonymous?

Not necessarily. For example, in Firefox, you could choose to have cookies cleared every time you close the browser. That means for your searching session, Google will only know that someone from an anonymous IP (it can't be traced back to you, remember) did a series of searches for a particular session of time.

Close your browser, come back to Google, and you'd get a new cookie (along with an entirely new IP address). There would be no way to associate your searches over a long period of time, which potentially could lead to how one person was identified in the recent AOL data release case -- assuming somehow, someway, someone got to all of Google's data over time.

It's unlikely -- though still possible -- that you could do enough searching within one session to give yourself away just based on your queries. For those still concerned about this, I suppose you could do a search, then clear your cookie and search again. Alternatively, don't search for anything that you think could potentially reveal who you are.

For more on protecting your search privacy, see my past posts Which Search Engines Log IP Addresses & Cookies -- And Why Care? and Protecting Your Search Privacy: A Flowchart To Tracks You Leave Behind.

Could Google do things better? Absolutely. Since many people using services like Tor might not be allowing cookies, Google should change the page that comes up for "robots" to say something like "if you're a human, please allow cookies, and then you'll get a code to let you in." Google could even take the further step of detailing how to set up cookies and clear them in popular browsers to better guide those concerned about privacy. And to be fair, all the search engines could do more on that front.

That page can definitely be more helpful in other ways. When I've heard of this happening in the past, it was typically because someone from a particular ISP or shared IP address was doing a lot of rank checking. That might cause the entire IP range to get closed.

Unfortunately, Google's current warning page doesn't give the unfortunate innocent users much guidance that things outside their control might be to blame. Instead, it sends them thinking that maybe they've got a virus or spyware. I can see that has caused at least one person to waste time checking how to "fix" a problem they didn't have.

It would also be nice to see more help pages on Google about this in general. All these things are ideas Google said it will consider.

Postscript: Cory emailed me this: Danny, I believe that they could solve this problem without requiring cookies -- for example, they could embed a RESTful, expiring GUID in the URL-line on the successful solution of a CAPTCHA:

http://www.google.com/search?q=boing&CAPTCHA=KJASJFSE
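One way to read Cory's suggestion is as a signed token with an expiry time that gets carried in the URL once a CAPTCHA has been solved. The following is only a rough sketch of that idea, not anything Google or Cory has described in detail: the secret key, the ten-minute lifetime and the function names are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import time

# Hypothetical values: a real service would keep the key private and pick its own lifetime.
SECRET_KEY = b"replace-with-a-server-side-secret"
TOKEN_LIFETIME_SECONDS = 600


def issue_token(now=None):
    """Return a URL-safe token holding an expiry time and an HMAC signature over it."""
    issued_at = int(now if now is not None else time.time())
    payload = str(issued_at + TOKEN_LIFETIME_SECONDS).encode("ascii")
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(payload + b"." + signature).decode("ascii")


def is_token_valid(token, now=None):
    """Accept a token only if its signature matches and it has not yet expired."""
    try:
        raw = base64.urlsafe_b64decode(token.encode("ascii"))
        # The payload is digits only, so the first "." is always the separator.
        payload, signature = raw.split(b".", 1)
        expires = int(payload)
    except (ValueError, TypeError):
        return False
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return False
    current = now if now is not None else time.time()
    return current < expires
```

The token carries no cookie and nothing about who solved the CAPTCHA, which is the appeal for Tor users. The trade-off in this sketch is that anyone who copies the URL can reuse it until it expires.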

Posted by Danny Sullivan at 8:04 AM | Permalink

September 7, 2006

Google Protecting Itself Or Harming The "Innocent"

Google blocking privacy technology over at Boing Boing has Cory Doctorow writing up how Google is blocking some proxy servers from making requests. An attempt by Google to stop those trying to protect themselves from prying eyes? Many have told him that it's likely Google stopping automated requests from proxies. That doesn't let Google off the hook. Cory writes:

But that doesn't change the essential point: Google is fighting bots by compromising its users' privacy -- the countermeasure is a form of punishing the innocent to get at the guilty.

OK, Cory, let's spin it around. Let's say that someone starts hitting Boing Boing with thousands of automated requests. Maybe they want to scrape your content to harvest a little AdSense cash. Maybe they don't like you. They do this through anonymous proxies. Ultimately, you decide to block requests from those proxies. Can we now declare that move was Boing Boing punishing the innocent to get at the guilty?

I like that we have proxies to help those who are worried about privacy to stay anonymous. There are plenty of people with that type of need. But let's also not paint Google or other services that can seriously get impacted by automated queries into somehow being evil if they've had to stop from getting slammed. A more constructive move will be to hope that Google can figure out a way to help support some proxies but also not simply get abused itself by them.

Postscript: Please see More On Google & Blocking Privacy Proxies.

Posted by Danny Sullivan at 12:16 PM | Permalink

September 4, 2006

Google Says They Will Give Brazil Orkut Data

The Washington Post reports that Google will hand over the Orkut data of specific users, including IP addresses with time and date stamps that can help trace a specific user, and registration information including names and e-mail addresses. This comes after Brazil gave Google 15 days to comply or else be fined $23,000 per day.

Why turn over data to Brazil when Google famously resisted a data request from the US government earlier this year? Reports the Post:

"What they're asking for is not billions of pages," said Nicole Wong, Google associate general counsel. "In most cases, it's relatively discrete -- small and narrow."

Posted by Barry Schwartz at 10:16 AM | Permalink

September 1, 2006

Google Has 15 Days To Provide Data To Brazil Or Be Fined $23,000 Per Day

AFP reports that Brazil has given Google Brazil 15 days to turn over the data on the Orkut users they have been asking for. If Google Brazil does not comply, it will be fined $23,000 per day. Google has said that it would work with Brazil to shut down some Orkut communities, but according to the court filing in Sao Paulo yesterday, those requirements have been 'unsatisfactorily met.'

We have a good historical rundown of this whole Google & Orkut & Brazil issue here. Business Week also has a nice write-up on the issues, Google's Brazil Headache, highlighting why Google is saying it would comply if only the requests were sent to Google in the US, rather than Google Brazil.

Posted by Barry Schwartz at 9:20 AM | Permalink

August 23, 2006

Researchers Debate Whether To Study The AOL Data

When the AOL privacy case broke earlier this month, I wrote about how the intention of releasing the data was honorable despite the ineptness of how it was done. Those trying to research search behavior have been starved for decent data. Researchers Yearn to Use AOL Logs, But They Hesitate from the New York Times covers this in more detail, about how the existing data sets out there are nearly 10 years old.

Along the way, we discover researchers are debating if they should use the data. I'd say you might as well. It's not like you'll be getting more any time soon. As long as the researchers aren't themselves republishing in a way to violate someone's privacy, it's hard to see the harm. At this point, the data has been spread so far and wide, accessible in many ways, that it's difficult to see what the researchers think they'd be protecting by studying it.

The story also touches on data releases from other search engines (Yahoo and Microsoft say they've done some controlled, limited releases; Google says they hand nothing out). It also highlights how the researcher who put the data out -- again with the best of intentions -- simply didn't realize that people would be able to be tracked down through their search profiles.

Most interesting is the end of the story, looking at if there's a way to scrub the search stream so that data could be released and be untraceable. I've said I'd love to see that type of solution happen. But it would have to be foolproof, and I'm not sure how that can happen unless you have human review of profiles that might go out.

Meanwhile, the San Jose Mercury News in What do Google, Yahoo, AOL and Microsoft's MSN know about you? effectively does over the same survey of how long data is kept that News.com did last February, in the wake of the US Department Of Justice search privacy debate. I mentioned the story before, but let me highlight a key part of it:

While AOL is unique among the Big Four in that its users are easily identified by an AOL user name after they have logged in, people who frequent Google, Yahoo and MSN are also monitored by a combination of digital tracking systems.

Nope, AOL is not that unique. If you've logged into Google, Yahoo or MSN to use any of their services, chances are when you search, they'll also have you keyed to a particular profile that's more unique than just looking at your IP address or a cookie. The story does explain this more, and my previous post Which Search Engines Log IP Addresses & Cookies -- And Why Care? goes into the explanation in more depth. In looking at that previous post, I also saw this:

[News.com]: Given a list of search terms, can you produce a list of people who searched for that term, identified by IP address and/or cookie value?

[AOL]: No. Our systems are not configured to track individuals or groups of users who may have searched for a specific term or terms, and we would not comply with such a request.

Despite the response, I'm 99 percent certain AOL does indeed log IP addresses and cookies along with search data. Searching on AOL creates a page request with the search terms embedded in the page's URL. That request will be logged. If it's logged, it can be analyzed. In fact, AOL later says they can give you a list of searches that were done by a particular IP address or cookied browser. If you have that information, you have the opposite.
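To make the "you have the opposite" point concrete, here's a small sketch of how the same request log can answer the question in both directions. The log format, the visitor IDs, the example URLs and the `q` parameter name are all made up for illustration. This is not AOL's actual logging setup, which I haven't seen.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlparse

# Hypothetical log lines in the form "<visitor id> <requested URL>".
SAMPLE_LOG = [
    "cookie-A http://search.example.com/search?q=garden+supplies",
    "cookie-B http://search.example.com/search?q=garden+hose",
    "cookie-A http://search.example.com/search?q=dog+training",
]


def build_indexes(log_lines, query_param="q"):
    """Build both lookups from one pass over the log.

    Returns (searches_by_visitor, visitors_by_term).
    """
    searches_by_visitor = defaultdict(list)
    visitors_by_term = defaultdict(set)
    for line in log_lines:
        visitor, url = line.split(" ", 1)
        # The search terms travel in the URL's query string, so they land in the log.
        params = parse_qs(urlparse(url).query)
        for query in params.get(query_param, []):
            searches_by_visitor[visitor].append(query)
            for term in query.lower().split():
                visitors_by_term[term].add(visitor)
    return searches_by_visitor, visitors_by_term


searches_by_visitor, visitors_by_term = build_indexes(SAMPLE_LOG)
```

Here `searches_by_visitor["cookie-A"]` lists what one cookied browser searched for, while `visitors_by_term["garden"]` lists every visitor who searched for that word. Both come from the same data, so being able to produce one means being able to produce the other.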

Of course, we now know that it was indeed the case that you could take AOL's data, give it a search term and get a list of individuals who searched for it. Yes, the individuals were given anonymous numbers, so the AOL answer is technically correct. But the overall profile of what someone was searching for in some cases turned out to be personally revealing.

I'm planning a longer recap on some of the latest out of the AOL case, but in the meantime, I still keep coming back to this conclusion from an earlier post:

I think consumers will need more faith and control over how long search data is kept for them, plus the ability to opt-out or delete histories with a push of a button, perhaps the type of privacy/data control panel John Battelle has wished for. And as I've written, that has to include ISPs, many of which merrily sell search data that they monitor to third party companies.

I'm working on a longer look back at the fallout from the AOL release and ways forward. But a quick shout-out to Daniel Brandt of Google Watch is in order. Seth Finkelstein just gave him one, and I'll add to it. I've felt Brandt's often twisted things or focused on stuff that didn't matter much (Google's 30 year cookie that most people won't really have last for more than a year or two, if that). But his long-standing call for regular data destruction -- something other privacy advocates have also pushed for -- seems the most secure solution going forward.

Posted by Danny Sullivan at 2:19 PM | Permalink

August 22, 2006

Brazil To Close Google Brazil's Offices Over Orkut Issues?

A post in our SEW Forums and a report from Xinhua say that Brazil's federal prosecution service is moving to close Google's operations in Brazil. So far, there is no other news about this that we've seen. An injunction is apparently being requested ordering the release of information from Orkut, with a threat of closure of Google's Sao Paulo office if they don't comply.

Postscript From Danny: Reuters has a story up now here: Google refuses to hand over data to Brazilian authorities. It covers that prosecutors want permission to file a civil lawsuit against Google, with a $61 million fine and the threat of closure if it fails to comply with the information request.

Postscript From Barry: For an historical line up of these events over time, see the links below:

- Aug. 16, 2006 :: Orkut Causing Trouble In Brazil Again
- Jul. 21, 2005 :: Drug Pushers Using Orkut Arrested In Brazil
- May. 25, 2006 :: Google Works With Brazil To Shut Down Orkut Communities
- May. 18, 2006 :: Google Faces Criminal Charges For Child Porn & Racial Material
- May. 3, 2006 :: Google & Brazil Fight Over Orkut User Data Rights
- Mar. 10, 2006 :: Brazil Asks Google To Help Orkut To Stop Organizing Organized Crime
- Mar. 9, 2006 :: Al-Qaeda Likes Orkut

Posted by Barry Schwartz at 5:21 PM | Permalink

August 21, 2006

AOL Fires CTO & Two Employees After Search Records Slip Up

The Wall Street Journal just reported that AOL has fired its Chief Technology Officer, Maureen Govern, and two other employees after releasing search records last week. The article, "AOL Fires Technology Chief After Web-Search Data Scandal," discloses that Govern, along with the researcher who released the data and the manager overseeing the research, have all been fired. I am kind of surprised that AOL hit someone so close to the top, but it does make a statement, a statement AOL must make.

Postscript From Danny: News.com has a nice follow-up here, Three workers depart AOL after privacy uproar, and Lisa Barone over at Bruce Clay highlights a Mercury News article where AOL's statement of keeping data "roughly 30 days" obviously didn't hold true. AOL also said last year that they purge personally identifiable information after 30 days. To be fair, the search records did have personal ids removed. It's simply that the searches themselves made at least one person identifiable.

Posted by Barry Schwartz at 2:28 PM | Permalink

August 15, 2006

103 Links About SES San Jose 2006 (AKA The Big Recap)

Couldn't make it to last week's monster Search Engine Strategies show in San Jose? Well, maybe next time! In the meantime, I've compiled a list of coverage from across the web, even somewhat organized into topic areas.

Our San Jose show is always tough for me, as I arrive a week earlier to visit with the various major search engines out there. That means two weeks of news and email to dig out from, since you can never get it all done on the road. All that digging out means I know I don't have everything listed below. But you'll find plenty to keep you entertained.

General Recaps

Eric Schmidt Appearance

Eric Schmidt & Search Privacy

Click Fraud Panel & Related Coverage

Yahoo's Panama Ad Platform Preview

Social Search & Related Topics

Organic Listings Sessions

Search Advertising Sessions

Issues Sessions

News, Blogs & Public Relations

Big Sites/Budget Sessions

Small Sites/Budget Sessions

Conversion & Metrics

Other Sessions

Google Dance & Parties & Pictures

Posted by Danny Sullivan at 4:50 PM | Permalink

EFF Asks FTC To Limit How Long AOL Can Store Search Records

The Electronic Frontier Foundation has asked the US Federal Trade Commission to investigate AOL's release of search records last week and prevent the company from storing search data for longer than two weeks.

The formal complaint (PDF) asks for the FTC to:

order AOL to refrain from collecting or storing logs of its users' search activity except where necessary incident to the rendition of AOL's services or the protection of AOL rights and property, and to refrain in any case from storing logs of its users' search activity in personally identifiable form or for more than fourteen (14) days;

The EFF also wants all those whose searches were revealed through the data to be notified by AOL, which sounds like a good idea and something you'd think AOL would already want to do. Other things are requested, such as one year's worth of credit monitoring to protect against identity theft. That seems far-fetched, but I suppose you never know.

Coinciding with the complaint, the Wall Street Journal has a debate between the EFF and an internet lobbying group NetCoalition that apparently represents Yahoo and Google, among others.

The debate, Should Web Search Data Be Stored?, is free for anyone to view. It's well worth a read, if only to learn that the US Department Of Justice is apparently arguing that access to search records might not require a search warrant, as the EFF says the Electronic Communications Privacy Act requires.

Overall, I'm much more on the side of the EFF in the debate. Here are some highlights from it, along with my remarks about them.

NetCoalition: Search queries are stored and used by Internet companies for internal purposes.

Me: Search queries have been shared by various companies in different ways with third parties over the years. More important, even if these are stored for internal purposes, there's no guarantee that they'll be perfectly protected. Leaks, accidental or intentional, do happen.

NetCoalition: There are good, legitimate reasons why an Internet company would use historical search queries for internal uses. For example, search query information can be used in research and development to make improvements to search technology, to better tailor and make more efficient users' online requests. Companies also analyze historical query information to detect and protect against click fraud -- an activity that involves faking clicks on Web advertisements to drive up costs.

Me: Excellent points, but the major search engines are going to have to step up now with better proof that there's no way data can be tied back with an individual, even when made "anonymous" in the way AOL has shown doesn't work. Click fraud refunds typically aren't given for activity longer than 60 days, so that provides a time horizon for how long data might be associated with actual users/IP activity.

NetCoalition: Search queries are essentially "directory assistance" requests from users to companies that help them find locations on the Internet. The Electronic Communications Privacy Act is meant to protect communications between and among users -- not to protect requests from customers for directions on the Internet.

Me: Wow, I think the search engines need a new lobbying group that understands search better. Searches can be directory assistance and much more than that. Search engines are confidants, trusted friends that we effectively tell secrets to in order to get advice. They aren't about getting location. They are about getting information.

NetCoalition: The Video Privacy Protection Act is a bad analogy. Internet companies do not match up the user's personal information (e.g., name, address and phone number) with search queries the way a video rental record would.

Me: Except they do. If you're logged in to a search engine, then any personal information you've provided is associated with your search query in some way.

EFF: The public needs to know the facts about how their data is being stored and used before they can make informed decisions as consumers as to whether and how to use a particular search engine, and to make informed decisions as citizens as to whether and how Congress needs to update the law. I think the best route would be hearings in Congress to get to the bottom of the issue.

Me: I think the best route would be for the search engines themselves to act in conjunction with privacy groups right now to get protections and standards in place. But if they can't act, then hopefully laws covering the entire search spectrum -- from ISP to search engine -- will be enacted.

NetCoalition: Search queries are not being linked to users' personal information and shared for marketing purposes.

Me: Except they are. Showing ads in response to a query, while long-standing and generally accepted, is a marketing purpose. Showing ads based on search profiles, such as the New York Times wrote about today, is a more extreme example.

EFF: My organization also strongly opposes proposals by the DOJ and Congresswoman DeGette that would force companies to store this kind of sensitive data for government use. That's like asking the post office to keep copies of our mail, or phone companies to keep recordings of our phone calls, just in case investigators might find it useful. The bottom line is that Americans deserve the same privacy protections online that they've always had offline, and that includes the ability to be able to speak and consume speech freely and privately, without fear that their deepest secrets might be shared with the government or published to the world. Yet when search engines accumulate this kind of data, such disclosures are bound to happen, as this week's news has demonstrated.

Me: Well said!

Postscript: I'd sent some questions over to the EFF and just got answers back from EFF staff attorney Kevin Bankston. Here they are:

Q. Why just AOL? Why aren't you asking for all search engines to be limited? I did see that you want federal laws to expand to cover them, but what happened with AOL could happen with the others as well.

A. Why aren't we asking the FTC to investigate and take action against other search engines? Because we can't, just like we can't go to court and demand that Google pay for AOL's mistake. The FTC isn't a suggestion box. We had a specific complaint about AOL--we think this disclosure violated their policy and therefore constitutes an unfair and deceptive trade practice--and we filed that complaint with the FTC. If other companies engaged in similar disclosures, we'd file similar complaints.

If you are familiar with our work, you know that we've been complaining about the logging practices of search engines as a category for a long while. In fact, I'm usually the one trying to explain to Google-hungry journalists that your Yahoos and AOLs and MSNs and other multi-service portals pose most if not all of the same privacy threats, so it's funny to be accused of singling out one of them for some sort of special mistreatment. We're merely reacting to a specific incident that happened to involve AOL rather than Google or Yahoo or MSN.

We want strong, clear legal rules that cover all the search engines; we want all the search engines to limit retention.

Q. Why just the search engines? Many ISPs are recording the same data but aren't being limited on data retention. It's actually more worrisome to me in that many ISPs are happily selling this data to third parties.

A. Again, if you are familiar with our work, you know that we are generally concerned about data retention by all stripes of online service providers (see, e.g., our white paper on best practices for online service providers, http://www.eff.org/osp/). So, in short, we share your worry. But again, we are reacting to a specific incident concerning a search engine, so our discussion right now is focused on search engines.

BTW, if you are specifically aware of any ISP that routinely collects the searches its users submit to other search engines, we'd love to hear more about it. I think that without very clear consent from the customer, that would be an unauthorized interception of your communications, and therefore a felony.

Q. How long does the EFF retain search data? You've got a search box. People do sensitive searches on your sites. I want to ensure AOL isn't being held to a higher standard than the EFF itself meets.

A. We don't retain search terms. Of course, since we use Google, Google does undoubtedly retain them. But we proxy everyone's requests so that their IP addresses and cookies are not transmitted to Google, therefore individual search terms are only identifiable to EFF visitors as a population and not personally or uniquely. In fact, we call this out on our site: if you click on the link next to our search box that says "about EFF's search," you'll see a pop-up that says "EFF uses Google for search functionality on www.eff.org. To protect your privacy, EFF proxies search requests to Google with a special CGI script on our server, thus hiding your IP address and your Google cookie (if any) from Google's servers."
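For the curious, here's roughly what a proxy like the one the EFF describes has to do before passing a search along. This is a guess at the general technique, not the EFF's actual CGI script: the upstream URL, the header list and the function name are placeholders chosen for the example.

```python
from urllib.parse import urlencode

# Hypothetical upstream endpoint; the real proxy would point at its search provider.
UPSTREAM_SEARCH_URL = "https://search.example.com/search"

# Headers assumed to identify the visitor, compared in lowercase.
IDENTIFYING_HEADERS = {"cookie", "x-forwarded-for", "forwarded", "referer", "user-agent"}


def build_upstream_request(query, incoming_headers):
    """Return (url, headers) for the proxied search with identifying headers dropped."""
    url = UPSTREAM_SEARCH_URL + "?" + urlencode({"q": query})
    headers = {
        name: value
        for name, value in incoming_headers.items()
        if name.lower() not in IDENTIFYING_HEADERS
    }
    return url, headers
```

Because the proxy's server makes the request itself, the search engine sees the server's IP address rather than the visitor's. Dropping the cookie header is what keeps the visitor's own search engine cookie from going along for the ride.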

Posted by Danny Sullivan at 11:02 AM | Permalink

Targeting Ads Based On Search Behavior & Privacy Issues Post-AOL

Back in 2005, I wrote about AlmondNet moving forward with showing ads to surfers across the web based on their search profiles at major search engines. The move raised big search privacy issues. Since then, AlmondNet's kept going, along with others such as Yahoo, in mining search behavior to deliver ads beyond search results pages. Advertisers Trace Paths Users Leave on Internet from the New York Times today takes a look at how Yahoo, MSN and AOL are all trying to push into the post-search ad delivery space.

I've always felt these programs would eventually raise greater concerns over search privacy, since it would make it even more readily apparent to people that they were having search profiles assembled for them. If you go back to the AOL search privacy poster child of Thelma Arnold, tracked down through her search requests, her comment was one I'm sure many searchers would have:

I had no idea somebody was looking over my shoulder.

Until the AOL search records release, many people had no idea they were being profiled. But I've felt post-search ads would help raise that concern. Why were you continuing to see ads based on things you recently searched for? Perhaps that would help raise awareness of search profiles.

The AOL release has changed all that. To me, post-search ads -- while promising -- are a non-starter until the search privacy issues are resolved. We've been told that data would be protected, yet it got out in one way via AOL. Though the intent was innocent, it might slip out in the future in other ways. Even Google CEO Eric Schmidt, when I asked him about search privacy and data destruction last week, said you could "never say never" about things not going wrong.

For these types of programs to move forward, I think consumers will need more faith and control over how long search data is kept for them, plus the ability to opt-out or delete histories with a push of a button, perhaps the type of privacy/data control panel John Battelle has wished for. And as I've written, that has to include ISPs, many of which merrily sell search data that they monitor to third party companies.

I'm working on a longer look back at the fallout from the AOL release and ways forward. But a quick shout-out to Daniel Brandt of Google Watch is in order. Seth Finkelstein just gave him one, and I'll add to it. I've felt Brandt's often twisted things or focused on stuff that didn't matter much (Google's 30 year cookie that most people won't really have last for more than a year or two, if that). But his long-standing call for regular data destruction -- something other privacy advocates have also pushed for -- seems the most secure solution going forward.

Posted by Danny Sullivan at 7:28 AM | Permalink

August 10, 2006

Daily SearchCast, August 9, 2006: Special Edition, A Conversation With Google CEO Eric Schmidt

Today's search podcast covers Search Engine Watch editor-in-chief Danny Sullivan talking with Google CEO Eric Schmidt live before an audience at Search Engine Strategies San Jose 2006 on topics ranging from search privacy to Google's expansion into all aspects of daily life. Tune in by listening to this MP3 file, via our Odeo channel or through iTunes via this link (or use alternative iTunes instructions explained here) or through our Yahoo Podcasts channel. Prefer not to listen? Ah, darn. But that's OK, here's a rundown of what was covered:

General Write-Ups

Posted by Danny Sullivan at 2:32 PM | Permalink

August 9, 2006

Search Privacy Concerns Humanized As The New York Times Tracks Down Anonymous AOL Searcher

A Face Is Exposed for AOL Searcher No. 4417749 is an excellent read from the New York Times, where you can meet the person who is about to become the most famous searcher ever: Thelma Arnold, a 62-year-old from Georgia. Using the released AOL search records, the New York Times figured out who she was and interviewed her about her searching habits for the story. No more discussing whether anonymous search records might contain enough information to identify people. In some cases, they do (or at least enough to make an extremely good guess and get confirmation from the person themselves). Thelma Arnold now becomes the face of search privacy issues. Meanwhile, though not naming people, News.com has a good look at more searching behavior from the records: AOL's disturbing glimpse into users' lives.

Posted by Danny Sullivan at 10:53 AM | Permalink

August 8, 2006

More On AOL's Search Release & Ways To Search The Records

I've got some follow-up items about yesterday's story where AOL released user query records, including how anyone can now easily look at the data.

First, after Barry did a recap of the news, I added a postscript to the story with more of my thoughts. In case you missed it, here are the key parts below:

AOL: Dooooooh! from John Battelle and AOL apologizes for release of user search data from News.com have AOL apologizing for the release, now said to be data involving about 658,000 individuals from March through May of this year. AOL says the release of the data wasn't properly vetted for privacy issues and that the release intentions were innocent.

I believe that. Make no mistake, this was a big screw up. The researchers providing the data didn't think hard enough about how making it possible to build a profile of individuals, even if they were given anonymous names, might then make it possible to determine who those people are if they revealed enough information in their searches.

In addition, it's going to be very difficult for some law enforcement agency not to want to subpoena AOL for actual user names when they read about things that suggest a murder is being planned or may have happened, as covered above. I'm not saying they'll get it, but I think it's almost inevitable that someone will try. That will set off further privacy fireworks.

But yes, the original intention was innocent. I got an email about the research site last week (and with my traveling all last week, simply did not have a chance to check it out). Here's what a researcher involved with it emailed me:

Over the last few years I have witnessed a divide developing within Information Retrieval research - between the haves and have-nots. The 'haves' are the companies like Google, Yahoo, MSN, and ourselves, with lots of resources and data. The 'have-nots' are people without those resources such as academic researchers and smart guys at small companies. We want to be able to help anyone work on great ideas by giving them the data and infrastructure they need.

So we started building data sets and made them available for everyone to test their ideas with. Each data set features a dynamic view, which allows you to inspect the data without having to download it. We also built some APIs for news, video, audio and podcasts, which will save people time from having to do that themselves. We have tried to stay away from interfaces like web search as those are already around.

There's nothing evil in that. In fact, there's much to appreciate, intention-wise.

We all use search engines so much, and they are so important in our daily lives, yet they remain one of the most poorly researched media venues out there. Yes, we're getting new labs like the one from Yahoo at UC Berkeley. But most search behavior studies outside of the search engines have depended on ancient search logs from places like Excite from back in 2001 or so. Newer studies, if the search engines are doing them, simply don't come out often. So the intention to promote learning with this release was innocent, if not honorable. The execution was poor and inexcusable.

This is the second major milestone in raising awareness of search privacy issues this year. The first was the Department of Justice action, which rightly focused on whether we need more safeguards over what governments can request. Today's upset highlights the protections that are needed against corporate releases of data.

The good news is that perhaps it will spur better protections even more. Microsoft, Google & Others Call For Unified Federal Privacy Protection covers how the major search engines recently asked for better legal protections from the government. But perhaps the search industry itself will move forward to develop better privacy standards. I've hoped recently for some type of Search Privacy Bill Of Rights. Since I doubt the government will act quickly, perhaps the industry will go faster before a third incident causes searchers to completely lose faith in them.

AOL's Jason Calacanis, who runs Netscape, is proposing that AOL not keep search records at all. That might sound like a nice idea, but it's not practical. Not keeping records raises issues with click fraud, plus with the internal tracking a search engine uses to improve how it responds to and feeds queries. Putting better limits on how long data is kept might help, as might developing ways to somehow remove personally identifiable information that might get into search records.

Then again, Ixquick recently tried a PR push on how it doesn't keep records. Perhaps that's going to be a way for some players to win new users. Just make sure you also use some tool like Anonymizer to keep your ISP from logging your actions. Otherwise, your data is still out there and being recorded in another way.

The postscript then goes on with a long list of links to stories about search privacy issues, so check it out, if you want to read more background about the issue.

Next, via TechCrunch, the AOL Search Database is a new site that has taken all the data and allows anyone to search through it. The site's up and down due to demand, so be forewarned. It also lacks documentation, but here's a very quick guide to what I've played with so far.

User ID: To see the searches done by a particular person, you enter the anonymous user number they've been given. The main problem is that I have no idea where the numbering sequence starts. For example, enter 1 into the box, and you get nothing. Enter 1083349, and that brings up the records for that user (well, it should -- when I tried, I got a database error because of a behind the scenes glitch).

Search Keywords: Enter a term here, and you'll see all the people who searched for that word. For example, entering [murder] gave me a list of everyone who looked for that word or phrases that include it (such as murder.com). I haven't tested to see if there's a way to do an exact match yet. This is also an easy way to obtain user numbers, if you want to then check out particular user records.

Date Of Search: I haven't tried it yet, but I assume this will give you all searches done on a particular day.

Website Results: Again, I didn't have a chance to play with this, but I assume if you enter a URL (say playboy.com), you'd see all the people who did a search, got that site listed and perhaps clicked through.

When you are done exploring, you can enter your findings into Valleywag's Find the scariest AOL user search record contest. So far, this isn't scary but funny: Scariest search records: AOL saves crew of Oceanic flight 815. Over at Consumerist, AOL User 231392 Illuminated is a little more scary.

Prefer to roll through the data on your own, or perhaps build a better interface for searching it? This mirror site offers the data that AOL pulled yesterday.
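For anyone rolling their own interface, the four lookups described above (user ID, keyword, date and website) can be sketched in a few lines of Python. This is only a minimal sketch: it assumes the files are tab-separated with a header row naming the columns AnonID, Query, QueryTime, ItemRank and ClickURL, which is not documented in this post. The function names and the sample rows are invented for illustration and are not taken from the real data.

```python
import csv
import io

# Invented sample rows in the ASSUMED layout (tab-separated, header row).
# None of these records come from the actual AOL release.
SAMPLE = (
    "AnonID\tQuery\tQueryTime\tItemRank\tClickURL\n"
    "101\tmurder mystery novels\t2006-03-01 10:15:00\t1\thttp://www.example.com\n"
    "101\tcheap flights\t2006-03-02 08:00:00\t\t\n"
    "202\tgarden tools\t2006-03-02 09:30:00\t2\thttp://www.example.org\n"
)


def read_log(handle):
    """Load every row of a tab-separated log into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def search_log(rows, user_id=None, keyword=None, date=None, site=None):
    """Return the rows that satisfy every filter that was supplied.

    user_id: exact match on the anonymous user number
    keyword: case-insensitive substring match on the query text
    date:    prefix match on the timestamp, e.g. "2006-03-02"
    site:    case-insensitive substring match on the clicked URL
    """
    matches = []
    for row in rows:
        if user_id is not None and row["AnonID"] != str(user_id):
            continue
        if keyword is not None and keyword.lower() not in row["Query"].lower():
            continue
        if date is not None and not row["QueryTime"].startswith(date):
            continue
        # Rows without a click have an empty (or missing) ClickURL field.
        if site is not None and site.lower() not in (row["ClickURL"] or "").lower():
            continue
        matches.append(row)
    return matches


rows = read_log(io.StringIO(SAMPLE))
for hit in search_log(rows, keyword="murder"):
    print(hit["AnonID"], hit["Query"])
```

Like the site's Search Keywords box, the keyword filter here is a substring match, so it also catches longer phrases that contain the word. An exact-match option would compare the whole query string instead.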

 

Posted by Danny Sullivan at 2:48 PM | Permalink

August 7, 2006

AOL Releases Search Data & Raises Privacy Concerns

Techmeme is reporting a huge amount of concern over AOL releasing, then pulling, search logs done by 500,000 users over three months. The purpose of the release was to help search researchers better understand user behavior in conjunction with an industry event for search researchers happening in Seattle, SIGIR. The data was posted on the AOL research site, but has since been pulled.

Unlike what TechCrunch suggests, this isn't private data in that no personally identifiable information has been released. Instead, actual usernames have been replaced with anonymous ones. However, this still means it's possible to track the behavior of a particular user and potentially know who they are if their searches contained personally identifiable information.

To understand this more, this page gives some examples gleaned from the new AOL data. Also see this example of someone who might be planning to murder his wife. Danny's earlier post, Private Searches Versus Personally Identifiable Searches, also covers the general difference between private data versus personally identifiable stuff.

How does what AOL did compare to what the Department of Justice asked for from search engines earlier this year? It actually goes further. The DOJ simply wanted searches, not any further information that would allow a group of searches to be linked with an individual, even if that individual was kept anonymous.

Danny may have more to say about this next week. He's at the SES San Jose conference this week and very busy with that, but he sent me some notes from a brief review of the AOL move to give perspective here as he sees it.

Postscript From Danny: Just a few quick thoughts and updates in the short time I have between sessions.

AOL: Dooooooh! from John Battelle and AOL apologizes for release of user search data from News.com have AOL apologizing for the release, now said to be data involving about 658,000 individuals from March through May of this year. AOL says the release of the data wasn't properly vetted for privacy issues and that the release intentions were innocent.

I believe that. Make no mistake, this was a big screw-up. The researchers providing the data didn't think hard enough about how letting anyone build a profile of an individual, even one given an anonymous name, might then make it possible to determine who that person is if they revealed enough information in their searches.

In addition, it's going to be very difficult for some law enforcement agency not to want to subpoena AOL for actual user names when they read about things that suggest a murder is being planned or may have happened, as covered above. I'm not saying they'll get it, but I think it's almost inevitable that someone will try. That will set off further privacy fireworks.

But yes, the original intention was innocent. I got an email about the research site last week (and with my traveling all last week, simply did not have a chance to check it out). Here's what a researcher involved with it emailed me:

Over the last few years I have witnessed a divide developing within Information Retrieval research - between the haves and have-nots. The ‘haves' are the companies like Google, Yahoo, MSN, and ourselves, with lots of resources and data. The ‘have-nots' are people without those resources such as academic researchers and smart guys at small companies. We want to be able to help anyone work on great ideas by giving them the data and infrastructure they need.

So we started building data sets and made them available for everyone to test their ideas with. Each data set features a dynamic view, which allows you to inspect the data without having to download it. We also built some APIs for news, video, audio and podcasts, which will save people time from having to do that themselves. We have tried to stay away from interfaces like web search as those are already around.

There's nothing evil in that. In fact, there's much to appreciate, intention-wise.

We all use search engines so much, and they are so important in our daily lives, yet they remain one of the most poorly researched media venues out there. Yes, we're getting new labs like the one from Yahoo at UC Berkeley. But most search behavior studies outside of the search engines have depended on ancient search logs from places like Excite from back in 2001 or so. Newer studies, if the search engines are doing them, simply don't come out often. So the intention to promote learning with this release was innocent, if not honorable. The execution was poor and inexcusable.

This is the second major milestone in raising awareness of search privacy issues this year. The first was the Department of Justice action, which rightly focused on whether we need more safeguards over what governments can request. Today's upset highlights the protections that are needed against corporate releases of data.

The good news is that perhaps it will spur better protections even more. Microsoft, Google & Others Call For Unified Federal Privacy Protection covers how the major search engines recently asked for better legal protections from the government. But perhaps the search industry itself will move forward to develop better privacy standards. I've hoped recently for some type of Search Privacy Bill Of Rights. Since I doubt the government will act quickly, perhaps the industry will go faster before a third incident causes searchers to completely lose faith in them.

AOL's Jason Calacanis, who runs Netscape, is proposing that AOL not keep search records at all. That might sound like a nice idea, but it's not practical. Not keeping records raises issues with click fraud, plus with the internal tracking a search engine uses to improve how it responds to and feeds queries. Putting better limits on how long data is kept might help, as might developing ways to somehow remove personally identifiable information that might get into search records.

Then again, Ixquick recently tried a PR push on how it doesn't keep records. Perhaps that's going to be a way for some players to win new users. Just make sure you also use some tool like Anonymizer to keep your ISP from logging your actions. Otherwise, your data is still out there and being recorded in another way.

For more on search privacy issues, here's a big giant list of recent posts:

Posted by Barry Schwartz at 10:52 AM | Permalink

July 28, 2006

Google Hands Over Email In Hate Case

Feds Retrieve Google Records after Gmail Used for Hate Speech from eWeek covers how the US FBI asked for and was given an email and some session information from someone accused of sending a threatening letter to the NAACP.

Posted by Danny Sullivan at 6:14 AM | Permalink

July 10, 2006

Judge Orders Google To Disclose Advertiser's Information

Out-Law reports that Justice Rimer ordered Google to hand over information on an advertiser to Helen Grant in a copyright infringement dispute. Helen Grant "complained that a Google advert led to a service which she claimed violated her copyright in a forthcoming book." A search brought up a site named Realityunlocked.com, "which offered a free download of an earlier draft of the book, and that the site violated the Trust's copyright." Google asked Grant to take the issue to court so that Google would not have to worry about the privacy issues involved in handing over the information.

Posted by Barry Schwartz at 8:33 AM | Permalink

June 26, 2006

Follow-Up: School Couldn't Reach Google Until Injunction Filed

Catawba County Schools in North Carolina obtained an injunction to remove private material from Google because it had no luck getting action from the search engine after trying other routes, the district tells me. The school district also stressed that it didn't claim that Google had somehow hacked into its servers. Here's what Catawba County Schools' chief technology officer Judith Ray emailed me about the situation:

We asserted that Google had somehow bypassed our login information, not that they had hacked their way into the system. Hacking, to me assumes malicious intent and we never intended to imply that Google was doing anything other than spidering all the web sites available.

There is also miscommunication about "all users" being required to log in. The DocuShare server is a repository for both public and private information with logins being required for users who are authorized to view the restricted information. There are hundreds of pages of information that we share from DocuShare with users around the state. These are completely open and are not supposed to [be] password protected.

We did troubleshoot this situation by searching for the students' information at Yahoo, Dogpile, and AltaVista. We did not find any information on these three search engine returns and we attempted the searches over a three-day period.

We acted so aggressively with Google because, until the media got involved, we could not get beyond an operator at Google. We could not get operators to connect us with technical support, the legal department, or to anyone higher up in the organization. We were only given an email address to which we could submit a complain - which we did but got no response. Google has a link to submit an emergency request [see here] but on both Thursday and Friday of last week, the link took you to a dead page. Only when the news media submitted its own inquiry to Google did we get a call regarding the situation. And [Google] has been most helpful in working through this situation with us.

Of course, none of us who are employed with Catawba County Schools at the current time were involved when Xerox set up this server. We are trying to ascertain if the server was incorrectly setup/protected or if the appropriate include meta tags or strings were not included.
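The "meta tags or strings" in the district's email presumably refer to the standard ways a site asks crawlers to stay out: a robots meta tag on a page, or Disallow lines in a robots.txt file. As a rough sketch of how the robots.txt side works, the snippet below uses Python's standard urllib.robotparser to test URLs against a rule set. The robots.txt content, the example.org addresses and the function name are all hypothetical. Nothing here reflects how the district's server was actually configured.

```python
from urllib.robotparser import RobotFileParser

# HYPOTHETICAL robots.txt that asks every crawler to skip one directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /docushare/
"""


def is_crawl_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules let this crawler fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs on a placeholder domain.
for url in (
    "http://example.org/docushare/dsweb/View/Collection-1",
    "http://example.org/index.html",
):
    print(url, is_crawl_allowed(ROBOTS_TXT, "Googlebot", url))
```

These rules are only a request that well-behaved crawlers honor. They do not stop anyone from opening the pages directly, so they are no substitute for putting sensitive documents behind a login.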

Google Blamed For Indexing Student Test Scores & Social Security Numbers from us earlier has more background on the injunction plus how I was finding pages from what the district said was a password protected area to still be available through Yahoo. As clarified above, some of these pages indeed didn't require a login to view.

Our story originally was headlined "Google Blamed For Hacking & Indexing Students Test Scores & Social Security Numbers" and said in one part, "the school [district] blames Google for some how breaking into a password protected area and indexing the content."

As stated above, the school district itself never appears to have said anything about being hacked, only that Google somehow got into information it believed was password protected, as it says on the home page of the district site:

We do not know how Google was able to access the secure, password-protected site. Once Google does access a site, it places a copy of the data on its own server. We immediately called and emailed Google, requesting the urgent removal of the link and site data. We have eliminated the link from our end and it appears that as of Friday night, June 23, 2006, Google eliminated the site from their end.

The hacking reference seems to come from the "Google 'hacked our website'" story at The Inquirer, which we linked to in our original story. While the headline says "hacked" in quotes, the story itself doesn't have anyone from the school district saying this.

Digg also has a School claimed google hacked it's private servers and then posted that data article. Again, the school district isn't alleging hacking, only that Google somehow got into information it believed was restricted. How that happened is still being investigated.

As for the reference to Xerox in the school district's explanation, in doing some investigating in our original piece, I noted that the server seemed to be managed by Xerox and shared by other companies as well, with material for those companies appearing to be hosted on the school district's domain. As noted, the school district doesn't know why this was happening, and it remains something they are looking at.

Finally, Google's had problems with the automated page removal tool before, though the problem then wasn't that it was down but that it allowed people to remove pages from sites they didn't own. More on that in our 2004 story, Google Confirms Automated Page Removal Bug.

Posted by Danny Sullivan at 1:35 PM | Permalink

Google Blamed For Indexing Student Test Scores & Social Security Numbers

Google "hacked our website" from The Inquirer points to Blame game from the Hickory Record, a story about how the Catawba County Schools in North Carolina has gained a temporary injunction for "Google to remove any information pertaining to Catawba County Schools Board of Education from its server and index and alleges conversion and trespass against the corporation." The school blames Google for somehow getting into a password protected area and indexing the content.

Let me make this clear: Google cannot submit forms or type in usernames and passwords. Someone at the school must have left an opening for Google. The security hole possibly came from someone publishing the content publicly, by letting down the security, or by posting a hyperlinked URL with an embedded password in the URL.

I agree, Google should remove this sensitive information, which they did on Friday after the judge issued the temporary injunction. But Google should not be blamed for this.

Postscript From Danny: As Barry notes, this isn't a case of Google deserving blame. It cannot guess at a protected server's usernames or passwords, nor is it configured to try and hack its way in. If this information got into Google, that's almost certainly because it was left unprotected somehow despite the school's "very secure site."

Since the school says all personal information has now been removed and is protected, I'll explain more about what I guess happened.

The story mentions that somehow, information from the site's supposedly protected DocuShare server got onto the web. OK, where is that server? The story doesn't say, but this search over at Yahoo gives the likely location:

docushare catawba

Fifth down is this:

DocuShare Authorization Error Not Authorized. You are currently listed as Guest, which means you are not logged in. ... Password: Domain: DocuShare Catawba County. Copyright © 1996-2003 Xerox Corporation ... docucentre.catawba.k12.nc.us/docushare/dsweb/View/Collection-1546 - 6k - Cached - More from this site - Save

That shows you that Yahoo tried to access a protected page on the DocuShare server at docucentre.catawba.k12.nc.us. Is this the secure server that Google somehow managed to penetrate? Probably, given that this search shows nothing at Google now:

site:docucentre.catawba.k12.nc.us

That search comes up with no matches. That's probably because Google responded to the complaint last Friday to remove all pages from this domain. But since no one contacted Yahoo, there's a good chance pages from the domain still show over there. And in fact, that search at Yahoo currently shows 13,500 matches.

Are any of these pages the ones with sensitive information? I did some searches that I felt should bring up whatever the page was that Google was finding and had no luck. This means:

  • Yahoo didn't have it, because it didn't crawl as deep
  • Yahoo didn't have it, because Google really did somehow manage to get past a password barrier
  • Yahoo didn't have it, because I'm not guessing at the right words in the document

Yahoo clearly has some information from the site, about which the school district itself says:

This site was a DocuShare password-protected site that required all users to log-in

No, not all users had to log-in. If that was the case, you wouldn't see any cached documents at all, such as this one. Clearly, some content was accessible without being logged in -- which makes it possible that some content wasn't properly placed behind password protection.

Postscript 2: See our follow-up, Follow-Up: School Couldn't Reach Google Until Injunction Filed

Posted by Barry Schwartz at 8:51 AM | Permalink

June 22, 2006

Google Updates Toolbar Privacy Policy

It appears to me that Google updated the Google Toolbar Privacy Policy yesterday. I know the dates do not reflect that on the page, but if you take a look at the current version and compare it to the cached version from Jun 16, 2006 you will notice a lot of changes. Below are some of the larger changes to the privacy policy.

+ Removed a bullet that read:

We do not associate any of the information that Toolbar sends with other personal information about you. However, it is possible that a URL or other page information sent to Google may itself contain personal information. For information about how some web sites embed personal information in web requests, click here.

+ Added/Changed Significantly the following bullets:

(1) Toolbar Features that give you access to other Google services such as Blogger and Gmail are subject to the separate Privacy Policies of those products. Features that require use of a Google Account, like Bookmarks, store information with your Account as explained in the main Google Privacy Policy. Other features, like SMS This, that let you transmit data from the Toolbar may log that data transmission, as explained in the FAQ.

(2) Third party site custom buttons send information such as search queries to sites that are not operated by Google or covered by Google's Privacy Policy.

(3) If you have Google Toolbar Version 4.0 or above, your copy of Google Toolbar includes a unique application number. When you install Google Toolbar, this number and a message indicating whether the installation succeeded are sent back to Google. Also, when Google Toolbar automatically checks to see if a new version is available, the current version number and the unique application number are sent to Google. The unique application number is required for Google Toolbar to work and cannot be disabled.

(4) Except for information sent through Toolbar for use with a separate Account-based service such as Gmail, we do not associate any of the information that Toolbar sends with other personal information about you. However, it is possible that a URL or other page information sent to Google may itself contain personal information. For information about how this may happen, click here.

Those are the changes I noticed.

Posted by Barry Schwartz at 9:03 AM | Permalink

June 20, 2006

Microsoft, Google & Others Call For Unified Federal Privacy Protection

Microsoft bravely took part in the search privacy panel we did at our SES New York show earlier this year (coverage here and here), saying it would welcome better US federal protections on privacy issues. Why? It would let Microsoft and the searchers it serves know exactly what data government agencies could and could not have. Now Microsoft, along with Google and other tech companies, is pushing to make this happen.

Calling for federal consumer privacy protection over at the Official Google Blog talks about Google getting behind the organized effort and points to a statement letter (PDF) asking for a unified federal approach to consumer privacy. The effort is by the Consumer Privacy Legislative Forum, part of the Center For Democracy & Technology.

Companies signing the statement are:

  • Kodak
  • eBay
  • Eli Lilly
  • Google
  • Hewitt and Associates
  • Hewlett-Packard
  • Intel
  • Microsoft
  • Oracle
  • Procter & Gamble
  • Sun
  • Symantec

Posted by Danny Sullivan at 11:16 AM | Permalink

June 5, 2006

Windows Live Mail's Active Search: Gmail-Like Contextual Ads Next To Your Mail

Two years ago, Gmail launched with the idea of showing ads contextually based on your email. Soon after, the shit hit the fan, with one California state senator even backing a special anti-Gmail law that failed to pass. Fast forward to last week, with Microsoft rolling out Active Search within Windows Live Mail. Just like Gmail, it will deliver ads based on what you're reading. Unlike Gmail, there's been no privacy freakout that I've seen.

Microsoft's blog post on the new service highlights privacy protections meant to make you feel better. Heck, I was never that worried about Gmail. We'll see if places like EPIC feel they need to maintain a new FAQ about the Microsoft service similar to what they started (but don't seem to still maintain) for Gmail.

Microsoft Adds Contextual Ads Alongside Desktop E-mail from ClickZ has more on the new service.

Posted by Danny Sullivan at 4:05 PM | Permalink

UK Journalists Boycott Yahoo Services

America's Network reports that journalists in the UK are set to boycott Yahoo's services and products. The boycott is in protest of how Yahoo has handled some matters in China, such as allegations that Yahoo sent information about journalists to the Chinese authorities.

Posted by Barry Schwartz at 11:17 AM | Permalink

June 2, 2006

DOJ Asks Microsoft, AOL And Google To Keep Records

Last week during meetings with executives, Attorney General Alberto Gonzales asked several Internet companies to retain records to aid efforts to prosecute terrorists and child predators. The Department of Justice requested that lists of emails sent and received and web search information be kept for a reasonable length of time. The content of emails isn't part of this request, since the proper legal channel for seeking such information is by subpoena only.

Posted by Detlev Johnson at 8:44 AM | Permalink

May 25, 2006

Google Works With Brazil To Shut Down Orkut Communities

The Associated Press reports that Google has finally agreed to pull the plug on some communities within Orkut, Google's social networking software. Google has specifically agreed to shut down any community that violates Orkut's terms of service. This includes "any illegal or unauthorized purpose," such as those covered in these earlier stories:

+ Drug Pushers Using Orkut Arrested In Brazil
+ Brazil Asks Google To Help Orkut To Stop Organizing Organized Crime
+ Google & Brazil Fight Over Orkut User Data Rights
+ Google Faces Criminal Charges For Child Porn & Racial Material

About time I guess.

Posted by Barry Schwartz at 8:24 AM | Permalink

May 3, 2006

Google & Brazil Fight Over Orkut User Data Rights

AmericasNetwork.com reports that Google and Brazil are at it again over Orkut, Google's social networking product. Google appeared in front of the Chamber of Deputies' Human Rights Committee to protect Orkut users' data from Brazilian authorities. Brazil wants the data to help prevent crimes, such as a recent fight Brazilian police broke up between "rival football fans," which was organized on Orkut.

Google has been asked before by Brazil for data. We are also aware of some drug pushers being arrested in connection with using Orkut.

Posted by Barry Schwartz at 8:58 AM | Permalink

April 14, 2006

Terrorists & Extremists Worry About Their Search Privacy

Worried that governments might spy on you through search engines? So are terrorist and extremist groups. Terrorists' Web Chatter Shows Concern About Internet Privacy from the Washington Post covers how one extremist web site warns against using Google and the Google Toolbar (which the Post calls a relatively new product, though it's been around since at least 2001).

Posted by Danny Sullivan at 7:41 AM | Permalink

April 10, 2006

Privacy Concerns Over Free Google Wi-Fi & Plans To Expand To New Cities

Poor old Google. No matter what they do, there's always some preexisting privacy issue they suddenly get blamed for. This time it's over the plans to support free wi-fi access in San Francisco with ads and how that means people will be tracked across the city.

Wi-Fi plan stirs Big Brother concerns from the San Francisco Chronicle looks at the issue. Google says location data would be deleted after 180 days. Privacy advocates worry that government officials could demand this "treasure trove" of data to track people.

To use the Google service, you'd have to log into your Google account. Voila! That would mean Google knows who and where you are, since the wi-fi access point you tap into will have a known geographic location. Of course, if you use the paid version from Earthlink, you give them a credit card and log into your Earthlink account. Voila! Exactly the same issue.

In fact, any fee-based wi-fi service you've used knows who you are. These services have existed for years and just like Google's service, know what locations you're logging in from.

I don't know how long they keep this data, but that's also because I've not seen any articles on the topic -- just like I seldom see articles about ISPs having records of what you search for. Instead, it's Google and search engines in general that get the focus. Protecting Your Search Privacy: A Flowchart To Tracks You Leave Behind has more on this.

Don't get me wrong. People should definitely be concerned about such issues, and it would be great to have better laws to protect us. But they need to consider the ISP angle, as well.

Meanwhile, while Google said back in October that it had no plans to do free WiFi outside San Francisco, Om Malik points out that Earthlink's CEO says the two companies are looking to do a second city now.

EarthLink, Google discuss bid for second muni network from Dow Jones has more on that, along with Google reiterating that it doesn't plan to expand beyond the San Francisco Bay Area.

Posted by Danny Sullivan at 8:47 AM | Permalink

April 3, 2006

Deleted Gmail Account? Kiss Your Email Goodbye!

Worried that the government's going to force Google to hand over your email? Fret not. Just delete your account. Do that, and it's gone forever, as some people are finding out.

Wait a minute! Wasn't there all that controversy about how even if you delete your mail, Google still keeps copies of it because of multiple backups? Well, it looks like those backups might not be as foolproof as they sound.

Last month, Google Blogoscoped featured the sad story of someone whose account had disappeared, which ZDNet UK later picked up on. Now in a follow up story from ZDNet, Google denies fault over Mail problems, Google explains that accounts have been deleted in only a few cases and that it is not responsible because these were cases where the users' passwords were given out to others.

For the record, Google does warn that if you delete an account:

Once you close your Gmail account, you can't reactivate it, and you won't be able to retrieve any messages. After a certain period of time, Gmail recycles your username, so we can't guarantee that it will be available if you decide to open another Gmail account.

But one who may (or may not) have been a victim of a hacked account fairly asks:

Even if someone did get my password somehow, shouldn't the original creator of the account be sent some sort of confirmation before actually going ahead with it? Gmail should pick up on that and fix this hole, otherwise it'll be chaos.

Assuming the deletion is spotted fast enough, you'd think the data wouldn't immediately disappear. After all, the Google FAQ on data retention says (bold parts are Google's own):

Some news stories have suggested that Google intends to keep copies of users' email messages even after they've deleted them, or closed their accounts. This is simply not true. Google keeps multiple backup copies of users' emails so that we can recover messages and restore accounts in case of errors or system failure. Even if a message has been deleted or an account is no longer active, messages may remain on our backup systems for some period of time. This is standard practice in the email industry, which Gmail and other major webmail services follow in order to provide a reliable service for users. We will make reasonable efforts to remove deleted information from our systems as quickly as is practical.

So fair to say, if your account was deleted and you discovered this fast enough -- I'd say within a few days -- it seems like it could be restored off one of those multiple backup copies specifically retained for this type of situation.

Meanwhile, Google's Gmail fails to hit the spot from Bloomberg looks at how Gmail has far fewer users in the US than Yahoo, AOL and MSN. Of course, Gmail still remains a closed service. Yes, it's much easier for people to get accounts these days -- but I'd say having 7 million users despite the barrier Google throws up is a success, rather than failure. But the figures show a slowing in take-up despite it being easier to get in. Some are said to find the interface offputting.

Posted by Danny Sullivan at 10:23 AM | Permalink

March 30, 2006

Justice Department Subpoenas Data From 34 Other Companies

Google, Yahoo, MSN and AOL are not the only ones subpoenaed to give over their data to the government. InformationWeek reports 34 other companies were also asked to hand over the goods. The list includes Internet service providers, search companies, and security software firms. And Google, Yahoo, MSN and AOL thought they were special. :)

Here is the full list: 711Net (Mayberry USA), American Family Online, AOL, ATT, Authentium, Bell South, Cable Vision, Charter Communications, Comcast Cable Company, Computer Associates, ContentWatch, Cox Communications, EarthLink, Google, Internet4Families, LookSmart, McAfee, MSN, Qwest, RuleSpace, S4F (Advance Internet Management), SafeBrowse, SBC Communications, Secure Computing Corp., Security Software Systems, SoftForYou, Solid Oak Software, Surf Control, Symantec, Time Warner, Tucows (Mayberry USA), United Online, Verizon, and Yahoo.

For more background on this debate, read here.

Posted by Barry Schwartz at 8:29 AM | Permalink

March 24, 2006

Do Aerial Maps Violate Our Privacy?

These Maps Are Nice to Look at, but Not Smart from the LA Times revisits the entire "do aerial maps show too much" issue. Past stories on this topic have tended to look at how various countries are concerned that sensitive areas are displayed. This reporter takes a different angle, about whether maps show too much about our own homes.

So do you have a privacy right for your house not to be displayed? My initial reaction was no. Airplanes, helicopters and satellites fly overhead all the time taking pictures. They've been doing this since before such images were shown through products from Google and Microsoft. The data's already been out there, accessible in other ways, for ages. Heck, pick any car chase in LA and people know how the helicopters go up, broadcasting "private" images from above to anyone. Aerial privacy -- c'mon!

Then again, perhaps Google and Microsoft should consider allowing people to block out their homes, if they can prove they are the owners. If the White House and the US Capitol get to be blanked out, why not give everyone the right as good customer relations?

Posted by Danny Sullivan at 12:06 PM | Permalink

March 22, 2006

Google Doesn't Have To Hand Over Search Logs To Justice Department

Catching up on some important news from last week, the judge in the case of the US Department Of Justice versus Google has ruled that Google does NOT have to provide the DOJ with query logs. Google calls it a victory, and I agree.

Let's recap:

  • Last year, the DOJ demanded that Google hand over two months' worth of query data, from June 1 through July 31, 2005. That would have been billions of queries in total. Just put them in an "electronic file," Google was told. Then find a terabyte USB key big enough to hold this monstrous text file, so that I guess the DOJ could open it up in WordPad on the special computer used to process Bill Gates's taxes. Maybe that has enough memory to load the file :)
  • The DOJ backed off the original request, saying it wanted only one week's worth of data. "Only a week" still would have put the number of queries in the billion plus range.
  • In court last week, the DOJ declared that it now only needed 5,000 random queries in total. Got it? Originally it needed billions of queries and went to court to force Google's hand, then it decides only 5,000 were necessary.

The judge decided against giving the DOJ any search data at all. Why? From my reading of the ruling (PDF format), the judge found that the possible concerns over privacy outweighed the concerns that the DOJ needed to have Google's data in addition to data it already obtained from other search engines or could obtain through other options.

The judge noted that Google itself warns users that government actions might require it to hand over private data. Still, the judge wrote:

The expectation of privacy by some Google users may not be reasonable, but may nonetheless have an appreciable impact on the way in which Google is perceived, and consequently the frequency in which users use Google. Such an expectation does not rise to the level of an absolute privilege, but does indicate that there is a potential burden as to Google's loss of goodwill if Google is forced to disclose search queries to the Government.

But the government didn't want private data, right? They only wanted queries, not the other log information that might link the queries personally with anyone.

That's not entirely correct. "Private Searches Versus Personally Identifiable Searches" is my past article that explains how all searches are private, at least in the minds of many searching. Moreover, some of these private searches might contain information to somewhat link them back to an individual. True, the searches can't be absolutely, positively identified back to an individual. However, they still remain "private" in nature.

The judge clearly was concerned about this, enough so to ultimately rule against the query log handover:

Thus, while a user's search query reading "[user name] stanford glee club" may not raise serious privacy concerns, a user's search for "[user name] third trimester abortion san jose," may raise certain privacy issues as of yet unaddressed by the parties' papers. This concern, combined with the prevalence of Internet searches for sexually explicit material (Supp. Stark Decl. ¶4) -- generally not information that anyone wishes to reveal publicly -- gives this Court pause as to whether the search queries themselves may constitute potentially sensitive information.

Google does have to hand over 50,000 random URLs from its index, which doesn't impact the privacy of anyone. Interestingly, the ruling does put Google in an odd position. Now it has a court backing up the idea that query logs are private. So should it still be publishing query log data through tools like those provided to AdWords advertisers? I showed earlier how I could use that tool to find queries containing social security numbers. Perhaps the company might find itself the target of a different suit down the line, by someone claiming their privacy was violated through exposure like this.

I think that's unlikely, but it's worth noting. Overall, I still am glad to have these types of tools (other search engines offer them as well). I think there's a difference between the government overreaching to ask for billions of queries versus advertisers doing more focused research on searching patterns. And heck, the government could have just used the advertiser tools themselves.

For background on the case, see these past articles from us:

Posted by Danny Sullivan at 3:08 PM | Permalink

March 17, 2006

Judge Requires Google to Give Gmail Emails to Courts

News.com reports that a judge in San Francisco is requiring Google to hand over all the emails of a specific gmail user, even the deleted emails. Since Google stores deleted emails for an undisclosed amount of time, the number of emails that can be used against the plaintiffs, AmeriDebt and founder Andris Pukke, could be "tens of thousands." The case is about a credit counseling company that failed to use the customer's money to pay the creditors.

Posted by Barry Schwartz at 10:14 AM | Permalink

March 14, 2006

Judge: Google Must Give Up Some Data To Department Of Justice

Judge to Order Google to Give Up Some Data from the Associated Press covers the news that Google will be required to hand over some information requested by the US Department Of Justice. The DOJ has now asked for a much smaller set of data than it originally sought. It wants 50,000 URLs selected randomly from the Google index and 5,000 random search requests.

Geez, if that's all you need, guess it's confirmed you went overkill on the first request, eh? And what a nice spend of taxpayers' money to contest this. I can think of better ways to get 50,000 random URLs out of the Google index and 5,000 search requests from other sources.

The judge in the case has said he intends to have some data, though what exactly remains to be determined. A final ruling will come "very soon" or "very quickly," the judge said, according to other accounts below:

  • Judge indicates Google must turn over some data, San Jose Mercury News, covering how Google conceded the new request is less burdensome on it plus more on the judge's concern that he wants to ensure no private data is released.  
  • Judge to help feds against Google, News.com, covering Google's warning that it "could face hundreds of university professors [saying] 'I've got a study I'd like you to conduct'."

For background on the case, see these past articles from us:

Postscript: Judge will require Google to turn over some documents from USA Today quotes the US DOJ attorney saying of the much pared down request:

"We could perform the study. The study would be substantially improved if we had the Google data."

In some of my articles on this mess, I've noted one of the frightening things about the US government's original request was how ignorant it seemed to be of the way search engines operate. It had all the feel of "give us data," not "give us what we need." Now we're told that 5,000 random queries will "substantially" improve a study where other search engines were forced to hand over what seems to be millions of queries? Drop in the bucket, anyone? And more important, it again illustrates how far-reaching -- for no good reason -- the original request was.

Posted by Danny Sullivan at 3:00 PM | Permalink

March 13, 2006

Google Vs. US Justice Department Tomorrow At 9AM Pacific

Google is to face off with the U.S. Department of Justice tomorrow. MercuryNews.com reports "U.S. District Judge James Ware's courtroom is expected to be jammed for this heavyweight legal bout between the world's largest search engine and the federal government." As an FYI, they have postponed this event twice already, but tomorrow seems to be the day.

Posted by Barry Schwartz at 9:37 AM | Permalink

March 9, 2006

Google Subpoenaed to Reveal Identity of Person Who Posted at Google Video

According to MercuryNews.com, Google has been subpoenaed by American Airlines to hand over the identity of the individual who posted a copyrighted training video. The video was titled "Flight Attendant, Upside Down" and was available for viewing at Google Video, but has now been removed. Cindy Cohn, legal director for the Electronic Frontier Foundation, said that Google will most likely be required to "comply" with American Airlines' request.

Posted by Barry Schwartz at 10:46 AM | Permalink

Yahoo's Yang Says It's More Important To Be In China Than Risk Of Not Participating

News.com reports that Yahoo co-founder Jerry Yang said, "It is more important for us to participate, not only for economic reasons, but to be able to help shape where the industry is going." Yang said that Yahoo has to balance the "risk of not participating" and overall, "we are seeing changes, on the whole, for the positive" in the Chinese market. Yang seems to take a different angle with his reasoning for operating in China, when compared to Barry Diller's Keynote where he commented that being in China is about being able to "stomach operating in a country" and that operating in China is more of a political decision than a business decision. Yang replied to giving up details of a Chinese dissident, saying: "We feel horrible about that...We have no way of preventing that beforehand....If you want to do business there you have to comply." So can Yahoo! stomach it? Can Yahoo! influence the political gates?

Posted by Barry Schwartz at 9:31 AM | Permalink

March 8, 2006

Google Filings Against DOJ Request -- Including Declaration From Matt Cutts

I'm planning a deeper look at Google's rejection of the Department Of Justice search records request, which happened last week when I was on vacation. But a quick heads-up. Many of you may have seen Google's blog post on the subject here, which in turn leads to their formal filing here (PDF). But that wasn't the only filing. Catching up on my feeds this morning, I saw that Gary compiled a full list of Google filings over here (PDF). My eyebrows shot up when I saw Google's Matt Cutts had a long declaration as part of that package. I was planning to help spread the word more about this as part of an overall summary of what's in the various summaries, but Matt himself beat me to it with this blog post. So happy reading! I'll still be working on that general summary of everything hopefully for later this week.

NOTE: This was originally written on Feb. 22, but I've only just seen that it was left as a "draft" and never published. Sorry about that!

Posted by Danny Sullivan at 2:55 PM | Permalink

March 7, 2006

Justice Department vs. Google Court Date Pushed Back for Second Time

News.com reports that for the second time now, the courts have postponed the court date of Justice Department vs. Google. The case is regarding a subpoena sent by the government to Google, Yahoo, Microsoft and AOL, to give over information in regards to an anti-pornography law in a trial in Philadelphia. The case was delayed from March 13 to March 14 at 9:00 a.m. (PT).

Posted by Barry Schwartz at 9:17 AM | Permalink

March 6, 2006

Google "Ashley Cole Gay" Results Suggestion Prompts Questions From His Solicitor

Google may face legal action over Ashley Cole searches over at Pink News covers how English footballer Ashley Cole might be upset with Google because of how its clustering technology is highlighting content about "ashley cole gay" in its search results. It's another example of Google's user interface experiments confusing people.

This screenshot shows what's at issue. Midway down, you'll see a section that says:

See results for: ashley cole gay

Independent Online Edition > Legal England footballer Ashley Cole is suing The Sun and the News of the World over claims that two Premiership players indulged in a "gay sex orgy". ... news.independent.co.uk/uk/legal/article348966.ece

Ashley Cole sues after gay rumours | Headlines | News | Gay.com UK Gay.com UK is the country's leading gay and lesbian lifestyle portal, providing an unrivalled combination of chat and news. uk.gay.com/headlines/9687

Ashley Cole files lawsuit over gay orgy story- from Pink News- all ... Ashley Cole files lawsuit over gay orgy story from PinkNews - all the latest gay news from the UK and beyond to the gay community. www.pinknews.co.uk/news/articles/2005-670.html

This is an example of the middle-of-the-page query refinement that Google's been testing over the past several months, as we wrote about back in August.

In particular, what seems to be happening is that Google is performing "clustering," a long-standing technique of grouping pages on a similar topic together. In other words, it sees there are lots of pages about "ashley cole" along with a subgroup of those on the topic of "ashley cole gay."

That there might be a subgroup like this isn't surprising. Cole is currently suing newspapers The Sun and The News Of The World over allegations they printed that he is gay. Those allegations have fueled discussion on the web, leading to a subgroup of pages on this topic.

Clusty provides a similar example of this. A search for ashley cole over there shows clustered topics along the left-hand side of the page including:

Cole's solicitor is reported by Pink News as wanting to know if it was editorially done by Google or based on search volume. Google gave no comment.

From where I sit, it almost certainly was NOT editorially done. Instead, it was probably based on a combination of search volume and actual pages on the web.

In other words, Google's probably seen a spike in queries for "ashley cole gay." It also can probably see there's a good chunk of pages out there on this topic.

For example, a search for the exact phrase "ashley cole" brings back 551,000 matching web pages. If I further refine that to "ashley cole" gay, I find there are 48,800 pages that use his name along with the word "gay" on them -- about 9 percent of all the exact phrase "Ashley Cole" pages out there.
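To sanity-check that percentage, here is a minimal Python sketch. The two page counts are the ones quoted above; the variable and function names are my own, made up purely for illustration.

```python
# Page counts quoted in this post for the two searches.
exact_phrase_pages = 551_000  # pages matching the exact phrase "ashley cole"
with_gay_pages = 48_800       # pages matching "ashley cole" plus the word gay


def share_percent(subset: int, total: int) -> float:
    """Return subset as a percentage of total."""
    return 100.0 * subset / total


share = share_percent(with_gay_pages, exact_phrase_pages)
print(f"{share:.1f}% of the exact-phrase pages also use the word")
```

The result rounds to the "about 9 percent" figure. It is a share of matching pages only, so it says nothing about how many people are searching for the term.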

It's important to remember that search counts can be very misleading. A large number of pages with his name and the word gay doesn't mean he is gay, only that many pages might be discussing the topic. It could also be his name is showing up on pages that use the word gay in reference to other people.

Our Fox News & Danger Of Citing Search Counts discussion at the Search Engine Watch Forums covers more about why you can't depend on counts to "prove" particular facts. But the large number of pages could cause Google -- just like Clusty -- to automatically decide that there's a "cluster" or "topic" related to those words.

Why bring up this particular topic when something like "ashley cole" cars comes up with more matches (60,100 of them)? That brings me back to search volume. If Google's noticing that there are a lot of queries on a particular subtopic (ashley cole gay) related to the main topic (ashley cole) plus a significant number of pages on that topic, that might cause this refinement to kick in.

Of course, turning to the Google AdWords Keyword Tool should help show this. It (and the more advanced tool here) can show the most popular terms related to the core term Ashley Cole. And those are?

  • ashley cole
  • ashley cole and cheryl tweedy
  • cheryl tweedy ashley cole
  • ashley cole pictures
  • ashley cole girlfriend

So where's "ashley cole gay" on the list? My guess is that the search data Google is showing is old, so that this term that may be rising in popularity isn't appearing.

The Google Zeitgeist is another place to check if this query might be gaining. However, Google's not updated non-US versions since last November. Even if it does, the lists there are subject to human review. Google might very well remove something if it's deemed not family friendly, just as it already removes many sexually-related queries.

In the end, I doubt Cole would have much success in suing Google over the listing, if indeed he decided he wanted to. There are definitely pages on the topic and almost certainly people looking for information about it.

Still, it would sure be nice as we wrote in our Google Losing Consistency As It Continues To Experiment With Results article back in August if Google made it clearer how and why certain things show up in its search results. It has a search results explanation page here, but that page doesn't cover the continued experimental displays that Google is doing and confusing people with.

Postscript: Hitwise has stats showing the growth of the "gay" queries here, and Schmidt's Google Queried By Soccer Star's Lawyers from Forbes has none other than Google CEO Eric Schmidt putting out a statement saying the suggestion was automatically created based on query behavior. Cole's lawyer Graham Shear is satisfied with that explanation though wants to know more about the data behind it.

Graham, see the previous Hitwise link. The data's simple. Your client is in the news over the allegations. Lots of people interested in the case are almost certainly typing in his name to find out more, getting a lot of stuff not necessarily related to the allegations, so they are adding the word "gay" to narrow down the search results.

Posted by Danny Sullivan at 10:18 AM | Permalink

February 28, 2006

Department of Justice Rejects Google's Claims of Privacy Threat

Internet.com reports in Google Search Request Not a Privacy Threat that the Department of Justice has rejected Google's argument that handing over query data will be a privacy threat. The DOJ says this is because they are not asking for specific user data that can be associated back to any individual user.

"No individual user of Google, or of any other search engine, need fear that his or her personal identifying will be disclosed," according to the government brief.

Gary Price has a link to the full brief.

Posted by Barry Schwartz at 8:07 AM | Permalink

February 23, 2006

60% Oppose Search Engines Storing Search Behaviors

A study conducted by the University of Connecticut showed that 60% of users are opposed to search engines permanently storing their search behaviors. The study was in response to the US Government requesting search data from the search engines. Of the 800 Americans surveyed, 23% use a search engine more than once per day. They were split down the middle on the question of whether search engines should provide search queries to the government, whereas 30% are in favor of the government monitoring search data. The study also shows that "only 13% of the public feel “extremely” or “very” confident that the search behavior collected by Internet companies will remain private." Read the full study here.

Posted by Barry Schwartz at 10:32 AM | Permalink

February 15, 2006

Man's Search Query to be Used in Court Case

The Associated Press reports on a story about Neil Entwistle searching at a search engine on the word "killing" days before "his wife and baby daughter were shot to death." A judge issued a search warrant which enabled the investigators to capture information from the defendant's computer. Part of that information includes search queries performed by the defendant days before the murders.

Postscript: See also CNN and CBS for more on the search records.

Posted by Barry Schwartz at 10:09 AM | Permalink

Boycott Google Desktop Search?

The Electronic Frontier Foundation has called for users to boycott Google's new Desktop Search 3 citing privacy concerns, including Google copying your personal data to its servers. Should you be concerned? I've taken a closer look at both the EFF's claims and what Google really does with your personal data in today's SearchDay article, Google Desktop Fears Overblown?.

Posted by Chris Sherman at 7:51 AM | Permalink

February 9, 2006

Google Desktop 3.0 Raises New Privacy Issues

Chris Sherman wrote about Google's new Desktop Search today. One of the new features Chris describes enables you to "use the Google Desktop to search across multiple computers." USA Today writes that this feature "raises privacy concerns." For this feature to work, Google has to copy your PC's files to Google's servers and then those files are sent back to the PCs. As noted in the USA Today article, "previous versions merely indexed files, without storing copies at Google."

The EFF is worried and warns not to use the feature. Should you be worried? If you are, you do not have to use that feature. But what about the unsuspecting user who doesn't fully understand that data is being stored for a period of time at Google? I can see a reason for concern there. Can we trust Google with our data?

The bottom line is that we currently have a say, and we do not have to use Google Desktop or that feature in Google Desktop. Also be aware that this feature is NOT turned on by default. If enabled, data is kept only for 30 days if not accessed, Google says. Google provides more info here.

Posted by Barry Schwartz at 3:44 PM | Permalink

Data Privacy Bill Introduced, Not Well Thought Out

"Bill would force Web sites to delete personal info" from News.com is an excellent write-up on a new bill introduced to the US Congress that would require web site owners of all types and sizes -- not just search engines -- to delete data. However the bill, which was sparked out of search privacy worries, might not correct the problems it's aimed at.

One concern the bill wants to address is this:

Certain information about Internet searches or website visits conducted from a particular computer can be obtained and stored by websites or search engines, and can be traced back to individual computer users.

To solve this, the bill requires that personal information be destroyed in an undefined "reasonable" period of time:

An owner of an Internet website shall destroy, within a reasonable period of time, any data containing personal information if the information is no longer necessary for the purpose for which it was collected or any other legitimate business purpose, or there are no pending requests or orders for access to such information pursuant to a court order.

What's personal?

The term "personal information" means information that allows a living person to be identified individually, including the following:

  • the first and last name of an individual
  • a home or physical address of an individual
  • date or place of birth
  • an email address
  • a telephone number
  • a Social Security number
  • a tax identification number
  • birth certificate number
  • passport number
  • driver's license number
  • credit card number
  • bank card number
  • or any government-issued identification number

and does not include any record of aggregate data that does not permit the identification of particular persons.

None of this information was in the search records that were requested by the Department Of Justice from search engines. Yes, some of that information can be linked to search records, if people are personally registered with a search engine. But things like IP addresses and cookies are not covered and so wouldn't likely need to be deleted.

That's good, in many respects. IP addresses and cookies are commonly logged by web servers and produce data that is extremely useful in understanding things like conversion over time. Also, IP addresses and cookies don't necessarily personally identify someone, as I've explained. If this bill had required destruction of log data, it would have posed many nightmares for web site owners. Of course, they might argue that log analysis is a "legitimate" business need, perhaps allowing the data to be kept.

Overall, the bill seems pretty knee-jerk. For one, while individual web sites have to destroy data, it's not clear that third party mining services that are given the data have to do so. Rather than a well-thought out plan to fully address search privacy, as I hoped for, it seems almost as ill informed as the initial DOJ grab for data.

Want to comment or discuss? Please visit our Search Engine Watch Forums.

Posted by Danny Sullivan at 12:53 PM | Permalink

Yahoo Said To Have Given Details Of Another Chinese Dissident

Report: Yahoo helped jail another Chinese 'net dissident, Li Zhi from Boing Boing and Yahoo accused in jailing of 2nd China Internet user from Reuters covers Yahoo being accused of handing over evidence to the Chinese government about another activist. Yahoo came under fire last year for handing over information that caused a different activist to be jailed.

Posted by Danny Sullivan at 9:17 AM | Permalink

February 8, 2006

Google Introduces Marked Up Version Of Privacy Policy Changes

Google Brilliantly Updates Privacy Policy from Nathan at InsideGoogle notes that for Google Talk's privacy policy, you can now view a previous version where changes are highlighted. Nice. Other privacy policies at Google don't seem to have this yet. I'm guessing this will happen as each of them (such as toolbar or Gmail) is updated going forward. Most that I looked at were changed as part of a big privacy update Google did last October. Still, the Google personalized home page policy is dated as of January 2006, so it probably has changed since the October wave but has no guide to past versions. Prior versions of the general privacy policy can be found here.

Posted by Danny Sullivan at 12:34 PM | Permalink

February 6, 2006

Which Search Engines Log IP Addresses & Cookies -- And Why Care?

Last week I wrote how John Battelle followed up with Google to find out if they can link search data to IP addresses or cookies. Google said yes. I wrote that wasn't surprising. I covered back in 2003 how this is standard information any web server is likely to log, including servers at the major search engines. I also wrote last week that if Google is doing this, it was fair to assume all the major search engines are.

Rather than assume, News.com did an actual survey of this. Verbatim: Search firms surveyed on privacy has the rundown of AOL, Google, MSN and Yahoo (Ask Jeeves unfortunately was not included). Yes, they all log this information. AOL says they don't in one instance, but I'll debunk that later. First, let's go back to the bigger question of why suddenly people are asking about IP addresses and cookies.

Every time you go to a web site, you leave behind an IP address. This is like your internet telephone number, and it's possible (especially with the help of your ISP) to trace activity back to you. That 2003 article of mine, Search Privacy At Google & Other Search Engines, explains this in more detail.

Often, a web site will also assign you a cookie. This is simply a way for your browser to communicate to the web site that you've been there before (not you personally -- such as your name and address -- but you as in a particular web browser software like Internet Explorer or Firefox).

Cookies are better than IP addresses for tracking purposes, because your IP address will often change from internet surfing session to session. Your cookie stays the same, as long as you use the same browser on the same computer and don't delete it.

John's reader wanted to know if search queries at Google could be linked to an IP address or a cookie. Huh? What? Why care?

OK, let's say the government of BigBrother wants to know how many people are looking for something illegal, such as Widagra. Let's say Widagra is a drug legal in some countries but which BigBrother deems evil. If you are even remotely interested in this drug, BigBrother considers you a bad, bad person.

BigBrother wants to know all the people who might be looking for this drug via search engines, assuming that will lead them to the evildoers. So it tells the search engines to hand over a list of all IP addresses that are shown to have done a search for Widagra. The search engines hand over a list like this:

  • 195.93.21.100
  • 86.133.102.174
  • 144.132.1.30 ...and so on

OK, now the government of BigBrother knows all the people searching for Widagra. Well, not really. It knows a bunch of numbers, but it has to "resolve" or trace these numbers back to addresses from the various internet service providers. It does this, making the list look like this:

  • cache-los-ad04.proxy.aol.com
  • host66-133-102-174.range82-123.btcentralplus.com
  • CPE-144-132-1-30.vic.bigpond.net.au ...and so on

Now it has to figure out which internet service providers own these addresses using network records. That works out like this:

  • AOL
  • British Telecom
  • Telstra
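To make that last step concrete, here is a minimal Python sketch of matching a resolved host name to the provider that owns it. The lookup table is an assumption built only from the three examples above; a real investigation would rely on reverse DNS (for instance Python's socket.gethostbyaddr) and network registration records rather than a hand-made table.

```python
# Assumed lookup table, built only from the three example host names above.
ISP_BY_DOMAIN = {
    "aol.com": "AOL",
    "btcentralplus.com": "British Telecom",
    "bigpond.net.au": "Telstra",
}


def isp_for_hostname(hostname):
    """Return the ISP whose domain is a suffix of the host name, or None."""
    name = hostname.lower().rstrip(".")
    for domain, isp in ISP_BY_DOMAIN.items():
        if name == domain or name.endswith("." + domain):
            return isp
    return None


for host in (
    "cache-los-ad04.proxy.aol.com",
    "host66-133-102-174.range82-123.btcentralplus.com",
    "CPE-144-132-1-30.vic.bigpond.net.au",
):
    print(host, "->", isp_for_hostname(host))
```

Note that the result is still only a company name. It says nothing about which customer was using the address.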

Now it has to ask each provider to tell it who was on the internet from a particular IP address at a particular time. In other words, take that AOL address (cache-los-ad04.proxy.aol.com). That will be recycled among various AOL users at different times per day. In some cases, people have "static" IP addresses that don't change. But most people using the web, to my knowledge, will have a different address assigned each time they access the web.

So, you can get a list of all those who did a search for a particular term using IP addresses IF:

  • A search engine provides the data
  • An ISP also provides the record of who used an IP address at a given time

If you don't get both of these things, you don't know who did the search. And if the IP address traces back to a public computer -- one at a workplace, in a school, a library, you still don't know exactly who was on the computer.
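The two-part requirement above can be sketched as a simple join in Python. Everything here is hypothetical: the log rows, the lease times and the subscriber labels are made up for illustration, and real ISP records will not look exactly like this.

```python
from datetime import datetime

# Hypothetical search-engine log: (IP address, time of search, search term).
search_log = [
    ("195.93.21.100", datetime(2006, 2, 1, 9, 15), "widagra"),
    ("195.93.21.100", datetime(2006, 2, 1, 21, 40), "widagra"),
]

# Hypothetical ISP records: (IP address, lease start, lease end, subscriber).
isp_leases = [
    ("195.93.21.100", datetime(2006, 2, 1, 8, 0),
     datetime(2006, 2, 1, 12, 0), "subscriber-A"),
    ("195.93.21.100", datetime(2006, 2, 1, 20, 0),
     datetime(2006, 2, 1, 23, 0), "subscriber-B"),
]


def who_searched(term, searches, leases):
    """Return (subscriber, time) pairs for searches on the given term.

    A search is matched only when an ISP lease covers both the IP address
    and the moment the search was made.
    """
    matches = []
    for ip, when, searched in searches:
        if searched != term:
            continue
        for lease_ip, start, end, subscriber in leases:
            if lease_ip == ip and start <= when < end:
                matches.append((subscriber, when))
    return matches


print(who_searched("widagra", search_log, isp_leases))
print(who_searched("widagra", search_log, []))  # search data alone names nobody
```

The same IP address maps to two different subscribers on the same day, which is why the time of the search matters as much as the address. With an empty list of ISP records, the function returns nothing at all.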

What about cookies? They just make it easier to see that the same browser software may have done something, regardless of IP address. For example, say you log in using AOL on your laptop computer, then use a wireless connection when traveling. You might leave two different IP addresses, like this:

  • cache-los-ad04.proxy.aol.com
  • host66-133-102-174.range82-123.btcentralplus.com

You're the same person, on the same computer, but you leave behind two completely different IP addresses. Someone just looking at IP addresses in a search engine's log records would think you are two completely different people.

Using cookies, each address would also have your browser's unique cookie identifier associated with it, as shown after each address below:

  • cache-los-ad04.proxy.aol.com e43UBsS4fNZzmDgj
  • host66-133-102-174.range82-123.btcentralplus.com e43UBsS4fNZzmDgj

Now even though the IP addresses are different, the cookies are the same -- so you know the same browser software made these requests.
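Here is a rough Python sketch of that linking step, grouping log rows by cookie value. The first two rows come from the example above; the third row and its cookie value are made up to show a browser that does not match.

```python
from collections import defaultdict

# Log rows as (address seen by the search engine, cookie identifier).
# The last row, including its cookie value, is invented for contrast.
log_rows = [
    ("cache-los-ad04.proxy.aol.com", "e43UBsS4fNZzmDgj"),
    ("host66-133-102-174.range82-123.btcentralplus.com", "e43UBsS4fNZzmDgj"),
    ("CPE-144-132-1-30.vic.bigpond.net.au", "Zq81LmXw0pRtYv3c"),
]


def addresses_by_cookie(rows):
    """Map each cookie identifier to the set of addresses it was seen with."""
    seen = defaultdict(set)
    for address, cookie in rows:
        seen[cookie].add(address)
    return dict(seen)


linked = addresses_by_cookie(log_rows)
for cookie, addresses in linked.items():
    print(cookie, "->", sorted(addresses))
```

Any cookie that shows up with more than one address is a browser that moved between connections.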

Why's that useful? Back to BigBrother, say they scan the list of those searching for "widagra" and decide they'd like to profile individuals on that list further. They could ask to see all the searches done from a particular IP address. However, as I mentioned, since many IP addresses are reused, you aren't really seeing what one particular individual may have done.

Instead, they turn to cookies. They see that the cookied browser of "e43UBsS4fNZzmDgj" looked for "widagra," so they order up a list of all the terms that browser searched for. They get back:

  • widagra
  • movement to overthrow BigBrother web site
  • widagra freedom campaign
  • how can we stop evil widagra users
  • i love president bigbrother
  • email valentine's day cards ...and so on

Some of those searches might help BigBrother decide this particular person is an evildoer. But then again, maybe not. Maybe they were researching the evils of widagra. Maybe the browser software was in a library, where different people used it.

Now that the basics of IP addresses and cookies are covered, we can come back to the survey that News.com did. John's reader -- and then News.com -- asked two key questions:

  • Given a list of search terms, can you produce a list of people who searched for that term, identified by IP address and/or cookie value?  
  • Given an IP address or cookie value, can you produce a list of the terms searched by the user of that IP address or cookie value?

In other words:

  • If someone gave you a "bad" search term, could you tell all the IP addresses or cookies associated with that search?  
  • If someone gave you a particular IP address or cookie, could you build a profile of search activity associated with it?
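Both questions are really two lookups against the same log. As a minimal Python sketch, assuming a hypothetical log that holds nothing more than an identifier (an IP address or a cookie value) next to each search term:

```python
from collections import defaultdict

# Hypothetical log rows: (identifier, search term). The identifier can be
# either an IP address or a cookie value.
rows = [
    ("e43UBsS4fNZzmDgj", "widagra"),
    ("e43UBsS4fNZzmDgj", "widagra freedom campaign"),
    ("195.93.21.100", "widagra"),
    ("195.93.21.100", "email valentine's day cards"),
]


def build_indexes(log_rows):
    """Build both lookups in a single pass over the log."""
    by_term = defaultdict(set)         # question 1: term -> identifiers
    by_identifier = defaultdict(list)  # question 2: identifier -> terms
    for identifier, term in log_rows:
        by_term[term].add(identifier)
        by_identifier[identifier].append(term)
    return by_term, by_identifier


by_term, by_identifier = build_indexes(rows)
print(sorted(by_term["widagra"]))         # who searched for this term?
print(by_identifier["e43UBsS4fNZzmDgj"])  # what did this browser search for?
```

The point of the sketch is that anyone who keeps the raw rows can answer either question. Being able to answer one but not the other would take a deliberate design choice.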

Note that the original questions say "people," which is NOT correct. No search engine can tell you the "people" who did a search from only the IP address or cookies they have. That information does not contain someone's name, address or other personally identifying information associated with it. As I explained earlier, you'd really only be able to do that if along with search records, you also got the ISPs to give up information.

The big exception is if REGISTERED USERS are involved. By registered users, I mean that you filled out a form and then logged into My Yahoo, Gmail or some other service where you personally make yourself known to a search engine. In these cases, they now have a much better idea that a person is involved and probably who that person is.

The answer to both questions is that all the major search engines interviewed log IP addresses and cookies along with search data. OK, AOL said in response to one of the questions that it didn't keep info:

[News.com]: Given a list of search terms, can you produce a list of people who searched for that term, identified by IP address and/or cookie value?

[AOL]: No. Our systems are not configured to track individuals or groups of users who may have searched for a specific term or terms, and we would not comply with such a request.

Despite the response, I'm 99 percent certain AOL does indeed log IP addresses and cookies along with search data. Searching on AOL creates a page request with the search terms embedded in the page's URL. That request will be logged. If it's logged, it can be analyzed. In fact, AOL later says they can give you a list of searches that were done by a particular IP address or cookied browser. If you have that information, you have the opposite.
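To show why a logged page request is enough, here is a small Python sketch that pulls the IP address and search terms out of one web server log line. The log line, its layout (a common access-log style) and the "query" parameter name are all assumptions for illustration; I don't know how AOL's own logs are actually laid out.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Matches the start of a common access-log style line:
# client address, two unused fields, a timestamp, then the GET request.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "GET (\S+) HTTP/[\d.]+"')


def ip_and_query(line, param="query"):
    """Return (ip, search terms) from one log line, or None if absent."""
    match = LOG_LINE.match(line)
    if not match:
        return None
    ip, target = match.groups()
    values = parse_qs(urlsplit(target).query).get(param)
    return (ip, values[0]) if values else None


# Made-up log line for illustration only.
sample = ('195.93.21.100 - - [01/Feb/2006:09:15:00 +0000] '
          '"GET /search?query=widagra+freedom HTTP/1.1" 200 5120')
print(ip_and_query(sample))
```

Run that over every line in a log file and you have the raw material for either of the two questions News.com asked.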

By the way, it's worth reminding that it's not just search engines that keep IP and cookie data associated with searches. News.com almost certainly logs IP addresses when you do a search there. John's blog almost certainly does the same, when you search at his blog. We log IPs, when you search on our blog. Heck, I'd be surprised if the EFF itself didn't have standard log data recording what people are searching on there.

How long data is kept is another issue. Privacy groups feel that if data is destroyed, it can't be abused. I've written earlier that I don't really want data destroyed, since what we search for is useful historical information -- and knowing that searches were done from a particular browser or an IP address is helpful in filtering and mining data. However, as I also wrote, it could be that IP addresses and cookies get replaced in a way that they retain some unique value while rendering them completely untraceable back to an ISP.
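One way that kind of replacement might work is a keyed hash, sketched below in Python. This is my own illustration and not something any search engine has said it does. The idea is that the same IP address or cookie always maps to the same token, so the data can still be filtered and mined, but the token can't be turned back into the original value once the secret key is thrown away. The key matters: there are few enough possible IP addresses that a plain, unkeyed hash could be reversed by simply hashing them all.

```python
import hashlib
import hmac
import secrets


def make_pseudonymizer():
    """Return a function that replaces identifiers with stable tokens.

    The secret key lives only inside the returned function. Once that
    function is discarded, the mapping cannot be recomputed.
    """
    key = secrets.token_bytes(32)

    def pseudonymize(identifier):
        digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:16]

    return pseudonymize


pseudonymize = make_pseudonymizer()
print(pseudonymize("195.93.21.100"))
print(pseudonymize("195.93.21.100"))  # same input, same token
print(pseudonymize("86.133.102.174"))  # different input, different token
```

Whether this counts as "completely untraceable" depends on the key really being destroyed and on nothing else in the log giving the person away, such as the search terms themselves.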

None of that replacement is happening now, to my knowledge, so data is building up. How long does each of the search engines keep it?

  • AOL: Personal search histories expire after 30 days, and backups are not kept. How long log data (IP, cookied info) is maintained is not covered.  
  • Google: No particular period for anything is given, which I read as nothing being destroyed.  
  • MSN: Data is deleted, but no specifics are provided  
  • Yahoo: No particular period for anything is given, which I read as nothing being destroyed.

Overall, I don't know that much more from this survey. Google and Yahoo had already said they kept data. MSN's deleting some, but I suspect log data is backed up and kept somewhere with no destruction policy in place. Same too, for AOL.

News.com also asked whether any of the companies have handed over search data. Responses:

  • AOL: No comment  
  • Google: No comment (Gmail requests have been received)  
  • MSN: It has never had any criminal or civil requests for search history data  
  • Yahoo: No comment

MSN has learned a lesson from its failure to disclose properly last month in the Department Of Justice case. It was the only search engine that didn't dive for the cover of no comment and gave a clear and reassuring answer. The answer is probably the same for the other search engines, so why not just say so?

FAQ: When Google is not your friend from News.com is that publication's look at what this survey means, which I came to after writing up my own thoughts on the survey. You'll see that it covers issues such as IP addresses changing, how cookies are used and how a US law might -- or might not -- apply to protect search privacy.

For more on search privacy issues from us, see these articles:

For more on the entire current fight between Google and the Department Of Justice, see these articles:

Want to comment on things discussed in this article? We have several Search Engine Watch Forum threads where everyone is welcome:

Posted by Danny Sullivan at 3:00 PM | Permalink

February 3, 2006

Google Has Right to Log the Text of Messages Sent Using their Send to SMS Feature

Nathan over at InsideGoogle mentions a post by Devin Reams, who points to a portion of the Google Firefox Send to Phone FAQ that says, "we [Google] might also log the text of the message you send, in order to investigate and correct technical problems with the service."

The wording in the privacy section of Google Toolbar 4 documentation reads,

If you send text through SMS using Send To feature of the Google Toolbar, Google logs the number and carrier the message is sent to, and in some cases may record the text sent for debugging purposes.

I don't have time at the moment to check, but I'll try to find out if other services like Vazu, Yahoo's Text-to-SMS service and some of the other web-to-SMS services have similar policies.

Postscript: You can check for yourself. Here's the privacy policy from Vazu and Yahoo (mobile devices).

Posted by Gary Price at 5:22 PM | Permalink

Google Subpoena Update: Judge Delays Hearing for Two Weeks

Declan McCullagh writes that the court hearing originally scheduled for February 27th to determine if Google will have to turn over search records to the U.S. Department of Justice has been postponed for two weeks and will now take place on March 17th. U.S. District Judge James Ware provided no reason for the delay.

Ware also said that Google's response to the Justice Department is now due Feb. 17, and the government's reply is due on Feb. 24. Other organizations such as nonprofit groups, individuals and companies that have permission to file friend-of-the-court briefs have until Feb. 24 to do so. Prosecutors are requesting a "random sampling" of 1 million Internet addresses accessible through Google's popular search engine, and a random sampling of 1 million search queries submitted to Google over a one-week period.

More in the article: Judge postpones Google subpoena hearing

I've posted the court docket as of Thursday here and the actual court order delaying the hearing here (PDF).

Posted by Gary Price at 1:24 AM | Permalink

February 2, 2006

Oops, Specs for Dell Computers Found in Google Cache

The News.com story: How to evade Google search, reports that once again a company, in this case Dell, has learned the hard way that what's put on a public web server is open to crawling, caching, and discovery.

Specifications for future Dell notebooks were accessible via Google's search site before the content was pulled from a Dell file transfer protocol site and from Google's cache.

It's very likely, almost a given, that most of you know about keeping content from being crawled and/or cached using robots.txt or one of many other methods. If you don't or need a quick review, one of my favorite info compilations about robots.txt comes via SearchTools.com.
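For a quick review of the robots.txt approach, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt rules and the example.com URLs are made up for illustration. Keep in mind that robots.txt is only a request that well-behaved crawlers honor; it does not actually protect a file from anyone who knows the address.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt asking all crawlers to stay out of /private/.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Would a crawler that honors these rules fetch each URL?
blocked = parser.can_fetch("Googlebot", "http://www.example.com/private/specs.pdf")
allowed = parser.can_fetch("Googlebot", "http://www.example.com/index.html")
print(blocked, allowed)
```

The first check comes back False and the second True, since only the /private/ directory is disallowed.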

It's very possible that this article will reach many people who have little to no idea about how crawlers operate and how to keep content out of Google.

The article would have been more useful if it stressed that this is a webmaster and web-wide issue and not a Google issue. Every webmaster who places content on publicly accessible servers should have a basic understanding of how web crawlers work and that many large engines (and even some verticals) cache content.

Google is the most widely used web engine but the webmaster who only focuses their attention on Google might not realize that the searcher who knows about cached content, and then goes looking for it, will know about many other web caches.

In other words, keeping content only out of Google doesn't mean it's not accessible elsewhere and off the web. SEOs know this to be true but I often wonder about others.

Postscript: I noticed that this News.com article about what the Dell notebook specs contained does point out (at the very end) that the material was also cached by Yahoo.

Posted by Gary Price at 1:16 AM | Permalink

January 27, 2006

Judge Sets Hearing Date in Google Subpoena Case

Declan McCullagh's: Court date set for Google lawsuit, says that U.S. District Judge James Ware has announced that a hearing regarding the subpoena asking Google to turn over search records to the U.S. Department of Justice will take place one month from today, February 27, 2006, at 9am in California Northern District Court in San Jose.

Ware also set a date of Feb. 6 for Google to file a legal brief with its arguments, and a Feb. 13 date for the Justice Department to submit its reply. Ware is no stranger to technology cases. He heard the Sex.com case in 2001, a spam lawsuit in 1998, and a legal spat between RealNetworks and Microsoft in 2004.

A brief bio of Judge Ware is available from the Federal Judicial Center.

I've posted a copy of the actual court filing with the schedule here (PDF). I've also placed a copy of the latest court docket (as of Thursday, expect updates) on the server.

You can find the full text of other key court documents filed to this point in this blog post.

Posted by Gary Price at 3:24 AM | Permalink

January 26, 2006

Scoble: Search Champs Talk to MSN VP about Data Turned Over to Feds

Scoble reports from MSN Search Champs about a session where those in attendance "grilled" MSN Vice President Christopher Payne about what search data Microsoft did and did not hand over to the Department of Justice. As Danny pointed out from the outset, no personally identifiable search information was turned over by any of the engines. In his post, Scoble shares a bit more info and background.

Summarizing Scoble's post:

  • No IP addresses or identifying information was given over even though the government asked for more.
  • Robert doesn't say what else the feds wanted.
  • MS renegotiated to make sure no identifying info was turned over.
  • In the session, one attendee said MSN should work with Google and Yahoo and form a "unified front."
  • MSN Vice President Payne said that MS needs to be "far more transparent about these issues."

Posted by Gary Price at 2:47 PM | Permalink

January 25, 2006

U.S. Senator Patrick Leahy Asks Attorney General for More Info On Web Search Subpoenas

News.com reports that Senator Patrick Leahy of Vermont has sent a letter to U.S. Attorney General, Alberto Gonzales, asking for more information about the subpoenas for search records from Google, Yahoo, MSN, and AOL.

In a two-page letter [PDF also available here] sent Tuesday to Attorney General Alberto Gonzales, Sen. Patrick Leahy of Vermont asked the department to outline the type of information it has requested, its reasons for the requests, the steps it is taking to safeguard any data obtained, and any plans to issue additional subpoenas in the future.

From Leahy's letter, where he asks Gonzales for several specific answers:

I am interested in learning more about the extent to which the Department of Justice is relying upon data mining of the Internet search queries made by law-abiding American citizens to support its efforts under the COPA and how the Department is addressing the privacy and civil liberties concerns raised by the collection, storage and use of such data.

Specifically, I ask for and would appreciate your responses to the following questions:

1. According to press reports, the Department of Justice issued subpoenas for records to Google, Inc, America Online, Inc., Microsoft Network, and Yahoo, Inc. (collective, the "Internet Companies") in connection with ongoing civil litigation involving the legality of the COPA. Please state whether any, or all, of the Department's subpoenas to the Internet companies were issued in connection with this, or any other, civil or criminal litigation or investigation.

2. Please identify the type(s) of information and/or data that the Department requested in its subpoenas for records issued to the Internet companies -- including whether the Department requested, or obtained, any personal identifying information and/or data in connection with the subpoenas -- and state how the Department intends to use this commercial information and/or data.

3. Please state what, if any, safeguards are in place within the Department of Justice to protect the privacy of the millions of American people who conduct searches on the Internet in light of the Department's requests for this commercial information and/or data?

4. Please state whether the Department will issue any additional subpoenas to the Internet Companies and, if so, state whether any such subpoenas will seek personally identifiable information.

5. Please provide any documentation that relates to, or supports, the answers to these questions.

A US Justice Department spokesman said that they will respond to Senator Leahy's questions.

He added: As for the privacy concerns raised by Leahy, "We've addressed that in our subpoenas and to the search engines," Miller said. "We weren't seeking information about the individuals, we were only seeking the search terms....We don't even want to know the names of the people."

You've got to wonder if Congressional hearings might be in the works if the responses Leahy receives aren't satisfactory?

For more on the Internet, privacy, keeping "objectionable" material from minors on the Internet and more, here's a collection of reports from the Congressional Research Service.

Posted by Gary Price at 5:13 PM | Permalink

Search Data Request Has Searchers Pondering Their Next Query

After Subpoenas, Internet Searches Give Some Pause from the New York Times is a nice, reflective piece on how people might be thinking more about how the searches they are doing might be perceived by others, in the wake of last week's US Department Of Justice demands for search data from Google. Also be sure to see the poll Gary blogged on earlier, where many people assume their searches are private and a good chunk want them to stay that way, though they are more willing to share in particular circumstances or in response to subpoenas.

Posted by Danny Sullivan at 9:26 AM | Permalink

January 24, 2006

New Poll Finds Web Users Want Google to Keep Data Private; Full Text Access to Report Also Available

Elinor Mills at News.com clues us into a poll conducted over the weekend and reported by Verne Kopytoff in the San Francisco Chronicle and Michael Bazeley in the San Jose Mercury News that shows 56% of those surveyed don't want Google handing over any info to the government.

From the SF Chronicle article: As part of the findings, 56 percent of respondents said they do not want Google to turn over any information to the government. More than three quarters of the respondents, or 77 percent, did not even know that Google collected information that personally identifies them. Google keeps records of IP addresses, which can be traced back to individual computers. In cases where the government is trying to prosecute a crime, according to the survey, the respondents were more open to Google sharing information. About 14 percent said that they were willing to give the government access in such cases, while 44 percent said that they were willing in only certain cases.

Mike Bazeley points out that many of those surveyed would stop using Google if they gave the government the data they requested.

From the Mercury News article: More than a third of the survey-takers -- 38 percent -- said they would stop using Google if the company ever turned over information about their searches to the government. The survey did not ask people for opinions about Yahoo, Microsoft or AOL.

The poll was made up of a random sample of 1,017 Internet users over the age of 18 and conducted by the Ponemon Institute [via email], a privacy research organization (aka think tank) based in Michigan.

I'm interested to see if the search companies who handed over info to the feds (none of it with personally identifiable info as Danny clearly points out here) lose any market share and/or total number of searches in the future due to sharing data with the government.

Also worth a look (if you haven't done so already) is Danny's post: Private Searches Versus Personally Identifiable Searches; a statement from MSN along with plenty of reader comments on MSN Search's WebLog, a review of and links to the court filings, and some background reports on privacy, the Internet and related topics from the Congressional Research Service.

Postscript: Thank you to the Ponemon Institute, who have given us permission to post the full text of the report containing the results of their recent poll (PDF).

Posted by Gary Price at 5:07 PM | Permalink

Google Not Installing Third Party Cookies -- It's Firefox Prefetching

John Battelle spotted a post from Chris Marino at Tumbling Duke that has the worrisome suggestion that Google is allowing third parties to set cookies based on searches people do. But I dropped an IM to Dave Naylor, who immediately spotted this being due to Firefox prefetching.

If you use Firefox, Google will automatically preload the pages showing in the top search results. They made this change back in March. As they warned back then:

With prefetching enabled, you may end up with cookies and web pages in your web browser's cache from web sites that you did not click on since prefetching happens automatically when you view Google search results pages. You can delete these files by clearing your browser's cache and cookies.

So in Chris's case, he writes about how he searched for cars, Amazon and Walmart and got cookies from Cars.com, Amazon.com and Walmart. He assumed this is all related to AdWords in some way.

AdWords isn't the issue. It's because for a search on cars, Cars.com was the first site listed and so that page was preloaded -- and that meant a cookie from Cars.com came with it. The same was true for Amazon and Walmart in searches on their names.
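If you want to check a page for this yourself, here is a rough Python sketch that looks for prefetch hints in a page's HTML. It assumes the preloading is requested with a link tag carrying rel="prefetch", which is my understanding of how Firefox prefetching is triggered rather than something confirmed above. The sample page is made up.

```python
from html.parser import HTMLParser


class PrefetchFinder(HTMLParser):
    """Collect the URLs a page asks the browser to prefetch."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "prefetch" in rel_values and attributes.get("href"):
            self.urls.append(attributes["href"])


# Made-up results page with one prefetch hint for the top listing.
page = (
    '<html><head><link rel="prefetch" href="http://www.cars.com/">'
    '<link rel="stylesheet" href="/style.css"></head>'
    '<body>search results</body></html>'
)

finder = PrefetchFinder()
finder.feed(page)
print(finder.urls)
```

Any site listed there gets loaded in the background, cookies and all, without the searcher ever clicking on it.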

Posted by Danny Sullivan at 12:40 PM | Permalink

January 23, 2006

Protecting Your Search Privacy: A Flowchart To Tracks You Leave Behind

Wired's "How to Foil Search Engine Snoops" is a nice guide to protecting your search privacy, but it doesn't really go far enough. In particular, anyone who assumes they've protected themselves by using an anonymizing tool is probably not eliminating the important ISP aspect. Meanwhile, laws being considered to force search companies to destroy data must consider the role of ISPs to fully provide the intended protection.

In this piece, I'll take you step by step through how your search data gets exposed, all the way from your desktop to the sites you visit. Let me make some caveats before I begin.

Normally with stuff like this, I like to do a "Big Story With Answers To All The Questions" type of piece. That's what I tried to do back in 2003, the last time search privacy really came up as an issue. Much of what I wrote then is still applicable to the issues today, and I'll be drawing on those pieces. You may wish to read them as well:

I definitely don't have all the answers to all the privacy questions in this piece, especially as privacy issues have gotten more complex. But I wanted to make a start, perhaps the beginning of a living document or future article that will provide all the answers. I'd especially invite those with additional tips, observations and so on to contribute to a Search Engine Watch Forum discussion on this topic -- the link will be at the end of the article.

Onward to the search privacy flowchart. It's not an illustrated one in the traditional sense, but it should give you an idea of all the traces you leave behind when searching for something.

1. Search Privacy On Your Own Computer

In November, we wrote of a man convicted of killing his wife in part because authorities found he'd searched for "neck," "snap," "break" and "hold" on Google. But that information was not handed over by Google itself. Instead, it was found in traces left behind on the man's own computer.

Anything you do on the internet gets recorded on your own computer in various ways. Pages you've visited are stored in your computer's cache, and a history of the URLs you've seen and things you've searched for may also get stored in your browser.

Clearing Your Search History From Google And Other Search Engines from me in 2003 covers some of the ways to delete what you've looked for in Internet Explorer 5, much of which is applicable to Internet Explorer 6.

How do I delete the drop-down list of my past searches? over at Google looks to be a very comprehensive guide on clearing out any search history that appears in the search box on the Google home page.

That information is NOT saved at Google. Instead, it's recorded within your own browser. The Google page gives instructions for cleaning out IE, Firefox, Safari and other browsers. Also, these same instructions should work to clear out your search history at all search engines in one go, not just at Google.

Unfortunately, there are so many search toolbars out there that they might keep their own histories independently of your browser. Google's does, and the page above from Google has instructions on clearing that out. MSN has instructions on clearing its toolbar history here. Instructions for Yahoo are here. For other tools, a first stop is to check the help pages for them.

Now that you've cleared out saved searches, you've still got URL histories and saved pages you might need to clear. How to clear your browser's cache and cover your tracks on the Web looks to be a pretty good article to guide you on how to delete this type of material. It also points to a number of software tools to make life easier. There are also more tools here, here and here from Download.com.

Software may be the way people need to go, as search gets more and more embedded into everything. Running any desktop search tools? They may be storing information you want to delete. For example, Google's desktop search tool also stores all the pages you view on the web. When I last looked, deleting your browser cache did not destroy the data Google Desktop itself keeps.

Managed to wipe everything out either manually or with software? Now go wipe out your hard drive. That's because even if you delete files, people with the right tools and knowledge might still be able to bring back the data. Some of the tools mentioned above may be able to make this easier so that something you've deleted really stays deleted. But the most surefire way to do so would be to physically destroy your computer's hard drive, literally prying out the metal platter where the info is recorded and ideally breaking it up into multiple parts that would be disposed of in various places.

Back to reality, most people aren't going to do that. But I'm trying to underscore how difficult it is to absolutely protect your privacy from prying eyes right on your own computer.

For those worried that tips like cleaning search history from your desktop are helping potential wrongdoers, keep in mind that there are plenty of innocent reasons for wanting to clear search information. For example, a neighbor's older son had looked up porn on their computer. My neighbor could not figure out how to get rid of the pornographic search terms that kept appearing in the search drop down box that his younger daughter was seeing.

2. Search Privacy & Your ISP

The weakest link in protecting your search privacy is your ISP. Everything you do is going to flow out of your computer and through your ISP to a search engine. Your ISP will see the pages you are requesting and in all likelihood have some type of records of what you've done for a set period of time. Whatever deletions you do on your own computer -- plus whatever things you do to be anonymous with search engines -- these have no impact on your ISP. It sees all.

For example, Earthlink makes available a variety of tools to protect your surfing privacy on your desktop. But what's the policy on Earthlink's retention of data on sites you've visited through them? I don't know. Here's the Earthlink privacy policy. The closest relevant section is this:

EarthLink has security measures in place to protect the loss, misuse, and alteration of the information under our control. While we make every effort to ensure the integrity and security of our network and systems, we cannot guarantee that our security measures will prevent third-party "hackers" from illegally obtaining this information. We will never sell your information to a third party.

How long are records of what you've visited kept? Do these records exist at all? How might they be shared with others? Answers aren't provided.

Back in June, I wrote of a Reuters article (no longer at Reuters, but there's copy here) that cited one analyst saying that most ISPs don't keep data for longer than a month. In Europe, governments themselves apparently mandate a one to three year retention of data, according to a News.com article from last year. Ironically, while the current US government request for search data has at least one lawmaker considering whether search engines should destroy data, that News.com article says the US government seeks to force ISPs to keep data longer.

By the way, even if your ISP deletes data, you'd better make sure they are forcing companies that mine their data to do the same. Better Search Privacy Needs Addressing Overall from me covers how third party companies such as Hitwise take in ISP data as a way to track what people are doing on the internet.

3. Search Privacy & Your Search Engine

Visit a major search engine, and it keeps track of every request you make. It will also assign you a cookie, unless you reject these. That's easy enough to do, and the Wired article gives you some tips on that.

Rejecting cookies still leaves behind your internet address. My Search Privacy At Google & Other Search Engines article and the other one I've just posted, Private Searches Versus Personally Identifiable Searches, explain this a bit more. Basically, it links your request back to your ISP and thus still back to you, if someone has access to your ISP.

The Wired article suggests using an anonymizing tool to avoid this. Anonymizer is a long-standing one. However, most anonymizing tools only prevent sites you visit from seeing your real internet address. They don't prevent your ISP from seeing where you are going.

I learned of the Tor anonymizing service through the Wired article. It's not clear to me whether that prevents the ISP tracing as well. According to Dave Naylor, a search marketer who also runs his own ISP, your activity would be hidden from your ISP only if Tor keeps all information you send encrypted between your computer and the Tor servers you tap into.

Ethan Zuckerman (author of A technical guide to anonymous blogging - a very early draft) has a nice post about using Tor over here, but it doesn't seem to address the ISP question.

4. Search Privacy & Your Personalized Results

Let's flip things around and say you are NOT worried about visiting your favorite search engine and staying anonymous. In fact, you've decided to embrace the search history features they offer, which frankly can be really useful. Google's, for example, I find does a good job of improving my results based on pages I've visited.

Down the line, you might decide you want to get rid of some or all of your search history. At Google, it's easy. Here are the instructions. Ah, but even though you removed those items, they aren't necessarily deleted! Here's the Google privacy policy on personalized search. Notice that while your information is "removed from the service," it does NOT say that the information is destroyed entirely.

Over at Yahoo, you can clear your search history, though there's no help page I can point you at about this (go to My Search History, then use the Clear History link on the left-hand side). However, it's again not clear that this wipes out all the information entirely. The FAQ section on privacy says nothing about it, nor does the search privacy policy. And while Google's famously whipped for not destroying data, that Reuters article I mentioned above has Yahoo declining to say how long it keeps data. This suggests that Yahoo doesn't destroy data, either.

Using personalized search at Ask Jeeves? Here's how to delete information there, though it's not clear from the privacy policy whether that information is deleted in any other records that are kept.

How about A9? Here's how to delete your A9 search history. As with the others, the privacy policy makes no mention if that information is deleted in other places.

5. Search Privacy & Sites You Visit / Tracking Services

All the major search engines embed the search terms you used into the URL that appears in the address field of your browser. When you click on a listing, that URL is sent as "referrer" information to the web site you go to. That means what you searched on is sent to the web site you ultimately visit from a search engine. They're able to know the search terms you used plus your IP address.
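To make the mechanics concrete, here's a rough Python sketch of how a site could pull your search words out of the referrer it receives. The referrer URL below is made up, and the `q` parameter name is an assumption for illustration; different search engines use different parameter names.

```python
from urllib.parse import urlparse, parse_qs

def search_terms_from_referrer(referrer, param="q"):
    """Return the search phrase embedded in a referrer URL, or None."""
    query_string = urlparse(referrer).query
    values = parse_qs(query_string).get(param)
    return values[0] if values else None

# Hypothetical referrer, as a web server might log it for an incoming click.
referrer = "http://www.google.com/search?q=search+privacy+tips"
print(search_terms_from_referrer(referrer))
```

Nothing clever is needed on the receiving end: the search words travel in plain sight inside the URL, which is why any site or tracking service that logs referrers ends up with them.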

Referrer information is precious data to web sites. It allows them to know exactly how people found them. As a search marketer, I'd hate to see this information go away. But it is a privacy issue to be aware of.

Many web sites make use of third party analytic services, such as ClickTracks, WebSideStory, WebTrends or Google Analytics. That means these services are almost like clearinghouses of search data. They see what many people are searching for -- and clicking on -- from all over the web through the data from thousands of clients using them. Potentially, they are just as rich a target for any government agency to mine as the search engines themselves.

To protect yourself, you want to ensure your browser doesn't pass along referral information. In Internet Explorer, I see no native way to do this. You'll have to turn to products like Norton Security or the tool I use and much prefer, ZoneAlarm. There are certainly other third party tools out there. For Firefox, there's at least one extension you can try.

In Conclusion: Securing Search Privacy Is Tricky

As you can see, ensuring your search privacy is tricky. The information you send is leaving traces in multiple places. The solution to ensuring privacy isn't going to be as easy as passing a law that targets Google, Yahoo and the others. Ideally, the entire lifecycle of a search beyond the computer desktop needs to be considered from ISP through to tracking services. Searchers themselves also need to consider what they do on their own computer desktops.

There's also an issue of what should be private. I wrote earlier today that most people probably think of the conversations they have with search engines as being private. But to date, we don't have any protected searcher-search engine relationship as we do with attorney-client privilege or between clergy and worshipper. Perhaps that needs to be enshrined in some way. But then again, others may feel that going out on to the public web and using publicly accessible search engines entitles no one to an expectation of privacy, or perhaps a more limited one.

Certainly, we need to have a good debate and discussion. That's probably the good that's coming out of the Department Of Justice action. After years of worrying about privacy issues, the DOJ action is turning that worry into action about better protections that may need to be put into place.

Let me add that while I hate the sloppy manner in which the DOJ has acted in this particular case, I have no more interest in criminals using the internet for bad purposes than most people would. In specific circumstances, with the right legal oversight, I hope search or internet browsing data might be evidence that helps catch a criminal, just as I hope they'd be caught through legally approved wiretapping or other types of law enforcement monitoring.

What I don't want is a Big Brother state to be mining everything with the assumption we're all criminals, any more than I want all telephone calls to be monitored. Moreover, it's very, very easy to mistakenly assume from a search request that something wrong is happening, when it is not. Jon Swift takes a light-hearted look at this in his post today, but it's true. A search for "bombing the white house" doesn't mean someone's planning to do that. It may simply be that you're trying to find out about someone who may have attempted this.

Aside from the government issue, there's the concern that the search companies themselves might misuse data. That needs to be considered and improved guidelines or laws developed. Even better would be to see such moves as part of improved protection of consumer information of all types. The amount of data about what people personally are interested in and do seems easier to obtain from consumer research organizations right now than what search engines possibly might provide in the future. How about considering these both together, rather than separately? That idea came up in a Newsfactor article on Google and consumer data in general last year.

For more on the current issue over the Department Of Justice request for search data, please see these articles from us and others:

Want to comment on things discussed in this article? We have three Search Engine Watch Forum threads where everyone is welcome:

Postscript: Anonymizer tells me that if you are using only the IP hiding function in Anonymizer, then your ISP will see what you are doing. However, if you use the SSL encrypted "Surfing Security," then your ISP cannot see what you are doing. They're using a better metaphor for this now, calling it a "virtual tunnel" between you and the Anonymizer servers. Ah, but what records does Anonymizer itself keep? None, the company tells me:

The way that the technology is architected, it does not retain any information about users' requests so even if subpoenaed, no information can be supplied because -- simply -- they do not keep any of it. For example, they would not be able to share with anyone where a user is by IP address, or what sites they visited, or anything else, because even Anonymizer does not know. Additionally, the company provides software for use in instances where a privacy breach might have severe consequences -- even death in some cases (where the company protects freedom of speech in foreign countries, Anonymous tips, etc.). Anonymizer has never had a single breach since it began selling products and services in '97, due to its level of security. Trust is a key difference.

Posted by Danny Sullivan at 11:21 AM | Permalink

Private Searches Versus Personally Identifiable Searches

I've written that no private information was given by any of the major search engines that did respond to the Department Of Justice subpoena or request for search data. However, as people are discussing and debating the case more, they're realizing that there is some private information contained within searches themselves. And this is true. Even "anonymous" or "aggregate" search data has some private information.

Indeed, it can be argued that all queries are effectively private. Having said this, there's an important difference between private information and private information that can be actually linked to an individual with confidence. In this piece, I'll explain some of the concerns and differences.

Let me start with the suggestion that ALL queries made to a search engine are private, at least in the minds of those making them. I think it's reasonable to assume that most people doing searches assume they are having some type of confidential conversation with the search engines they use. They don't expect that what they enter into a search box is going to be broadcast to the world.

Those more educated about search engines will know this is a false assumption. Here's a list of the many ways search engines have broadcast what people are searching for to the world. Heck, just last month we were awash in press stories about the top searches of 2005, as each major search engine released popular query lists.

Most of those lists are sanitized, so that you never see things such as porn queries that happen. Moreover, the number of "live" displays, where you can see in real time what people seek, has diminished over the years. However, plenty of press accounts about Google still include the almost mandatory description of how visitors to Google offices around the world are entertained by seeing "live" queries displayed on the walls (see pictures here, here and an excellent one here).

Given all this, how can I say queries are private? Because back to my main point, most people are unaware that search queries are broadcast this way. And because of this, they'll reveal information to a search engine that they may not want the rest of the world to know, private information.

How about an example. Britney Spears remains a popular search topic, so a query like this:

britney spears

while private, isn't going to cause privacy concerns for most people if it is publicized. What could be "wrong" or "incriminating" about looking for Britney? Heck, Yahoo's search term suggestion tool for advertisers tells me that 2,230,646 searches for her happened on the Yahoo network of advertising sites in December 2003.

Now how about this query:

britney spears nude

That's probably embarrassing to most people who did it. It's also the second most popular "britney" query that happened last month, with 92,255 searches. Despite that popularity, I'd wager that most people don't want the world to know they were looking for the pop star without her clothes on.

Those who did so can breathe a sigh of relief over the current fracas over search data being released to the Department Of Justice. That's because while this private query has been released, no personally identifiable information has been released with it. There's no way to link the query back to the person who did it.

In particular, you can imagine that for every query at Google or another search engine, you leave behind a record that's something like this:

www-az3.proxy.aol.com - 25/Dec/2005 10:16:22 - http://www.google.com/search?q=britney%20spears%20nude - 740674ce213e9d9 - lexluthor340@aol.com

My Search Privacy At Google & Other Search Engines article from 2003 explains more about what's in these records, but I'll do a short breakdown here of the four key portions (the address, the query, the cookie and the account).

  1. Internet Address: The first part of that record (www-az3.proxy.aol.com) shows your internet address (in this case, it tells that someone connected through AOL). It doesn't reveal your name, address or anything personal about you. However, if someone were to contact AOL and they themselves checked their records for that period of time -- and you also had the Google records -- then someone might link you to the query.  
  2. Query Terms: The second part has the search words you did.  
  3. Cookie: The third part (740674ce213e9d9) is your cookie. That tells Google that the request came from a particular web browser that it has interacted with before. Again, it doesn't reveal your name or anything about you in particular. It's just a random set of numbers assigned to identify your browser.  
  4. User Account: The fourth part (lexluthor340@aol.com) represents the Google Account name you created, assuming you did create one and logged into Google to make some use of account-based services. It may link your email address to this query. If you make use of any transactional services with Google, then that might help link your actual physical location to the query.
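For those who like to see it spelled out, here's a minimal Python sketch that splits the example record above into those parts. Keep in mind the assumptions: the " - " separated layout is just my illustration of what gets logged, not any search engine's real log format, and the field names are invented for this sketch. The date and time sit between the address and the query as an extra field that the list above doesn't number.

```python
from urllib.parse import urlparse, parse_qs

# The illustrative record from above (not a real log format).
RECORD = ("www-az3.proxy.aol.com - 25/Dec/2005 10:16:22 - "
          "http://www.google.com/search?q=britney%20spears%20nude - "
          "740674ce213e9d9 - lexluthor340@aol.com")

def parse_record(line):
    """Split the illustrative record into named fields."""
    host, timestamp, url, cookie, account = line.split(" - ")
    # The search words live in the URL's "q" parameter, percent-encoded.
    terms = parse_qs(urlparse(url).query).get("q", [""])[0]
    return {
        "internet_address": host,
        "timestamp": timestamp,
        "query_terms": terms,
        "cookie": cookie,
        "user_account": account,
    }

fields = parse_record(RECORD)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Seen this way, it's clear that only one field holds what you searched for. Everything else is about who (or at least which browser and connection) did the searching.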

That type of information in various ways is logged by ALL the major search engines. However, NONE of the search engines that complied with the Department Of Justice request gave out any of the personally identifiable portions. Your internet address, cookie and user accounts were all removed, as were dates and times. Instead of that long line above, this is all that was handed over, according to what the search engines have said:

britney spears nude
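As a rough sketch of what that stripping amounts to, the Python below reduces records to nothing but the search words. It assumes the same made-up " - " separated layout as my example record; the second record is hypothetical, and none of this reflects how the search engines actually prepared the data they handed over.

```python
from urllib.parse import urlparse, parse_qs

def strip_to_query(line):
    """Keep only the search words; drop address, time, cookie and account."""
    parts = line.split(" - ")
    url = parts[2]  # third field holds the search URL in this made-up layout
    return parse_qs(urlparse(url).query).get("q", [""])[0]

records = [
    # The illustrative record from this article.
    "www-az3.proxy.aol.com - 25/Dec/2005 10:16:22 - "
    "http://www.google.com/search?q=britney%20spears%20nude - "
    "740674ce213e9d9 - lexluthor340@aol.com",
    # A second, entirely hypothetical record.
    "host2.example.net - 25/Dec/2005 10:17:05 - "
    "http://www.google.com/search?q=britney+spears - "
    "1a2b3c4d5e6f7a8 - (none)",
]

handed_over = [strip_to_query(r) for r in records]
print(handed_over)
```

What's left is a bare list of phrases, with nothing in it to tie any phrase to a connection, a browser or an account.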

So yes, a private query you and others made was passed along to the government. But they could have easily seen the same by doing a search through tools listed here. Heck, they could have sat in Google's lobby while waiting to talk with Google lawyers about the request and written down the queries scrolling up the wall. But none of these methods would allow query data to be linked back to you personally.

Ah, but what if you entered a query that does seem to have personally identifiable data? For example, some people search for their social security numbers or telephone numbers, to see if that information is available online.

  • Yes, that's a private query, as all queries are likely considered private to searchers.  
  • Yes, that's private information, information probably not available to the general public and that you didn't intend to have revealed to anyone.  
  • No, that's not personally identifiable information. Even though you entered your phone number or your social security number, there's no way to know that you YOURSELF actually did it. In addition, there's no way to know whether the information is valid at all.

In other words, anyone might enter this information. Someone could even do this:

britney spears phone number 213-555-1212

and it doesn't mean that Britney did the search or that the phone number is correct.

How about a step beyond. Perhaps you know personally that someone is gay but that person hasn't come out to friends and family about it. You're wondering if anyone else might have said anything about this on the web, so you enter:

jenna bush lesbian

Now you've just outed one of President George W. Bush's daughters as a lesbian to a search engine. And since that search engine has handed over search data to the Department Of Justice, your private information just potentially became public in a big way.

Relax. Just because someone enters such a query doesn't mean it's true (and for the record, I have no idea if Jenna Bush is or isn't, not that it makes a difference to me. I'm just making up an example of what could happen).

It is possible that somehow, someway some of the search data could contain some private information that is personally identifiable. I can't rule that out entirely. I simply think it's a very unlikely case. Still, it's enough of a possibility that it formed one reason Google objected to the request from the Department Of Justice:

Moreover, Google's acceding to the Request would suggest that it is willing to reveal information about those who use its services. This is not a perception that Google can accept. And one can envision scenarios where queries alone could reveal identifying information about a specific Google user, which is another outcome that Google cannot accept.

Yes, it is possible, though it remains extremely unlikely. The better reason not to hand over data is covered in the first sentence, that handing over information would give the wrong impression overall to its users.

Of course, Google opens up another can of worms with its second sentence. If there could be identifying information in queries, then why does Google give advertisers access to keyword research tools such as this? Entering "714" gave me back a number of phone numbers that someone has searched for in that US area code. As I explained above, that doesn't tell me the phone numbers are correct. They certainly aren't personally identifiable. I can definitely get phone numbers in much easier ways. But it's private information that any Google advertiser can access by depositing $5 to open an AdWords account.

One of the best things about the Department Of Justice action is that it's raising a new examination of issues like these. Should search query logs be made less accessible to advertisers? Speaking from a marketer's perspective, I hope not, especially in that it really is an extremely unlikely case that any personally identifying information would be revealed. At the very least, it may cause people to think more carefully about what they put into search boxes in the first place.

One last thing. Whether queries themselves have personally identifiable information is especially becoming a hot topic in the comments at the MSN Search Weblog, where examples of a name followed by things like "nazi" or "aids" have been given. Privacy infringements? No, in the sense that we don't know in such examples if any of the information is true or not, as I wrote above.

Having said this, I frankly don't know whether the government itself is smart enough to realize this. I already have written about how dumb I think they are in the request they've made. Then I read of this from Newsweek:

What if certain search terms indicated that people were contemplating terrorist actions or other criminal activities? Says the DOJ's Miller, "I'm assuming that if something raised alarms, we would hand it over to the proper [authorities]."

So the request for data that supposedly was just being done to measure whether children might encounter porn through search results now might be used for other things? Kind of scary -- though scary again from how dumb the Department Of Justice again appears to be.

What are they going to hand over? That a year ago, there was a search for something they think might be terrorist related, but that they don't know who did it, whether it was true or even worth the time to investigate at all?

Want to comment or discuss? Please visit our Search Engine Watch Forums thread, Bush Administration Demands Search Records.

Postscript from Gary: I realize that this is not an apples to apples comparison since screening for terrorists on airplanes is a very very very serious issue.

With that out of the way, reading all of the postings and news coverage about the "search subpoena" this weekend reminded me of several instances reported in 2004 where major airlines handed over traveller information to the government to assist it in building and testing a passenger screening database.

These stories got some press attention and some significant outcry from privacy groups but nothing close to what we're seeing today. Airlines including Jet Blue, Northwest, American, and Delta handed over records that in some cases contained personally identifiable data (credit card info, telephone numbers, etc.) I have links to several news stories here and a page from EPIC that summarizes the Northwest Airlines portion of the story along with related links. Again, let me stress that this is not a direct comparison to the current story but one that might be of interest to some of you.

Posted by Danny Sullivan at 7:48 AM | Permalink

January 22, 2006

Full Text Reports from the Congressional Research Service on Internet Privacy, Net Technology, and Protecting Children from "Unsuitable Material"

If you're interested in researching and learning more about U.S. Federal legislation (and related issues) dealing with Internet privacy, Internet technology, and the protection of children from "unsuitable material on the web," here are a few research reports from the non-partisan and highly respected Congressional Research Service at the Library of Congress.

I've done my best to offer links to the most current versions of these reports. However, please realize that many of these reports are updated frequently. So, it's best to check sources like Open CRS (an aggregator of web accessible CRS content), IPMall, and a collection from the University of North Texas Library, to make sure you're accessing the most current version of each report.

+ Internet: An Overview of Key Technology Policy Issues Affecting Its Use and Growth Updated: December 20, 2005 PDF; 51 pages The report includes a couple of pages on Internet privacy issues and protecting children.

+ Internet: Status Report on Legislative Attempts to Protect Children from Unsuitable Material on the Web Updated: December 16, 2005 PDF; 6 pages

+ Internet Privacy: Overview and Pending Legislation Updated: October 19, 2005 PDF; 6 pages

+ Constitutionality of Requiring Sexually Explicit Material on the Internet to be Under a Separate Domain Name Updated: January 6, 2006 PDF; 11 pages

+ Personal Data Security Breaches: Context and Incident Summaries Updated: December 16, 2005 PDF; 32 pages

Our coverage of the "subpoena" story includes the post: Bush Administration Demands Search Data; Google Says No; AOL, MSN & Yahoo Said Yes, where Danny offers a complete and detailed review of what's been happening. Another blog post: Court Documents & Summary Of United States Versus Google Over Search Data, includes links to the actual court filings, a detailed synopsis of what they contain, and a link to the current court docket.

More Reports?

In the first blog post listed above, Danny points to this post where we link to a Government Accountability Office (GAO) report published in June. From our blog post, "...it measured how often children might encounter porn through image search. To do the assessment, no subpoenas were required."

Posted by Gary Price at 1:53 AM | Permalink

January 21, 2006

A Brief Look at Danny's Appearance on Nightline

Danny's appearance on ABC's Nightline is over and within minutes of it ending I was able to access the transcript (thanks TVEyes for the help) of the report. I'm not going to post the entire transcript here but rather share a few of Danny's comments that made the air during the 5 minute report. Of course, you can read all of Danny's thoughts in this blog post.

The story opens with background on the subpoena (we have plenty of that elsewhere on the blog), including comments from the US Attorney General who says: We're not asking for the identity of Americans. We simply want to have some subject matter information with respect to these communications.

and

a comment from Sergey Brin who spoke to ABC News earlier today: The idea there could be such a large overreaching, in my mind, request, based on something so far off and not related to security or anything like that, I think that's worrisome.

Comments by Larry Page were also made to ABC earlier this evening here.

So, with one side saying one thing and another side saying something else, where is one to turn? Danny Sullivan. Of course, all of us already knew this important fact.

Reporter: We turned to Danny Sullivan, of searchenginewatch.com, one of the world authorities on search engines. Yes, there are world authorities on search engines. It's a multibillion dollar business and quite baffling to most of us and even to some experts.

Danny: They [engines] can still be mysterious in some ways.

The reporter then introduces Danny. Btw, this is the first time I have ever seen Danny's home office and it's way cool, reminds me of NASA mission control.

Danny: They seem to be trying to understand how likely it is that if you were to use a search engine you might run into pornographic content.

Danny: They haven't asked for any information that's going to violate anybody's privacy in any way, shape or form. Reporter: But Sullivan says the government request shows something important. The government has no idea what it's doing.

Danny: It's overkill, the amount of data that they want. They're literally going to get more than a billion searches in what they're asking for.

Reporter: Sullivan thoroughly reviewed the government's subpoena, available online (and summarized by yours truly here). He says the government did not ask Google to remove automated searches from the data, the searches requested by software as opposed to the ones made by you and me. Note: You can access the documents and a summary of them here.

Danny: For the searches to remain any automated searches that happen, some people use automation to query the search engines on a regular basis. Since they haven't asked for those kind of automated queries to be removed, it suggests they don't even know it happened, which maybe suggests they aren't educated enough to know how the search engines operate or how behavior is on the searches in the first place.

On Yahoo, MSN, and AOL Danny: I think it would have been good if they had pushed back. Think the amount of data, even though it wasn't violating anybody's privacy, was so large and was going to raise so many red flags down the line that they should have done it.

On Search and Search Engines Danny: They go in so many directions, it's difficult for anybody to keep track of absolutely everything they're doing. Sometimes I think the search engines themselves aren't quite certain which way they go at times.

Here's a screen cap of Danny from the report.

Congrats Danny!

Posted by Gary Price at 3:15 AM | Permalink

January 20, 2006

MSN Search Blogs On DOJ Request

Now up at the MSN Search blog is a much better statement about the request they received from the US Department Of Justice and what MSN provided. It stresses that no personal information was handed over. I've said that over and over in my blog posts on the subject, that no personal information was involved, but it bears repeating.

The trust issue of handing over so much information still remains, of course. The Day After: Points In The Search Trust Sweepstakes covers that in more detail and why I think so many fear that even though it wasn't a privacy thing this time, it might be next time.

Probably the best thing about the MSN post is that you can comment on it. So if you want to tell MSN you aren't worried about what they handed over, want them to explain more or want to express your concern, the blog's a great opportunity to do that.

Posted by Danny Sullivan at 8:29 PM | Permalink

Google's Larry Page Comments on Privacy Matters

Along with Danny making an appearance on ABC News Nightline tonight, ABC's World News Tonight offered a look at the company a little while ago. A text version of the story is here. We learn about the corporate culture, etc. but we also hear from Larry Page on privacy issues. The video is also now available.

From Larry Page: "Our company relies on having the trust of our users and using that information for that benefit," said Page. "That's a very strong motivation for us. We're committed to that. If you start to mandate how products are designed, I think that's a really bad path to follow. I think instead we should have laws that protect the privacy of data, for example, from government requests and other kinds of requests."

John Battelle is also quoted in the story. He says: "I think people are both fascinated and terrified, frankly," said John Battelle, author of "The Search."

Posted by Gary Price at 7:48 PM | Permalink

How Timely: Philipp Lenssen's Patriot Search

We can add search satirist/search humorist to other terms like creative and inventive that best describe Google Blogoscoped's Philipp Lenssen. Check out his latest and very timely creation, Patriot Search. Make sure to review the entire site, especially the advanced syntax. :-)

Posted by Gary Price at 5:49 PM | Permalink

Privacy Groups, Government Officials Comment on Privacy and Web Search

A new Reuters article: Privacy experts condemn Google subpoena, offers a review of what several people who monitor privacy issues have to say about Google's decision not to share data with the government. Danny is quoted in the story.

Key quotes: "This is the camel's nose under the tent for using search engines and all kinds of data aggregators as surveillance tools," said Jim Harper of the libertarian Cato Institute who also runs Privacilla.org, an Internet privacy database.

A Google representative said the company objected to the breadth of the government's request but did not consider it to be a privacy issue since the search terms would not include personally identifiable details. But others were not reassured. Massachusetts Rep. Edward Markey, the ranking Democrat on the telecommunications subcommittee of the House Energy and Commerce Committee, said he would introduce a bill to strengthen consumers' Internet privacy by prohibiting the storage of personally identifiable information in Internet searches beyond a reasonable time. Ari Schwartz of the Center for Democracy and Technology said he was glad Google was fighting the case but the company needed to make privacy a more fundamental part of its products. On the other side, the Cincinnati-based National Coalition for Protection of Children and Families, a Christian fundamentalist group, said search companies should be willing to help the government defend children from pornography. "I'm disappointed Google did not want to exercise its good corporate branding to secure the protection of youth," said Jack Samad, the group's senior vice president.

Posted by Gary Price at 4:57 PM | Permalink

The Day After: Points In The Search Trust Sweepstakes

Since Google first started growing in stature, people have wondered if (or when) they might start passing along private information to governments or misusing it for their own gain. The company has faced hyperactive attention in this space, while others, as I have written, largely got a free ride from criticism. Moreover, the privacy freakout about Google was based on lots of "might dos" or "could dos" rather than "has done."

Yesterday was a historic moment in answering some of those doubts. What might Google do, if faced with an unreasonable demand from a government agency? Google will push back. And what might its competitors do, who have faced nowhere near the same amount of criticism? Comply.

Let's not get all starry eyed. Google pushed back in this case, but it may have complied with other governmental requests. Indeed, one of the best points in John Battelle's book "The Search" was the section focusing on the US Patriot Act and how Google (or other search engines) might not even be able to say if it has given out information.

Let's also not get foolish. I personally think that search engines should be following laws and especially helping authorities, even if that means handing over private information. But that has to be done when the proper procedures have been followed, when the right safeguards are in place and a real pressing need for the information is demonstrated.

None of that was the case with the information the Department Of Justice has requested. I don't have time to do a long deconstruction of how the information the DOJ wants will NOT allow the government to do what they claim. But I'll take a short look, especially at this claim that literally made my jaw drop:

Reviewing URLs available through search engines will help us understand what sites users can find using search engines, to estimate the prevalence of harmful-to-minors (HTM) materials among such sites, to characterize those sites, and to measure the effectiveness of content filters in screening HTM materials from those sites.

Reviewing user queries to search engines will help us understand the search behavior of current web users, to estimate how often web users encounter HTM materials through searches, and to measure the effectiveness of filters in screening those materials

Ah, HTM, the new WMD. HTM is "harmful to minor" material, porn that children might encounter accidentally. The Department Of Justice seems to think that if it has a list of queries, plus a sample of URLs indexed, it will magically demonstrate how often you might hit porn. Not at all. Not in the least.

Anecdotally, search engines will tell you that most of the content they index never makes it into the first page of search results. And content not in the first page of search results is effectively invisible and not seen, since most people don't drill deep. So sure, they could hand over a list of 1 million URLs. But you have no idea from that list how often any of those URLs actually rank for anything or receive clicks. It is non-data, useless.

Secondly, the list of search queries HAS NO RANKING DATA associated with it. So let's say the DOJ sees a query for "lindsay lohan." They don't know from that data what exactly showed up on Google or another search engine for that query, not from what they've asked for. Since they don't know what was listed, they further can't detect any HTMs that might show up.

In short, gathering this data is worse than a fishing expedition. It's a futile exercise that will prove absolutely nothing about the presence of HTMs in search results. All it proves so far is that the DOJ lawyers (and apparently their experts) haven't a clue about how search engines work. An actual search expert will tear apart whatever "proof" they think they can concoct from the data gathered so far.

As I said yesterday, a far better way to measure HTMs in SERPs (that stands for search engine results pages. I don't like the acronym, rarely use it, but let's play dueling acronyms today) is to conduct some actual searches. That's what the US Government Accountability Office did when it wanted to measure porn in image search results.

Now let me bring this back to the bigger issue, that of trust. AOL, Microsoft and Yahoo DID NOT VIOLATE THE PRIVACY of any user by handing over this information. No private data was revealed. Nevertheless, by not pushing back against such a bad request for data, it leaves open the real fear that they might not push back if the US government decided to go on a real fishing expedition in the future. Privacy may not have been lost but trust was.

Picture this scenario. The US government wants to pass a new law on monitoring terrorists. In order to see the presence of searchers seeking out TOMs (that's Terrorist Oriented Materials) through search results, they ask each search engine to hand over an entire week's worth of search data complete with cookie info, IP addresses and registration information.

The purpose? They need to study how many people are seeking TOMs to have stats to support the law they want to pass. This is pretty much the same argument they are using in the current case, by the way.

Why shouldn't the search engines comply? Isn't it supporting terrorism by not helping?

No. There's a difference in reacting to help the law, according to laws, and being asked to participate in a police state. Consider this metaphor. Many people take part in Neighborhood Watch programs, to help better ensure the safety of their neighborhoods. It's a good idea, by and large. I don't recall there being that much controversy over such programs.

Now let's say the government decides that to better protect your neighborhood, they'll be placing cameras inside of everyone's home. What's the problem? I mean, unless you're doing something wrong, you don't have anything to fear, right?

The problem is -- speaking as a US citizen here -- that America was founded on the principle of liberty. Stay out of my life, unless I'm doing something harmful to others. And if you think I'm doing something harmful, then use the carefully constructed and regulated laws to stop me from doing harm. Don't suspect everyone, monitor everyone and assume everyone is guilty. That's not how the country is supposed to work.

One more metaphor, which may better bring it home. Terrorists use telephones. Perhaps the government should go to the telephone companies and ask them to forward a week's worth of telephone calls, so they could determine how terrorists use the phone system. Anyone have a problem with that?

Actually, many people do. It's at the heart of the current controversy the Bush Administration faces, that it wiretapped -- listened in -- on phone calls without the legally required search warrants. The US Congress is about to hold hearings on the situation and whether it was illegal or not.

Search engines are like telephones, in some ways. We hold conversations with them, trusted conversations. We tell search engines things about our private lives we might not tell friends, doctors, partners and others. Is this a medical condition I should worry about? How do I handle an infidelity crisis I'm having? I need a new job -- can you help me? Private matters, not meant to be exposed to the world and certainly not meant to be swept up into a government net without an exceptionally good reason having been shown.

Again, yesterday's news wasn't about privacy. It was more about trust, whether we can trust our major search engines for when privacy really is an issue. So how did they do on the trust front? I thought I'd hand out trust points to summarize, on a scale of one to 10, with 10 being best.

Google: 9 Points

Why isn't Google at 10 points? After all, they are the ones who stood up against the Department Of Justice. A couple of reasons for the one point loss:

  • I'd be happier if they'd shared with the world that they'd been subpoenaed back when it happened last year. That information could have been made public, and it should have been.  
  • Google also looked to be negotiating on compliance. We don't have enough details to know why they ultimately didn't give in, but with some changes, perhaps they might have.

So, not a perfect score -- but overall, high, high points.

Yahoo: 4 Points

Since they complied, I feel they have to be below the 5 point mark. OK, how trustworthy were they then in the after-the-fact statement? At first, they tried to make it seem they didn't comply by saying they gave no personal information out. Since no personal information was asked for, that was a non-answer. But almost immediately a revised statement fessed up.

AOL: 1 Point

"We did not -- and would not -- comply with such a subpoena." Except they did. Like Yahoo, they ran for the cover of saying they gave out no personal information. Instead, they just gave out some "search terms" -- which is what the subpoena asked for. That's compliance in part, and a loss of trust points for not being more forthright about what happened.

MSN: 1 Point

Like a loser in the Eurovision song contest, null points to MSN. They issued a non-statement, neither confirming nor denying anything. Instead, we get the Department Of Justice saying they complied. They should have just fessed up from the beginning. That might have salvaged some trust out of the mess.

Postscript: In light of the most recent post on the MSN blog, I'll bring MSN Search up to match AOL. They took ages on the public relations side to say they complied, but the blog is direct and more forthright about it. What a difference it would have made if they'd been allowed to say that yesterday.

Ask Jeeves: Didn't Get To Play

I can only imagine the situation at Ask Jeeves. They had to be relieved they were never asked, so avoided the entire fracas. But not being asked means the US government doesn't think they count much as a search player -- so they probably also wish they had been involved!

What They Should Have Done

Just some tips for the search engine spinmeisters. Here's what you should have sent out:

AOLMSGooHoo did provide a list of URLs and search terms in response to the subpoena. We reviewed the request and determined that we could cooperate without any harm to the privacy of our users. We would have preferred not to have been given a legal summons and have serious doubts if the information will help the US government determine what it seeks. However, we felt our time was not best spent fighting on this front. Rest assured that if personal information had been at stake, we would have vigorously fought to defend the privacy of our users, to the degree the law allows.

The better answer, of course, would have been to fight the request in the first place.

In the end, one of the biggest ironies is that the other search engines failed to capitalize on Google's biggest weakness, that it might not be trustworthy. I wrote back in 2002 about how Google was going to face a challenge of being seen as too Microsoftish, too dominant a player.

Ironically, that put Microsoft itself in the enviable situation of counter-balance to Google. Rather than being the evil player, it was the player along with Yahoo that many hoped would restore some balance to the search space.

Had Microsoft said no, it would have scored major points for trust. But to say yes -- then not even admit to saying yes -- just makes Google seem better and better to many people.

Want to comment or discuss? Please visit our Search Engine Watch Forums thread, Bush Administration Demands Search Records.

Posted by Danny Sullivan at 10:56 AM | Permalink

January 19, 2006

Court Documents & Summary Of United States Versus Google Over Search Data

Earlier we reported in Bush Administration Demands Search Data; Google Says No, Yahoo & MSN Said Yes that the US Government seeks to force Google to hand over search data. That story explains more about the situation, and there have been a number of postscripts from when it was first written. Along with that, we've been able to obtain copies of the three court documents filed in the case. Below you'll find links to each document, along with a summary of what's in each of them.

Alberto Gonzalez, as Attorney General of the United States vs. Google Notice of Motion to Compel Compliance (PDF File)

Two quick points. First, remember that this brief was filed by the Government, so it does not include Google's response to the Government's claims. I'm sure that will be coming. Second, I'm not an attorney and haven't played one on TV. My purpose was to summarize what was presented in the document.
  • The motion requests that Google comply with a subpoena filed by the Attorney General and "produce" for inspection and copying the materials the Government is asking for.
  • After the lead government attorney conferred with Google, Google has chosen not to comply with the subpoena.
  • The Government is asking the court to make Google comply.
  • The filing then goes into a background explanation about the Child Online Protection Act (COPA) and how the government is developing its defense of the constitutionality of COPA. They believe that COPA is, "more effective than filtering software in protecting from harmful exposure to harmful material on the Internet."
  • In preparation of the case, subpoenas were issued to Google and "other entities" that operate search engines to produce two sets of materials.
  • First, the subpoena asks Google to produce an electronic file containing, "[a]ll URL's that are available to be located on your companys' search engine as of July 31, 2005."
  • However, after "lengthy negotiation" the government changed and "narrowed" their request and asked for a "multi-stage random sample of one million URLS from Google's database i.e., a random selection of the various databases in which those URL's are stored, and a random sample of the URL's held in those selected databases."
  • Second, Google was asked to "produce an electronic file containing [a]ll queries entered into the Google engine between July 1 and July 31 inclusive."
  • Again, after lengthy negotiations the government changed their request and asked for an electronic file "containing the text of any search string entered into Google's search engine for a one week period (absent any personal information identifying the person who entered the query)."
  • Google has still refused to comply with these requests in any way.  
  • The Government says that access to this information would be of "significant significance" in the preparation of their case.
  • Specifically why?  
  • "The production set of queries entered into Google's search engine would assist the Government in its efforts to understand the behavior of current web users, to estimate how often web users encounter harmful-to-minors material in the course of their searches, and to measure the effectiveness of filtering in screening that material."  
  • This information would also help the Government understand what, "web sites people find through the use of search engines, to determine the character of those sites, to estimate the prevalence of harmful-to-minors material on those sites, and to measure the effectiveness of filtering software on that harmful to minors material."
  • The document continues into a discussion with plenty of legalese and citations and again points out that Google has failed to comply and lists some of the reasons Google objects to this.
  • Google first objects to this on the grounds of relevancy.  
  • Google also objects on the grounds that if they would provide what the government asks for, they would be required to produce information identifying the users of its search engines.  
  • The Government claims that this is "illusory" since they have specifically asked for a random sample containing no personally identifying information to any search string.  
  • The Government said that it has received compliance from search entities with files containing no personally identifying information.  
  • Google also contends that the information they're being asked to produce is "redundant" since the Government has asked other engines to produce similar files. The Government argues that this "misunderstands" what's being requested. "The production set of queries from Google's database, in combination with similar productions from other search engine operators will assist the Government in developing a sample of the overall universe of search engines queries, while accounting for the potential of any variations in the type of queries that are entered into different search engines."  
  • The Government says that since Google is the market leader, its response, "would be of value" in developing the Government's overall sample of queries.
  • Google says that complying would also force Google to share trade secrets because the total number of queries it receives in a day is a trade secret. The Government adds that if this was the case, a district court has said that these numbers would not be disclosed.
  • Finally, according to the filing, Google says that it will be subject to an "undue burden" in complying. The Government claims that this is not the case whatsoever. The Government adds that they would be "willing to work" with Google to specify a multistage sample. They are also willing to compensate Google for its work in complying with the subpoena.
  • The filing ends with the Government saying that, "This court should require Google to comply with the subpoena on the same terms it's competitors have."
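
For those wondering what a "multi-stage random sample" might look like in practice, here's a rough sketch of the two stages the filing describes: pick some databases at random, then pick URLs at random from within each one. To be clear, this is only my illustration. The database names, sizes, URLs and the function name are all made up, and nothing here reflects how Google actually stores URLs or how the Government's experts proposed to draw the sample.

```python
import random

def multistage_sample(databases, n_databases, n_urls_per_db, seed=None):
    """Two-stage sample: pick databases at random, then URLs within each."""
    rng = random.Random(seed)
    # Stage 1: choose which databases to sample from.
    chosen = rng.sample(sorted(databases), n_databases)
    sample = []
    for name in chosen:
        urls = databases[name]
        # Stage 2: choose URLs at random within each selected database.
        k = min(n_urls_per_db, len(urls))
        sample.extend(rng.sample(urls, k))
    return sample

# Hypothetical toy data: three "databases" of made-up URLs.
toy = {
    "db_a": [f"http://example.com/a/{i}" for i in range(100)],
    "db_b": [f"http://example.com/b/{i}" for i in range(50)],
    "db_c": [f"http://example.com/c/{i}" for i in range(10)],
}
picked = multistage_sample(toy, n_databases=2, n_urls_per_db=5, seed=1)
```

One caveat on this simple version: taking the same number of URLs from every chosen database over-represents URLs that sit in small databases, so a real design would likely need to weight by database size.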

Declaration Of Joel McElvain (PDF File)

The second filing is a declaration by Government attorney Joel McElvain, who I believe is the lead attorney for the U.S. Department of Justice in this matter. It also helps produce a timeline of events to this point. It includes:
  • A copy of the original subpoena, originally signed on August 25, 2005
  • Detailed info and definitions about what Google was to submit to the Government.
  • A several page letter, dated October 25, 2005, from Ashok Ramani, Commercial Litigation Counsel, Google, sent to Joel McElvain with his objections to the subpoena. THIS IS A MUST READ!!!
  • Key Quotes and Passages from the Letter

  • "It is against Google's competitive interest to be viewed as reflecting the whole world wide web."
  • Worth noting that Google says that the government tried to use Archive.org/Wayback Machine and found the results unsatisfactory. From the letter, "...given the www.archive.org's stated purpose, one would expect them -- with an appropriate consulting relationship to create the results the DEFENDANT wanted."
  • The Government's request is seen as redundant because it already has URLs from at least one other engine
  • From the letter, "Though the search engines doubtlessly have some differences in the URLS, they store, what distinguishes Google from it's competitors is the sophistication of Google's search engine in locating and ordering relevant results."
  • On the burden to Google. "Google would have to spend a disproportionate amount of engineering time and resources to (i) number (even in rough terms) in real time the URLs contained in its search database and (ii) extract based on that initial numbering the URLs selected by Professor Stark."
  • Google also objects because it could "endanger" its "crown-jewel trade secrets." Specifically, they would have to disclose the approximate number of URLs in its database and "some" details on how it crawls URLs, "such as the number of servers, server distribution, and how often Google crawls the World Wide Web."
  • More objections. "Google objects to the Defendant's view of Google's highly proprietary queries database as a free resource that Defendant can use, some levels removed, to formulate its own defense."
  • "Moreover, Google's acceding to the Request would suggest that it is willing to reveal information about those who use its services. This is not a perception Google is willing to accept. And one can envision scenarios where queries alone could reveal identifying information about a specific Google user, which is another outcome we cannot accept."
  • Next, we find another letter. This time it's from DOJ's McElvain to Google's Ramani. This letter is dated December 23, 2005.
  • The letter discusses how the Government is willing to narrow what's asked for in the subpoena
  • This is summarized in the Alberto Gonzalez, as Attorney General of the United States vs. Google section of this post.
  • McElvain discusses how Google asked for and was granted two extensions to serve their objections to the subpoena until October 10, 2005. He then writes, "In our several discussions prior to the service of those objections we had offered to limit the scope of the requests for production, and you had indicated Google's willingness to consider compliance with the subpoena along with the narrowed terms that we had suggested. Your written objection also reiterated your hope to reach a resolution regarding Google's compliance with the subpoena. However, shortly after the service of your objections, you telephoned me to inform me that Google would decline to comply with the subpoena."
  • More conversations between the Government and Google take place on December 12th and December 21st to discuss the technical aspects of the request. Finally, on December 21st, McElvain was informed that Google would not comply with the subpoena.
  • The final document is a protective order in the ACLU v. U.S. case.

Declaration Of Philip B Stark (PDF File)

This document is a declaration by Philip B. Stark, Ph.D., the statistician working on the project for the Government. Dr. Stark is a Professor of Statistics at the University of California, Berkeley.
  • Stark explains how he has had conversations with the USDOJ, Google and other search providers, "to develop practical approaches to sampling their databases or URLs and search queries."
  • He adds that he has started to analyze the samples produced by search providers other than Google.
  • He writes, "Reviewing user queries to search engines will help us understand the search behavior of current web users, to estimate how often web users encounter HTM materials through searches, and to measure the effectiveness of filters in screening those materials."
Stark goes on to add more about his approach and why including Google results is directly relevant.

Posted by Gary Price at 4:18 PM | Permalink

Bush Administration Demands Search Data; Google Says No; AOL, MSN & Yahoo Said Yes

NOTE: We're continuing to update this news through postscripts below the original story.

Via John Battelle and Good Morning Silicon Valley, the San Jose Mercury News article "Feds want Google search records" covers the Bush administration demanding last year that Google and other search engines turn over aggregate search information to help revive a child protection law. Google has refused to comply with the subpoena. A motion has been filed this week by the US Department Of Justice to force Google to hand over the data.

In particular, the Bush administration wanted one million random web addresses and records of all Google searches for a one week period. The government apparently wants to estimate how much pornography shows up in the searches that children do.

Here's a thought. If you want to measure how much porn is showing up in searches, try searching for it yourself rather than issuing privacy alarm sounding subpoenas. It would certainly be more accurate.

Getting a list of all searches in one week definitely would let the US federal government dig deep into the long tail of porn searches. But then again, the sheer amount of data would be overwhelming. Do you know every variation of a term someone might use, that you're going to dig out of the hundreds of millions of searches you'd get? Oh, and be sure you filter out all the automated queries coming in from rank checking tools, while you're at it. They won't skew the data at all, nope.

Moreover, since the data is divorced from user info, you have no idea what searches are being done by children or not. In the end, you've asked for a lot of data that's not really going to help you estimate anything at all.

Far better would be to do some searches that you think children and teens are actually doing, such as by doing a survey of them. Then just go start searching on Google and the other search engines yourselves. See what actually comes up, especially when the filtering protection each service offers is enabled. That would give you plenty of data, plus it would be useful for everyone to have someone rigorously test the filtering systems that are offered. Serving subpoenas to get the data isn't necessary.

It's important to note that from what I read, the requests do not involve user data at all. Shutting off your cookies or purging your personalized search data wouldn't protect you with this request, because the request wasn't going after personal data. To stress again:

  • According to the report, they wanted a list of one million web addresses. Not who went to the web pages and when, just a list of URLs picked randomly.  
  • They wanted searches for one week. I haven't seen the court documents, but I'm guessing Google could have handed over a list of searches that were entirely unassociated with IP addresses, times, cookies and registration information. Nothing suggests that they wanted to know who did the searches in any way.
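
To show what I mean by a list "entirely unassociated" with user info, here's a minimal sketch of stripping a query log down to bare search strings. I'm guessing at everything here: the field names (query, ip, cookie, time), the sample records and the function name are hypothetical, and I have no idea what any search engine's logs actually look like.

```python
import random

def strip_to_queries(log_records, seed=None):
    """Keep only the query text and shuffle it, dropping all other fields."""
    # Copy out just the search string; IPs, cookies and times are left behind.
    queries = [record["query"] for record in log_records]
    # Shuffle so the order of the list no longer hints at when searches happened.
    random.Random(seed).shuffle(queries)
    return queries

# Hypothetical log records with made-up values.
records = [
    {"query": "lindsay lohan", "ip": "192.0.2.1", "cookie": "abc", "time": "2005-07-01T10:00:00"},
    {"query": "cheap flights", "ip": "192.0.2.2", "cookie": "def", "time": "2005-07-01T10:00:05"},
    {"query": "new job help", "ip": "192.0.2.3", "cookie": "ghi", "time": "2005-07-01T10:00:09"},
]
bare_queries = strip_to_queries(records, seed=0)
```

What comes out is just a list of strings, with nothing tying a search to a person, place or time.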

Having said this, such a move absolutely should breed some paranoia. They didn't ask for data this time, but next time, they might. Of course, it bears reminding that this type of data is easily obtainable from ISPs. So even if the search engines refuse to comply, your own ISP could be giving up your data -- or selling it.

Overall, I say kudos to Google for declaring the request overreaching and refusing to comply. I'm checking with the other major search engines to see if they handed over data.

I've spoken and written a bit about the idea that the search engines need to consider creating a clear "Search Privacy Bill Of Rights," spelling out clearly what protections they'll pledge you'll always have with your data and exactly how it will be used, destroyed and so on. I want to move ahead with more explorations of this -- and perhaps we need a similar one enacted by governments to spell out what they will and will not do with our highly private search data.

Moving Past Google Privacy Fears & Toward An Industry Solution from me last year gives you a lot of background on search privacy issues from over the years. There's an extensive reading list at the bottom.

After I put that out, I also created a thread at our Search Engine Watch Forums, How Should Search Engines Protect Privacy?. Unfortunately, that thread -- while it got lots of discussion -- never generated as many concrete ideas and suggestions about what should go in a Search Privacy Bill Of Rights as I hoped for. So I'm trying again. Got thoughts, comments, suggestions? Please visit our new thread, A Search Privacy Bill Of Rights.

Meanwhile, want to talk about this particular move by the Bush Administration? I have a different thread for that, Bush Administration Demands Search Records.

Postscript 1: I have queries out to AOL, Ask Jeeves, MSN and Yahoo to find out if they provided data. I'll note answers here or in a new post.

Postscript 2: I said above that a more accurate way for the government to assess how often children might encounter porn through search engines would be to conduct their own research. Indeed, they have. Government Report Says MSN Search Adult Filter Most Effective from the SEW Blog back in June covers this report (PDF format) that the US Government Accountability Office did back in June. From what I can see, it measured how often children might encounter porn through image search. To do the assessment, no subpoenas were required. From what I posted in our active Bush Administration Demands Search Records discussion at the Search Engine Watch Forums on today's news:

FYI, back to the idea of child filters on search engines, the US government has tested this, as Government Report Says MSN Search Adult Filter Most Effective covers. Note that to do this, they said:

We performed unfiltered 5-minute searches for six keywords: three keywords known to be associated with pornography and three innocuous terms that juveniles would likely use (a popular teenage singer/actress, a popular cartoon, and a popular movie character).

They managed to do this assessment (the US Government Accountability Office) without issuing a subpoena to anyone. Moreover, it has stats they say they want already produced and ready to go. Pages 48 and 67 have details. The caveat is that this seems to have been a test of image search results (Yahoo was 92 percent non porn, MSN 76 percent, Google 64 percent). But you could do the same thing to measure web search.
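
Tallying up that kind of test is not hard. Here's a small sketch of how you might score hand-reviewed results for one search engine and get a "percent non porn" figure. The labels and the function name are my own invention for illustration, not anything taken from the GAO report.

```python
from collections import Counter

def percent_clean(labels):
    """Return the share of results labeled 'clean', as a percentage."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no results to score")
    return 100.0 * counts["clean"] / total

# Hypothetical labels a human reviewer assigned to four search results.
labels = ["clean", "clean", "porn", "clean"]
share = percent_clean(labels)
```

Run the same searches on each engine, with and without filtering turned on, and you can compare the numbers side by side.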

Postscript 3: Here's the official Google statement from Nicole Wong, associate general counsel with Google. It's what they already told the San Jose Mercury News and are telling other publications:

Google is not a party to this lawsuit and their demand for information overreaches. We had lengthy discussions with them to try to resolve this, but were not able to and we intend to resist their motion vigorously.

Postscript 4: MSN statement is below. It doesn't really answer the question, which was if they complied with a subpoena to hand over data similar to what Google's being sued over. Since it's not a denial, I'm reading this as a tentative yes, that they got a request and passed the data along. I've asked for clarification. The statement:

MSN works closely with law enforcement officials worldwide to assist them when requested. Microsoft fully complies with the Electronic Communications Privacy Act and United States Law as well as Microsoft's terms of use and privacy policies in working with law enforcement. It is our policy to respond to legal requests in a very responsive and timely manner in full compliance with applicable law. MSN takes the safety of its customers very seriously and is committed to providing a safe experience for consumers. As stated in MSN's Terms of Use and Subscription Agreements, Microsoft will comply with applicable law to edit, refuse to post, or to remove any information or materials, in whole or in part, in Microsoft's sole discretion.

Postscript 5: It's important to note this case is not about stopping child porn. It's about trying to get a law passed that would help the government shut down sites that allow children themselves to access porn. To prove a need for the law, the US government wants to show how much porn children might encounter through searches. It's easy to confuse these two completely different things. I did originally, corrected the first draft of my story, but I still had a section stressing the child porn angle. I've removed that from the story above. Here's what I pulled out, for those who care about such edits:

Getting a list of all searches in one week definitely would let the US federal government dig deep into the long tail of porn searches. But then again, the sheer amount of data would be overwhelming. Do you know every variation of a term someone might use, that you're going to dig out of the hundreds of millions of searches you'd get? Oh, and be sure you filter out all the automated queries coming in from rank checking tools, while you're at it. They won't skew the data at all, nope.

If you do, from talking with the head of a child porn fighting group in the UK, my understanding is that many euphemisms and code words are used that won't immediately register as child porn terms.

I can assume the Bush administration probably has investigators smart enough to know the euphemisms and other terms that those after child porn might seek. If you've got that list, just go start searching on Google and the other search engines yourselves. See what actually comes up, especially when the filtering protection each service offers is enabled. That would give you plenty of data, plus it would be useful for everyone to have someone rigorously test the filtering systems that are offered.

There are plenty of other ways to get samplings of non-porn searches that are done, to measure whether porn is showing up in response to these. Serving subpoenas to get the data isn't necessary.

Postscript 6: Ask Jeeves did not provide data, as they were not asked. Statement:

Ask Jeeves has not received requests for search data from the Department of Justice in this matter.

Postscript 7: Yahoo got a request, and I'm guessing complied. Guessing? The statement is below. At first, you'd think they didn't give any information. But that's not what it says. It says they gave no "personal information." That's easy enough, since as I noted above, the government didn't request any personal information. The aggregate data they wanted wasn't personal. Therefore, Yahoo may have handed that over. I'm following up. Statement from spokesperson Mary Osako:

We are rigorous defenders of our users' privacy. We did not provide any personal information in response to the Department of Justice's subpoena. In our opinion, this is not a privacy issue.

Postscript 8: New statement came in about a minute after I posted above, making it clear Yahoo did comply:

We are rigorous defenders of our users' privacy. We did not provide any personal information in response to the Department of Justice's subpoena. In our opinion, this is not a privacy issue. We complied on a limited basis and did not provide any personally identifiable information.

Want to comment or discuss? Visit our SEW Forums thread, Bush Administration Demands Search Records.

Postscript 9: In fairness to Yahoo, which handed over information -- and MSN which likely did the same -- it is important to note that it is not just spin that no privacy issues were involved with this particular data. As I explained in the story, the information is completely divorced from any personally identifiable data.

Let me especially stress this. Want 1 million random web sites? There's no privacy issue in that. The government didn't ask for the "bad" sites or sites that were linked with any particular activity. They just wanted a list of sites, probably so they could do a survey.

It's a stupid request, of course. It's sort of like the government asking a major car dealership to give you a list of random license plate numbers rather than the Department Of Motor Vehicles. Surely the government can generate its own list without forcing a private company to do this.

How about those search requests? They are a list of searches with no user data associated with them. If that's a user privacy issue, then live displays such as listed here are a long-standing one.

Here's a better example. Infospace -- which owns the Dogpile meta search engine -- has sold raw search data to Wordtracker for years. I have never heard of anyone concerned about the privacy implications in that. This is because there aren't any. You can't see who did a search, IP addresses, cookies, etc. It's just a big long list of words.

To hammer home the point, look at this:

That's the live (and warning, unfiltered) search display from Dogpile as I wrote this postscript. See anything linking any individuals to those searches? No, and that's all the US government would have gotten, a raw list of millions of searches.
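To make the point concrete, here is a minimal sketch of what a query list stripped of user data might look like. The record layout (`ip`, `cookie`, `query`) is hypothetical and is not the format any search engine actually uses; it only illustrates that once those fields are dropped, what remains is a list of words.

```python
from typing import Iterable, List

# Hypothetical log records; the field names are assumptions for illustration only.
raw_log = [
    {"ip": "192.0.2.10", "cookie": "abc123", "query": "cheap flights"},
    {"ip": "192.0.2.44", "cookie": "def456", "query": "pink floyd"},
    {"ip": "192.0.2.10", "cookie": "abc123", "query": "weather london"},
]

def queries_only(records: Iterable[dict]) -> List[str]:
    """Keep only the query text, discarding IP addresses, cookies and everything else."""
    return [record["query"] for record in records]

anonymous_list = queries_only(raw_log)
print(anonymous_list)
```

Nothing in the resulting list says which searches came from the same person, which is the kind of raw list described above.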

So why the hoopla? Why not give in? Two reasons:

  • Competitive: Why give out even raw search data that might fall into the hands of competitors? Even then, the lists from each major search engine will be pretty similar, so it's not that much of a worry.  
  • Trust: The data, as I've written, isn't going to help the government at all in what they say it will do. Heck, if they really need that list, they could buy the data from Wordtracker. But by handing it over, the search engine loses the perception of trust with its users. They may not understand that it is not personal. They will understand the government made a wide-ranging request for information and that the search company didn't push back. That type of trust is worth defending in the face of an ill-advised, useless government action.

Postscript 10: MSN says they aren't providing more specifics beyond the statement they gave above. Since that statement does NOT deny that they provided information, I can only assume that they did. Unfair assumption? Well:

  • If they didn't get a request, as with Ask Jeeves, they'd say so (and probably breathe a sigh of relief that they didn't get one).  
  • If they did get a request and refused to comply, I'd expect we'd have seen a court case by now, as we are with Google.

That only leaves that they got a request, and that they replied. If I'm wrong, I'll happily post a correction and new statement, if MSN provides one.

Postscript 11: Seth Finkelstein sent me a link to his Free porn, Google, spam, Internet censorship, and the Supreme Court post, which highlights something Gary and I have written about for ages. You can't trust search engine counts to prove anything. While counts themselves haven't been shown to be an issue in this case, Seth's post shows that they might be something the Department Of Justice is considering. From the Boston Globe article he points at:

Ordinarily, US Solicitor General Theodore B. Olson prepares for an appearance before the Supreme Court by acting out his argument before a pretend court. This time, for a case about the Internet, he added a new twist: searching online for free porn.

At his home last weekend, Olson told the justices yesterday, he typed in those two words in a search engine, and found that "there were 6,230,000 sites available."

The top lawyer who represents the Bush administration before the Supreme Court said the search's results illustrate how pornography on websites "is increasing enormously every day," a central point in his argument for saving an antipornography law that was enacted six years ago but has yet to go into effect.

Hmm. Six million porn sites available? OK, let me do it now on Google. Now I get a figure of 26,900,000. How porn has grown. Ah, but how many pages (the count is for pages, not web sites) do we have in all? Google doesn't report a figure. But if I search for -kfdjkkdjdkfjdkjdk9d09d09d0jdkfdkjkf, a word that doesn't exist, I get a count of 9.7 billion pages. I know that the count is much higher than this (read this to understand more), but let's swing with that figure:

26.9 million / 9.7 billion = 0.28% of the web equals free porn
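For anyone who wants to check the arithmetic, here is a quick sketch using the 26,900,000 count and the 9.7 billion page estimate quoted above. The function name is just an illustration, and both inputs are search engine counts, which is exactly why the result shouldn't be trusted for much.

```python
def share_of_index(matching_pages: int, total_pages: int) -> float:
    """Return matching_pages as a percentage of total_pages."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return 100.0 * matching_pages / total_pages

free_porn_count = 26_900_000      # count reported for the "free porn" search
index_estimate = 9_700_000_000    # count reported for the nonsense-word exclusion search

pct = share_of_index(free_porn_count, index_estimate)
print(f"{pct:.2f}% of indexed pages")  # prints "0.28% of indexed pages"
```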

You want to take that figure to court to show there's a lot of porn? Please. But that figure still doesn't mean anything. A search for online porn at Google only shows you pages that have those two words on them. They could be pages writing about the evils of online porn, how to avoid online porn, why online porn should be banned. Consider this:

That's a heck of a lot of pages with "no free porn" on them!

Fox News & Danger Of Citing Search Counts over at our Search Engine Watch Forums is another example of the fallacy of citing search counts to prove points. For more deconstructing of the Olson proof, be sure to read Seth's send-up.

Postscript 12: Court documents we've obtained so far are now up. Gary's also working very hard to summarize what's in them. See them over at his Court Documents & Summary Of United States Versus Google Over Search Data post.

Postscript 13: AOL appears to have been asked and complied, at least according to the ACLU. I'm still waiting to hear back from AOL. Via Google Blogoscoped, Feds take porn fight to Google from News.com summarizes the court documents. The ACLU challenged the law the US government seeks to revive, the Child Online Protection Act. An ACLU attorney told News.com that Microsoft, Yahoo and AOL all chose to comply.

AOL disputes what the ACLU says -- but from what I read, that dispute is the same as Yahoo's original statement that they didn't give any personal information (Postscript 7 versus Postscript 8, above). Since the government didn't ask for any personal data, of course AOL didn't hand any over. But AOL says it did hand over search queries from a roughly one day period.

Postscript 14: Xeni Jardin over at Boing Boing has confirmation that AOL, MSN and Yahoo all received requests from the Department Of Justice along with Google. Google did not comply, hence the legal action.

Postscript 15: AOL sends a statement now saying they didn't comply, though it still looks like they did in part, as I explained in Postscript 13. To say they handed over no personal data is a non-issue. The Department Of Justice demanded no personal data. It did demand a list of search terms, and AOL appears to have given some amount of these to the DOJ. The statement:

We did not -- and would not -- comply with such a subpoena. We gave the DOJ a generic list of aggregate and anonymous search terms. This did not include search results, nor any personally-identifiable information, and therefore there were absolutely no privacy implications.

Postscript 16: MSN sends a statement today (Friday, Jan. 20) saying they complied with the subpoena:

Microsoft typically does not comment on specific government inquiries. That said, as you may have heard from the DOJ they did contact us in this case. We take the privacy of our customers very seriously. We did comply with their request for data in this case in a way that ensured we also protected the privacy of our customers. We were able to share aggregated query data (not search results) that did not include any personally identifiable information.

Postscript 17: Xeni Jardin over at Boing Boing has AOL saying they did not comply with the subpoena. It's hair splitting time on which way to go on this. As I explained in Postscript 15, the argument that AOL gave no personal data is a non-issue. No personal data was requested. They did give a list of aggregate and anonymous search terms. That's exactly what the subpoena requested. The amount they gave is uncertain. Google was asked to give search queries for all of July 2005, which was later negotiated down to a request for a week's worth of data. AOL probably gave less than originally requested but still likely a big chunk of information. No mention of whether any URLs were handed over. I still see this as complying, but I'll follow up more with AOL about it.

Postscript 18: See also The Day After: Points In The Search Trust Sweepstakes from me. It reflects back on some of the bigger issue points raised from the situation.

Want to comment or discuss? Visit our SEW Forums thread, Bush Administration Demands Search Records.

Posted by Danny Sullivan at 6:03 AM | Permalink

December 15, 2005

Google Firefox Extensions For Anti-Phishing & Popping-Up Google Blog Search Results

New Firefox extensions over at Google highlights that there's a new anti-phishing tool now out for Firefox users plus a new add-on that lets you see what people are saying about pages you visit through Google Blog Search.

Google Safe Browsing for Firefox is the anti-phishing tool, similar to the Phishing Filter Add-In for Internet Explorer with the MSN Search Toolbar. More about that in our past post, TrustWatch & MSN Offer Anti-Phishing Tools To Searchers & Surfers.

Blogger Web Comments is the Firefox extension to show you what people are blogging about relating to pages you are viewing.

What happens is that when you go to a particular page, a little window pops up in the lower right-hand corner of your screen. It will show you a comment from someone on a blog that's linking to the page you are viewing. You can also click to see more "comments," which is a handy way to check what people are blogging about the page you are reading.

These are NOT just people using Blogger that are commenting via blogs. Instead, what Google's doing is simply generating a backlink lookup on Google Blog Search and showing you the summary of the first thing listed on that page.

For example, here are the backlinks to the Search Engine Watch home page. The first thing on that page at the moment is:

Google Adds Music Search Feature 33 minutes ago by Bruce Houghton GoogleGoogle this morning added a music search feature to it's popular search engine. Type in for example "Pink Floyd" and the top result featured includes a photo of the artist, a bit of information, and a link to "More music results ... hypebot - http://hypebot.typepad.com/hypebot/

In my pop-up box, that got turned into:

hypebot GoogleGoogle this morning added a music search feature to it's popular search engine. Type in for...

You can also use the "Add comment" link in the pop-up box to add your own comments about the page. What this really does is send you to your own Blogger-based blog (or suggests you open one). But to "comment," you simply need to have a page that shows up in Google Blog Search that links to the page someone is viewing. Remember, however, that over time your "comment" will drop down the list.

More about Google Blog Search is covered in our past articles, Google Launches Industrial Strength Blog Search and Thoughts On & Poking At Google Blog Search.

Posted by Danny Sullivan at 10:28 AM | Permalink

December 14, 2005

Windows Live Location Finder & Privacy

Microsoft has responded to user concerns about the type of information that's transmitted and used when you use the new Windows Live Local "locate me" feature. From a blog post by Chandu Thota, SDE Lead of the Microsoft Virtual Earth team:

No personal information such as your name or contact information is sent to Microsoft by Location Finder service. Also, Location Finder service was designed with concern for your personal information; secure methods such as SSL are used when transferring location information between your machine and the Microsoft location service.

The post also notes that no information is shared, and offers a bit more technical detail about how the feature works. Reassuring information if you're concerned when you use this cool new feature from Microsoft.

Posted by Chris Sherman at 7:02 PM | Permalink

December 1, 2005

Complaints Over Wikipedia Accountability With Bios

Daniel Brandt's been upset over the accuracy and presence of a page about him at Wikipedia, and now John Seigenthaler, the former assistant to US Attorney General Robert Kennedy, is upset as well over his Wikipedia biography, venting his frustration in a USA Today article.

A false Wikipedia 'biography' has Seigenthaler sounding out his complaint, the 78-year-old declaring that only one sentence in his bio was true. He managed to get Wikipedia to remove the material he objected to, though with Wikipedia's community editing system, I don't see anything that prevents that from coming back.

It's also somewhat confusing that if only one sentence was accurate -- and the objectionable material was removed -- why is there still a fairly lengthy bio on him at Wikipedia?

Overall, the concerns are still well taken. There's no guarantee of accuracy at Wikipedia, though that's true of any publication. The difference is that Seigenthaler illustrates how difficult it was for him to know who should be accountable. That's not the case with more traditional reference resources.

Moving on to Brandt, he also raised the difficulty, in a post at Google Blogoscoped, of knowing who was responsible for the creation of and changes to his own bio. Lots of comments followed his article. This Google Blogoscoped post from the end of October outlines Brandt's original objection.

Brandt -- most known for creating the Google Watch protest site -- outlines his concerns about Wikipedia more directly at his new Wikipedia Watch site.

But the above situations illustrate the real concerns people might have over the accuracy of what's said about them and the inability to get accountability when needed. Brandt, unlike Seigenthaler, questions whether there is a privacy violation in having a bio at all or with some of the material in it:

The privacy issues interest me even more than the libel issue. Unfortunately, the laws on privacy are less clear, and discussions on privacy will not be as focused. In Florida, where Wikipedia is located, there is an invasion of privacy statute that might apply in this case, even assuming that everything in the article is true. At issue would be the public disclosure of truthful private information that a reasonable person would find objectionable. Would a reasonable person find Wikipedia's mention of facts about my 1960s activism objectionable? Not at the moment, hopefully, and yet it wouldn't take much for this situation to change. Another act of terrorism on U.S. soil, followed by a stronger version of the U.S. Patriot Act, and "reasonable" people might feel that I should, once again, be watched by the FBI, CIA, and local police the way I was in the 1960s. Does Wikipedia consider issues such as this? Of course not -- information wants to be free, and nothing must stand in its way.

Brandt in particular is probably on weaker ground here. He's been widely cited on Google issues in many popular press articles. He is a public figure.

Brandt's also had no problem declaring that others have no rights to privacy based on whatever criteria he determines, as I covered in my article about his nomination of Google for a Big Brother award:

I found it ironic that Brandt's site, which champions privacy, named the actual engineer who formerly worked for the NSA. Did Brandt see any privacy issues in doing that? No.

"Do you know of others at Google with security clearances? If so, send me their names and I'll be sure to mention them as well," Brandt said, noting that the engineer's resume had been on the web for years. "Agents of powerful, secret organizations have no right to privacy, in my opinion. I've been in favor of naming CIA officers for 30 years now. The NSA is no different," he said.

So a privacy violation? Not in my book. But I have a huge, huge degree of sympathy over the lack of accountability and control concerns that he and Seigenthaler complain about. That's likely to be a problem that will grow for Wikipedia, unless they come up with some controls.

Gary's still planning either a podcast or a written interview with Wikipedia founder Jimmy Wales, as he's written before, so expect some comment from him on the situation to come in the blog.

Posted by Danny Sullivan at 9:21 AM | Permalink

November 29, 2005

CustomizeGoogle Offers New Option to Block Google Analytics Cookies

CustomizeGoogle is a popular Firefox extension that offers numerous (understatement) options to change the look of Google pages. One option is being able to easily remove ads from Google results pages.

Today, the CustomizeGoogle Blog points out yet another new optional feature that allows users to block Google Analytics cookies on ANY web site.

This page has a detailed explanation, examples, issues, and other methods of how not to be tracked.

Over the weekend I posted about another new option, discussed on the also-new CustomizeGoogle blog, that allows pages from Google Book Search to be printed.

Posted by Gary Price at 1:57 PM | Permalink

November 28, 2005

Lycos Europe Ordered By Dutch Court To Reveal Member's Identity

Dutch Court Orders Lycos To Reveal Client's Identity from Dow Jones has the supreme court in The Netherlands ordering Lycos Europe to reveal the identity of a Lycos user who is said to have posted slanderous allegations against an internet stamp dealer, apparently on pages that Lycos Europe provides to its members.

You may recall that many were up in arms over Yahoo handing over information about one of its members to Chinese authorities that led to the dissident being imprisoned. Some wanted Yahoo not to do business in China or to relocate servers outside of Chinese jurisdiction.

Now it will be interesting to see if Lycos Europe comes under similar pressure not to operate in The Netherlands or Europe to help protect the privacy of its members -- a real challenge considering it is Lycos Europe that we're talking about.

I suspect we won't see a huge outcry, given that we're not talking about a dissident here, plus China generally being a hotter button for people than The Netherlands in terms of privacy issues. But P2Pnet is at least one site that's upset with the ruling, worried that this move will help entertainment companies that might go after individuals over music downloads.

Posted by Danny Sullivan at 1:38 PM | Permalink

November 11, 2005

Murder, Love Gone Awry, Google Searches & Bogus Clicks

Google Blogoscoped and Andy Beal both point to a story about an accused murderer searching for "neck," "snap," "break" and "hold" on Google, evidence prosecutors say that the man murdered his wife. Meanwhile, Barry Schwartz at Search Engine Roundtable points to a WebmasterWorld thread where someone reports a jaded girlfriend clicking on his AdSense links and getting his account suspended.

Posted by Danny Sullivan at 9:44 AM | Permalink

November 8, 2005

Why We Use Various Search Engines

InternetRetailer has done some nice charts off of a Majestic Research/comScore report looking at why we use particular search engines (for Google, it's the results; for others, it's because you're doing other things). The stats also look at awareness of paid links and tolerance of demographic and behavior targeting. Here's a summary:

For the question of why people use particular search engines, top reasons for each major service were:

  • Google: 68 percent say it's because it has the best results.  
  • AOL: 65 percent say it's because they are doing other things at AOL, like checking mail and other non-search activities.  
  • MSN: 63 percent say it's just like AOL, because they are doing other things there.  
  • Yahoo: 52 percent say it's just like AOL, because they are doing other things there. Yahoo got the second highest marks for having the best results after Google, with 33 percent choosing that reason.

The report found that AOL and Google users were the most likely to notice sponsored links (82 and 81 percent, respectively) while MSN users were the least likely to notice them (69 percent).

As for privacy, 58 percent said they weren't worried about being demographically or behaviorally targeted as long as it was disclosed and they could opt out. And 27 percent said they'd keep using a search engine even if they couldn't opt out.

Haven't tracked down the actual report yet; will postscript, if I can find it.

Posted by Danny Sullivan at 1:11 PM | Permalink

New Ad Targeting Options May Fuel Privacy Worries

Both MediaPost and AdAge have stories on a panel session at Ad:Tech yesterday about how search engines are starting to target ads more on personal behavior, touching a bit on the privacy backlash that might bring.

John Battelle was on the panel and talked about how privacy policies on how data is used seem unclear. Gary Stein, who heard the panel, thought John's idea of a transparent system where people can view the information stored on them and edit it might be too complex.

Then again, maybe not. If you could view the profile used to target you at Yahoo, Google, etc., and delete or expunge it, that might be reassuring. However, that might not wipe out the underlying data (IP address, search queries and other material from logs) and so might not be reassuring enough to privacy advocates.

I've mentioned in various places that I'd like to see some type of "Search Privacy Bill Of Rights," a topic I hope to get back to in the near future. At the very least, I'd like to see common concerns spelled out very clearly along with a common language that all search engines would use to explain where they stand on them.

How Should Search Engines Protect Privacy? in our Search Engine Watch Forums covers this more, and I'd love to hear your comments there, as well.

Posted by Danny Sullivan at 10:16 AM | Permalink

October 15, 2005

Google Updates Privacy Policy

If you're looking for some weekend reading, Google has just updated their privacy policy.

I used HTML Match to create a comparison of the two documents (dated October 14, 2005 and July 1, 2004) and posted a screen cap of the differences here.

Aaron Swartz's useful web-based HTML Diff program also offers a comparison of the old and the new and is available here.

Of course, if you would like to review the actual documents:

A new privacy "highlights" document is also available along with a Google Privacy Policy FAQ. Policies for other Google services are linked in the left column of the FAQ. Most of these documents are dated October 14th.

Posted by Gary Price at 2:11 AM | Permalink

October 3, 2005

Keeping Yourself Out of Web And Other Databases

Ann Harrison's Wired News article, 'UnGoogleables' Hide From Search, offers a profile of Geri Agalia (not her real name), a person who values her privacy and is trying to keep info about herself out of the Google database. We've seen stories like this before.

Allow me to share a few comments:

From the article: "More people are finding they're leaving an accidental trail of digital bread crumbs on the web -- where Google's merciless crawlers vacuum them up and regurgitate them for anyone who cares to type in a name."

Look, I can be as tough on Google as anybody, but always pinning these issues on Google is, well, totally unfair. Many other crawlers are out there and accessing much of the same data as Google is. The person who wants/needs/desires to get to this info (often much more personal than what club you might belong to) will know to look elsewhere. Plus, trying to keep material only out of Google (I see lots of these stories) does not keep it out of other databases. Kudos to Ann Harrison for mentioning this at the end of her article, but it should be an issue of "web search engines," not just Google.

Philadelphia real estate investor Victor Lindt says he's surprised his name doesn't show up on Google, especially since he once owned a well-known pastry shop that was covered by the local and national press.

If someone had the desire/want/need to learn more about Mr. Lindt, they could look in many other places besides Google, such as invisible or deep web databases. Lots of public record databases remain on the deep web. For example: if you've been involved in a U.S. Federal Court case, court dockets and filings might be available via the PACER database.

No doubt about it, web search makes things easier to find but the person who wants personal info is likely to have a database toolkit with hundreds if not thousands of free and fee-based tools to LEGALLY find what they're looking for. Now, you're likely saying the typical "troublemaker" doesn't have the time to check thousands of public record files. However, fee-based services like KnowX and Intelius aggregate lots of this material.

Finally, I regularly get email or read list postings about people outraged with Google's (not any other) online phone number lookup database. They want to know how to get their name and number out of the system. Here's the page. However, as Google correctly points out, simply getting out of Google's phone database doesn't mean that your listing will be removed from others.

Trying to remain completely and totally private in the United States might be possible. Very difficult, but I guess possible. Laws in other countries make this a completely different issue. Still, saying that some of the problem is Google's fault is ridiculous, though it may be understandable given Google's prominence and a general lack of understanding of how large open-web engines work.

Note to search engines (not only Google): As a public service and to aid some privacy concerns, why not spend more resources (beyond a page buried on your site) teaching people, especially webmasters, how to keep material out of your databases? Keeping content off the web is becoming a frequent question we get asked at training sessions. I guess this also points out the fact that many people don't understand how web crawlers work. Btw, just placing this removal info on your blog will do little to no good. Why? These people are likely not to read your blog.
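For webmasters, one widely used way to ask crawlers to stay away from material is a robots.txt file at the root of the site. Here is a rough sketch, assuming a hypothetical site at example.com with a /private/ directory, that uses Python's standard urllib.robotparser module to show how a well-behaved crawler would read such a file. The rules and URLs are made up for illustration, and robots.txt is only a request to crawlers, not access control.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: asks every crawler to stay out of /private/.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# example.com and both paths are placeholders, not real pages.
private_ok = parser.can_fetch("Googlebot", "http://example.com/private/resume.html")
public_ok = parser.can_fetch("Googlebot", "http://example.com/index.html")

print(private_ok, public_ok)  # prints "False True"
```

Because the rule is addressed to `User-agent: *`, it applies to any crawler that honors the file, not just Google's.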

Posted by Gary Price at 3:21 PM | Permalink

September 27, 2005

TrustWatch & MSN Offer Anti-Phishing Tools To Searchers & Surfers

TrustWatch is a new Ask Jeeves-powered search engine designed to give you a green, yellow or red light warning on whether to trust pages listed in its results. It follows on the release of an anti-phishing add-on for users of the MSN Search toolbar.

At TrustWatch, the warnings are to help you know if you are reaching a fake site or one that's "phishing" for you to reveal personal information.

For example, imagine you were trying to reach the Bank Of America site. It's possible that someone might create a site that looks like the real BofA site and ranks well for a search on the company's name. A good search engine shouldn't let this happen, but it still can occur. Even more likely, it can happen if you search using a slight misspelling.

TrustWatch places colored rating icons next to each listing. Green means the listing has been verified as real and trustworthy by a third party. Yellow means there's been no verification, but neither has the site been reported on a blacklist. Red means someone has reported a site as disreputable and that you shouldn't trust it.
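The three-color scheme boils down to a small decision rule. The sketch below is only my reading of the description above, not TrustWatch's actual logic; the function name and the choice to let a blacklist report outrank a verification are assumptions.

```python
def trust_rating(verified: bool, blacklisted: bool) -> str:
    """Map a site's status to a traffic-light color, following the description above."""
    if blacklisted:
        # Assumption: a report on a blacklist outranks third-party verification.
        return "red"
    if verified:
        return "green"
    # Not verified, but not reported either.
    return "yellow"

for verified, blacklisted in [(True, False), (False, False), (False, True)]:
    print(verified, blacklisted, trust_rating(verified, blacklisted))
```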

Run a web site and want to be trusted? GeoTrust, the company behind TrustWatch, will conveniently sell you a site identity seal for $49 per year. You can also get a trust rating from one of the other companies that it lists, including TRUSTe. I wish the page TrustWatch lists with these organizations made it exceptionally clear exactly which products each of these companies are selling are acceptable, especially what the lowest cost options are.

I can understand that site owners probably should pay to be rated. Someone's got to do the reviewing. But it shouldn't be super expensive. Plus, non-profits and governmental groups should get a break. Of course, I see the US White House site is considered trusted, and I'm betting they didn't pay for a review.

Want to know if something is trustworthy as you surf the web? There's a TrustWatch toolbar you can install that lights up to let you know if a site is trusted when you visit it.

That brings me over to news from earlier this month. Microsoft has a Phishing Filter Add-In for its MSN Search Toolbar. Like TrustWatch's, it's only for Internet Explorer, unfortunately. It will block sites that are on known phishing lists and warn you of sites that it suspects may be phishing based on scanning for common characteristics.

Having these features in toolbars is great, of course. In fact, I'm guessing we'll see Ask Jeeves down the line add TrustWatch-powered warnings to its toolbar since it's partnering to provide TrustWatch with search results. But it would be nice to see anti-phishing warnings in the results of the major search engines, as well.

I mean, the Ask Jeeves blog today is what alerted me and others to the new TrustWatch service. Rather than have Ask Jeeves point me elsewhere, I obviously want them to put these features into their own search results. Same, too, with MSN. Give phishing warnings in the search results, as well as in the toolbar. And let's see Google and Yahoo do the same.

Want to discuss? Visit our forum thread, Ask.com Powers TrustWatch - GeoTrust's Secure Engine.

Posted by Danny Sullivan at 10:26 AM | Permalink

September 9, 2005

Yahoo Says It Must Follow Chinese Laws On Giving Info

Yahoo says it must abide China law from Reuters has Yahoo neither confirming nor denying it provided email details that helped Chinese authorities jail a journalist, as we've covered earlier. However, the company did say that it has to operate within the laws of the countries where it operates. And spotted via Dan Gillmor, Rebecca MacKinnon notes that if Yahoo hosted its email servers outside China, it might not have to comply with Chinese laws: Yahoo! e-mail in China: must be evil to be legal.

Postscript: Yahoo Founder Explains China E-Mail Move from the AP has Yahoo cofounder Jerry Yang saying at a forum in China that the demand was a "legal order" that Yahoo had to comply with.

Posted by Danny Sullivan at 12:01 PM | Permalink

September 7, 2005

Ask Jeeves CEO Interviewed, Says Online Scams Could Slow Adoption of Personalized Search Tools

The vnunet.com article, Scams hold back personalised search, discusses how phishing and identity theft could cause problems in the adoption of personalized search tools. The article includes several comments from Ask Jeeves CEO Steve Berkowitz.

"The benefits to the user will be incredible, but whether the user will allow those benefits to happen over time is something that we have to wait and see," he [Berkowitz] said..."It's going to take a lot of time before users are going to trust putting a lot of information into the computer, when there are scams that go on and you keep reading about credit card and identity theft," said Berkowitz. "The user will benefit from the ability of the technologies to understand more about them to make search more relevant and to make information retrieval more relevant. But it will take time and probably more than people anticipate."

Berkowitz's comments come from a two-page Q&A interview that was also published today. It's titled: Ask Jeeves warms up search battle. In the interview, Berkowitz says that Ask Jeeves doesn't have a brand awareness problem.

I would say that Ask Jeeves doesn't have name recognition problem. We have 80 plus per cent aided brand awareness and probably 30 per cent to 40 per cent unaided brand awareness at this point in time.

Our growth potential is greater than anybody else's, because people know us. They just don't know why we'd like them to get to us. Which is: come to Ask and have any of your questions answered, but also it is a great place to search.

While AJ might have "80 plus per cent" brand awareness, I wonder just what the people who are aware of the brand think of it. In other words, many people do know and recognize the name Ask Jeeves, but it's been my experience, as recently as last week, that Ask Jeeves means "poor search engine" to many people. If I got a dollar every time someone told me that, I might be able to retire. These people need to see that Ask Jeeves 2005 is not the same search engine that it was in 1999 or 2000. Getting people to try and regularly use something new or different is a challenge all by itself, but AJ's challenge is even more complex since they also need to show those who are already aware of the AJ brand that they've improved (understatement) their search service.

Posted by Gary Price at 12:34 PM | Permalink

August 24, 2005

Identity Theft & Issues For Search Engines

Search Engines Find Stolen Identities from Information Week dares to go into the issue of how search engines can reveal personal details about people and perhaps aid in identity theft, leading off with how a Google search can bring up social security numbers, credit card details, bank accounts and so on. Fair play, it also makes it amply clear that other search engines can be used this way as well.

It's familiar territory that other articles have covered before. This article does go into some new areas, wondering if search engines are liable for showing the info (doesn't seem so) and if perhaps they should take a more active role, given that they do censor for spamming and legal reasons.

In the end, it remains an issue that if it's on the web, the search engines are likely to find it. It's difficult to know when they should step in and pull listings, especially when dealing with public records.

On the other hand, the issue continues to be a concern over all these years and even at the highest level, as Google CEO Eric Schmidt found out when News.com used a few examples of what you can learn about him on Google to illustrate a similar story on this topic earlier this month. As a result, News.com got slapped with a "we won't talk to you for a year" ban by Google.

The ban, of course, doesn't solve the problem that Schmidt's personal info is still available on the web and findable through Google. See Google Blacklists News.com over in our forums for further extended comments from me on the issue -- along with a quote from back in 2003 when Google cofounder Sergey Brin was asked about these types of issues and was himself uncomfortable that some might use Google to learn his home address.

Posted by Danny Sullivan at 12:28 PM | Permalink

July 14, 2005

Moving Past Google Privacy Fears & Toward An Industry Solution

Google's balancing act from News.com revisits the well-trod path of Google as potential privacy threat. Personally, I would love to get beyond these "what Google might do" stories and move more toward what the search engine industry itself ought to be doing in terms of protecting privacy, especially as everyone's offering personalized search or search history features. Your comments will help, as I'll explain below.

First, the story. What's new? More people use Google for a wider range of things, and Google Accounts make it easier to track them. So privacy advocates are alarmed about Google once again.

What Google does and knows pales beside the much more detailed information Yahoo knows about registered users, who also do a wide range of things with Yahoo. The story itself even explains that Yahoo gathers much more personal identifiable information than Google, yet it's Google that gets the headline.

So as usual, it's a "Google could" thing rather than what we really need, a look at stuff actually happening.

Yesterday, Yahoo's chief data officer spoke about how Yahoo's Impulse system will show you graphical ads based on what you searched for over the past 48 hours. No red flags from anyone about that?

I wrote about a different version of this program in 2003, in my Search Privacy At Google & Other Search Engines article. I looked at how Yahoo sort of brought it out to target you via email, then dropped plans for undisclosed reasons. Suspicion as to why it was dropped? Privacy worries. You're going to profile my searches?

Now here we are in 2005, Yahoo Impulse is back with profiling of a different nature, and it's nary a word I've heard from anyone on the privacy front.

Hmm, well Yahoo did just say this yesterday. No, they've been saying this for some time. For example, CEO Terry Semel made a big deal about tapping into personal data to make more money off of search at the end of May:

Semel also said that Yahoo has embarked on a large project to make better use of the huge stores of data it has about Web users to help advertisers better target their messages. The result will be good for both consumers and advertisers, he argued, because consumers won't be "bothered" as often with offers they're not interested in and will get more offers that do interest them.

When I saw that, I wrote as part of my review of the article:

If Google said something like that, pitchforks would be out, privacy advocates enraged and the question of whether Google was really secretly evil would hit the blogosphere. Yahoo said this last Thursday, and so far, nada.

That was the end of May. Here we are in the middle of July, and I've still heard squat in terms of concerns. If any of the privacy analysts in the News.com story had them, those certainly didn't make it into the story. I suspect they didn't really raise them. It's so much easier to just fixate on Google. Google, Google, Google. The Marcia Brady Of Search gets all the attention, as usual, but the others of the Search Engine Bunch won't mind that, on this particular issue.

Hey, what about Amazon? It's been offering personalized search for over a year. Amazon's got a lot of users. Sure, not really using A9 -- but still potentially touched by this. But no one's really freaked out over Amazon offering A9 on the privacy side. This is despite Amazon actually having had to settle in a privacy case back in 2001. Amazon did, and it's silence. Google could -- but hasn't -- and it's headline time.

Orkut a threat? Gmail a threat? Yeah, these services could add to Google's knowledge about people. They're also in closed, invite-only betas. They represent potential. In reality, right now there are social networking and email systems that Google's rivals -- themselves much used -- already offer. But Google gets to be the "lightning rod."

Oh, I could go on and on. I have. Here's some past reading:

Since I'm not seeing the article I want -- the moving past what might happen and more toward what should happen -- I'm going to revisit the issue myself. You can help. Please give me your comments on what you think search engines should do on the privacy front. What are you afraid of? How long should they keep data? And what laws should protect you?

There are real concerns. I'm not dismissing these at all. There's potential for both corporate and governmental abuse of search profiles. But what we need is less hype, less putting one player in a corner and more actual suggestions of things that everyone can implement. Please contribute in our forum thread, How Should Search Engines Protect Privacy?

Postscript: Earlier I had written that the new version of Yahoo Impulse mail targeted emails, as was reported in the DM News article I referenced. Yahoo said that article was incorrect and that graphical ads are delivered.

Posted by Danny Sullivan at 8:37 AM | Permalink

July 1, 2005

Amazon Sued Over Copyright Infringement In Google-Powered A9 Image Search

Amazon Sued for Copyright Infringement from the Associated Press covers how Amazon is being sued by Perfect 10, an adult magazine and web site, over Perfect 10 images apparently appearing in Amazon-owned A9 image search results. Those results are powered by Google, which we've previously blogged about being sued by Perfect 10 back in November.

Posted by Gary Price at 9:22 AM | Permalink

June 21, 2005

AlmondNet Hires Privacy Consultants

Behavioral Targeting Firm Hires Privacy Consultant from MediaPost covers how AlmondNet -- which offers a unique service targeting ads based on your search profile -- has hired a consultant firm to review and perhaps change privacy strategy. Though the company says it has had no complaints so far, it wants to ensure things are "privacy-friendly." More about AlmondNet in New Search Behavioral Network Launched and Lycos To Distribute Ads Through AlmondNet Post-Search Network from our blog.

Posted by Danny Sullivan at 10:36 AM | Permalink

June 6, 2005

Balancing Search Privacy & Data Storage

Google's long memory stirs privacy concerns out today from Reuters should have been called "Search's long memory stirs privacy concerns," but as usual, everything has to be a Google issue, even the bad stuff that isn't Google specific. From the article:

Like many other online businesses, Google tracks how its search engine and other services are used, and who uses them. Unlike many other businesses, Google holds onto that information for years.

Well, maybe not unlike other businesses. We're told that ISPs generally don't hold data for longer than a month. But how about companies that mine that ISP data for various purposes such as Claria, Hitwise, comScore? They definitely keep some of it much longer.

Moreover, it's fair to say most people have no idea that their ISPs might be giving access to aggregate surfing activities that potentially show everything an individual does on the web, rather than just what they searched for.

But nah, let's focus on Google as the main element of concern. After all, they aren't getting rid of all that data! Then again, Yahoo "declined to say" how long it keeps data either. But that's not in the headline. It isn't, "Yahoo's long memory stirs privacy concerns." After all, it's just Yahoo. Who cares about Yahoo? It's not like they have anywhere from 20 to 30 percent of the search market in the US, depending on which metrics you use. That apparently isn't significant enough.

The headline isn't, "Amazon's long memory stirs privacy concerns." But why not? Amazon's A9 lets you keep a search history and even gave you discounts on Amazon purchases for your A9 searches. They could do that because they know who you are at A9, how often you've been searching plus who you are at Amazon. But let's beat up on Google in the headline, instead. After all, the threat of what Google might do is much more fun than the reality that Amazon has already paid out over a privacy dispute before.

I talked with the reporter for this story (no relation to me, despite the same last name) and raised these types of issues. Kudos that he did ask around on the ISP data and what some others were doing. But in the end, it still ended up a Google article.

I'm quoted at the end saying you don't want Google to throw away all the queries that have been done. No, you definitely do not. That's history, folks. I want to know how people searched over time, and we'll regret not having this data if we simply discard it. That doesn't mean you have to keep everything, but there are definitely good reasons to keep some types of information for more than a few months, unlike the views of someone else in the story.

More on why and related issues are covered in my Google And The Big Brother Nomination article from 2003. See also Better Search Privacy Needs Addressing Overall for more reading on the subject. Yes, we need better privacy disclosure and reassurance for consumers. We may even need better laws protecting us from when the inevitable changes happen within corporations. We don't need a dogpile on one company over the issues.

Posted by Danny Sullivan at 5:59 PM | Permalink

May 31, 2005

Yahoo CEO: 2005 Is For Making More Search Revenue & Better Ad Targeting Through Personal Info

Yahoo CEO: Better Search Monetization Is Top Co Priority from Dow Jones has Yahoo's Terry Semel saying that increasing the revenue earned per search is the company's "largest and most important project" in the search space. Now that the core search effort of 2004 is apparently considered overcome, cranking up the revenue is what the tech teams are focused on this year. Not exactly reassuring words for searchers, who might hope that the most important project remains a focus on relevancy. But, the comments did come at a conference aimed at investing types.

Meanwhile, Yahoo is going to tap into the tons of data it has about users to better target ads. If Google said something like that, pitchforks would be out, privacy advocates enraged and the question of whether Google was really secretly evil would hit the blogosphere. Yahoo said this last Thursday, and so far, nada.

Everyone, of course, is going to do more targeting of ads (and search results) based on personal data. It's even a good thing, in many ways. But it would be nice to see the industry perhaps look more at how to prepare consumers for this coming. A bit more on this in this past post, Better Search Privacy Needs Addressing Overall.

Posted by Danny Sullivan at 8:10 AM | Permalink

May 6, 2005

Google Web Accelerator Raises Worries

The new Google Web Accelerator released earlier this week is raising concerns about data privacy and webmaster issues.

Much Controversy Over Google's Accelerator from Nathan over at Inside Google looks at how the Something Awful forums found that the tool seems to have cached forum pages personalized for a particular user. In other words, those using the software came into the site as if they were logged in as someone else. If true, that's pretty worrisome.

Inside Google also raises the specter of how the software is helping Google keep a record of what everyone does, which it might datamine in various ways. Sure, that's a valid fear. But Google hardly needs Web Accelerator to do it. It already has millions of people using its Google Toolbar. For years, the Google Toolbar has given Google records of what people are looking at all over the web. So monitoring what people do on the web isn't anything new, for Google.

The article touches on issues of how the accelerator might injure site stats, providing some links to disabling it if you are a webmaster. Nathan also suggests that people won't do this, because Google will probably use accelerator data to help rank sites. Ban accelerator, and you'll ban what Google knows about your site -- and potentially then lose rankings.

I wouldn't worry about that at all. Sites have already banned Google from caching their pages and still done well despite this potential big red flag. Don't want accelerator caching your site? Go ahead and ban it.

Nathan's had further posts touching on other issues:

Google Blogoscoped highlights another issue in Google Accelerator Deleting by Prefetching, while Threadwatch points to Fantomaster's How To Block Google's Web Accelerator page.

Want to discuss? Visit our forum threads:

Postscript: News.com's FAQ: Hard facts about Google's Web Accelerator does a Q&A on some of the issues involved with the software.

 

Posted by Danny Sullivan at 12:27 PM | Permalink

April 21, 2005

Comparing Search History Features

With Google having released its new Google My Search History feature yesterday, I wanted to spin back around and look at where we stand in terms of search history offerings across a number of major search engines. I've done so in chart format below.

Before diving into the chart, let me stress that this isn't a "have the most features and win" contest. Some features you might not ever use. Which search history features work best, like the search engines themselves, may come down to your own personal preference.

Even among the editors here at Search Engine Watch, we all love different things. Personally, I find the A9 and Google tools the most compelling, because they automatically save what I'm looking for. I think it's cool that Ask, Yahoo, and A9 have categorization and annotation features in various manners, but those aren't something I expect to use myself. Others may -- and that's why it's great that they are offered.

Chris and Gary are very much into tools that save the full-text of documents and let you search against them. I'm leaving it to them to do a separate recap on how tools stand on that front. Gary's also playing with Filangy, which is a closed beta, and reviewed it yesterday here. It's not on the chart below because, being a closed beta, it's not something everyone can use yet.

Personally, I've loved the Google Desktop as a way to keep track of everything I've seen exactly as I saw it when visiting pages across the web. It's largely solved my own search history desires at Google, as I've written before. But the additional features from Google are definitely welcomed.

On to the chart! A guide to the categories follows it below.

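| Feature | A9 | Ask | Eurekster | Findory | Furl | Google | Yahoo |
|---|---|---|---|---|---|---|---|
| Auto Save | Yes | No | Yes | Yes | No | Yes | No |
| Pause | No | n/a | No | No | n/a | Yes | n/a |
| History Search | Yes | Yes | Yes | No | No | Yes | Yes |
| Date Sort | Yes | Yes | No | No | Yes | Yes | Yes |
| Term Sort | No | Yes | No | No | No | No | Yes |
| Site Sort | Yes | No | No | No | No | No | Yes |
| Notes | No | Yes | No | No | Yes | No | Yes |
| Tags | No | Yes | No | No | Yes | No | No |
| Folders | No | Yes | No | No | Yes | No | Yes |
| Launch | 4/04 | 9/04 | 1/04 | 11/04 | 2003 | 4/05 | 10/04 |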

Auto Save: Means that your searches are automatically saved. My Yahoo Search does have a Visited Results feature that's supposed to be able to do this, but I found it's not working for me in either Internet Explorer or Firefox. So I've marked it as No, for the moment.

Pause: If searches are automatically saved, this means that you can temporarily pause saving. If pause isn't offered, you have to sign-out of the system to prevent saving.

History Search: Means that you can do a search just within the things you've searched for previously. For example, if you knew you looked for something related to "cars" but didn't know exactly how you searched, you could search for "cars" and find all the queries containing that word. In some cases, a history search may also search against the content of the web page or notes and annotations you've made.

Date Sort: Means that you can sort your history by date in some manner. The degree and flexibility of which may vary.

Term Sort: Means that you can sort your search history by term (the title of the search), in alphabetical or reverse-alphabetical order.

Site Sort: Means that you can view your search history listed in order of the sites you clicked on.

Notes: Means that you can annotate things you've found in your search history with comments. At A9, these notes aren't stored in your search history, so I've marked this as No. However, annotation of sites you've visited can be done using the diary feature, if you use the A9 toolbar. More info here.

Tags: Means that you can annotate items in your search history into categories by tagging them with keywords.

Folders: Means that you can organize your search history into folders, such as if you want to group certain queries into a particular subject heading.

Launch: When the search history feature was launched.

Other Notes: All the services give you the ability to delete what you've searched for in some way, so I've not made that a column on the chart. In addition, using toolbars or desktop software, you can extend the functionality of search history features, in some cases.

Looking for more background? Here are some past reviews of each tool from Search Engine Watch and some related stories:

Search history tools also raise privacy issues, so here are some past stories to consider reading:

Posted by Danny Sullivan at 7:06 AM | Permalink

April 5, 2005

Better Search Privacy Needs Addressing Overall

What Search Sites Know About You from Wired looks at the issue of search privacy. As usual, Google gets to be the whipping boy of concern when Yahoo and MSN, among others, should also be on stage for any potential criticism.

Newsflash, Chris Hoofnagle of the Electronic Privacy Information Center. You're apparently particularly wary of Google, because many search there, use email on Google and make use of its social networking service Orkut. Then be just as particularly wary of Yahoo (and apologies if you were as well, and that didn't make it into this Wired article). It's also incredibly popular, with people searching, reading email, social networking -- and many doing this all while also signed in with personally identifiable data.

Indeed, someone's search history is one of the factors Yahoo already uses when it decides to behaviorally target non-textual ads on the Yahoo site. Google's yet to actually demonstrate any use of a search profile. Yahoo is and has been already doing it, but it's Google that gets painted as having the "finger" on all our data.

Privacy concerns are valid -- let's just point them at the industry as a whole, rather than pulling out one bogeyman to beat up on. People need to have better awareness that what they search for on major search engines could be used to profile them. Search policies need to better address what exactly may be done with search queries in particular. And search engines may need to add their collective weight to push for ISPs not to share data.

Go ahead -- turn off your cookies on Google, don't sign-in to Yahoo and even use some anonymous surfing service such as Anonymizer. Make sure you run your own ISP as well. Otherwise, your ISP has a record of everything you are doing on the web, including searches -- and that data is already being shared for search profiling.

There are real pluses to having search profiles. Potentially, they can improve search results. Potentially, they can improve the ads we see. Profiling is already here to some degree and will only grow. Good protections of our search privacy need to grow alongside this, on search engines and across the internet.

Some past reading on this subject:

Posted by Danny Sullivan at 8:51 AM | Permalink

March 30, 2005

Can The Cookie Be Saved?

Crumbling Cookies Threaten SEM and Online Advertising from Kevin Lee at ClickZ looks at how a growing number of consumers eschewing cookies may make tracking conversions more difficult, making it harder to measure search marketing effectiveness. He urges various organizations to band together to save the cookie through education.

Want to discuss? Visit our forum thread, Protecting Cookies from Deletion.

Posted by Danny Sullivan at 7:56 AM | Permalink

March 21, 2005

People Search: Eliyon Becomes Zoominfo.com

People search company Eliyon is now ZoomInfo.com. They have a new search interface and have launched three new advanced search options. The service is well worth a look, but be aware that the technology still needs improvement.

ZoomInfo, under the old Eliyon name, has offered both free and fee-based people searching for about four years. Here are reviews from 2001 and 2002.

More recently, I've blogged about them several times including A Look at a Few Boston Area Search Companies and this post, Business.com Adds People, where I point out that Zoominfo/Eliyon provides the same data to Business.com and Lycos People Search. They also offer their database on the HighBeam Research site.

Today, Forever Famous from Newsweek looks at the relaunched service and examines some of its features, such as the ability to edit your profile, as well as some of the privacy issues raised.

Postscript: See also Eliyon Renamed Zoom Information with New Consumer-Oriented Strategy to Match from Information Today for an update on the service.

Posted by Gary Price at 9:22 PM | Permalink

March 15, 2005

Claria Debuts RelevancyRank: Search Ranking By Behavioral Activity

Claria, the company behind the Gator eWallet software, has released new search relevancy ratings today examining how the top search listings on Google, MSN and Yahoo compare to pages the company says its research shows are actually most relevant. More important, the ratings mark the first use of technology Claria hopes will let it improve the results of major search engines or perhaps offer its own improved search engine.

You'll find the ratings in this company press release, and I examine them more in the Claria Unveils Behaviorial-Based Search Ranking article now posted for Search Engine Watch members. In short, this isn't a battery of tests that you can take to the bank to know who is best.

Instead, it's really meant to showcase the bigger point Claria wants to make. It's now going public with its RelevancyRank system that uses behavioral data to determine what it believes are the best pages on the web for any particular term.

Claria computes this by both monitoring the activities of web surfers and searchers through its own software applications and with partnerships it has with publishers. The company's plan is that the technology will either be licensed to search providers looking to use its data or it may release its own search engine powered by clicktracking and behavioral data itself.

More on this "third generation" of clicktracking in this article for SEW members, Claria Unveils Behaviorial-Based Search Ranking.

Posted by Danny Sullivan at 3:43 PM | Permalink

February 23, 2005

Google's Blogger and Unwanted Software

An eWeek article, Spyware Snags Blogger Users, reports on a new study by Harvard researcher Benjamin Edelman that says "dozens" of blogs hosted by Google's Blogger/Blogspot trick visitors and install spyware and adware onto their computers.

The offending blogs typically prompt visitors to accept downloads through misleading pop-up windows, said Ben Edelman, a vocal spyware critic and Harvard University researcher. While a user typically must accept the download before the software installs, the prompts often attempt to trick users by disguising the download as a necessary Windows or Internet Explorer upgrade.

You can read the complete study: How Google's Blogspot Helps Spread Unwanted Software, here.

Posted by Gary Price at 4:13 PM | Permalink

February 22, 2005

Your Phone Number On Google & Other Search Engines

Yes, Google has (as others do) a phone number search service, as we've written before. Still, it can be a shock to some people if they discover their number is "listed" on Google. Is It Too Easy To Find People On Google? looks at the issues some may have with this type of service at Google and the many other companies that offer it. But in the end, the issue is really with data providers who collect this information from public records, as the story explains.

Posted by Danny Sullivan at 1:26 PM | Permalink

November 5, 2004

Google "Hack" Prevention

CNETAsia's: Are you Google hack-proof?, offers several suggestions about how to keep material out of web engines.

The author concludes the article with an important reminder since most articles on the topic focus on "hacking" as only an issue with Google. It isn't!

And don't forget also that there are other search engines that offers search-features that are different--in some cases better--than Google.

Posted by Gary Price at 8:44 AM | Permalink | Comments (0)

October 17, 2004

More Google Desktop Privacy Worries & A Microsofty's Fear

Another Google Desktop security issue to consider. I was forwarded a note of shock about someone who used Google Desktop and found it was caching his banking details (the same person, it should be said, failed to eliminate those details in the screenshot example illustrating this privacy concern).

Yep, GDS will do this, IF you allow it. Google obviously understood this concern, because it gives you an option to disable caching secure web pages.

Eric Baillargeon illustrates the option in his Google Desktop : Security Warning post. Details directly from Google are here: How can I keep private information out of my results?

Meanwhile, Google Desktop-rival Copernic is touting the privacy angle as a weakness. Google Desktop privacy branded 'unacceptable' from The Register has quotes from Copernic's CEO David Burns.

David's "stick your hand up if you want Google to know what pictures you have, and what MP3 files you have," quote is a bit extreme. So far, there's no indication the tool is going to report back to Google about what data is stored on your desktop (and its image/MP3 search capabilities in particular are rudimentary, to say the least).

David sent me a similar comment in an email discussion we've had on the issue. He was more general in that, suggesting people would be freaked out if you told them their "private content search keywords" would be sent out over the public internet.

My response was that people kind of do this already. They search for lots of things that are very private via the public web. It's just that many of them don't realize this.

Should they be worried? Aware, yes -- but probably not too worried. My More On Google & Other Desktop Search Stuff expands on this more. See also my A Closer Look At Privacy & Desktop Search and my Search Privacy At Google & Other Search Engines articles.

Meanwhile, the headline of a new San Jose Mercury News article should make Gmail is too creepy author Daniel Brandt smile.

The mainstream Merc writes in Google's Desktop Search is valuable, yet creepy about how the tool keeps track of IM chats some might think are private, how things deleted on your computer are still retained in Google Desktop and how, as mentioned, secure pages can be indexed.

By the way, the version of our Google Desktop review for Search Engine Watch members goes into depth about the issue with deletions still being retained and how things like clearing your internet cache do NOT clear a record of your browsing in Google Desktop.

Finally, another interesting angle on the Google Desktop launch. Uh-oh, it's google spotted via Dirson has someone who apparently works for Microsoft worrying that he's getting so tied to Google that "Google is kicking our butt."

Want to discuss or comment on this post? Visit our Google Launches a Desktop Search Tool forum thread.

Posted by Danny Sullivan at 10:03 AM | Permalink | Comments (0)

October 14, 2004

Job Recruiters & Searching Your Shadow Resume

Recruiters Use Google To Screen Job Applicants from the Wall Street Journal looks at how web searching could hurt the chances of some seeking jobs. In short, the web is turning into a "shadow resume" that potential employers can easily tap into. Some tips at the end on what to do if your shadow resume isn't up to snuff. Thanks for the link found via InsideGoogle!

Posted by Danny Sullivan at 7:37 AM | Permalink | Comments (0)

October 8, 2004

Where's The Privacy Freak Out Over Search Personalization?

Alan Chapell raises a good question in his Amazon.com's A9 Adventure article. Why haven't privacy advocates freaked out loudly and in large numbers about a9 and its personal search features? Only a few months ago, privacy concerns over Google's Gmail made headlines.

I agree with one of Chapell's main arguments, that Amazon's built a reputation of trust with many of its users for handling personal information well (though Amazon subsidiary Alexa did agree to settle a privacy dispute back in 2001).

Perhaps part of the reason the a9 launch (and other personal search features debuting since then from Ask Jeeves and Yahoo) hasn't raised more ire is because search privacy was raised as an issue last year.

Perhaps. I think the real answer is that people still aren't largely thinking much about search privacy. Gmail wasn't a search privacy issue. It was an email privacy one. I think people know inherently that there are private things sent via email. The idea that that email is going to be analyzed to show ads -- even through an automated process -- sounds scary.

Another reason may be that Amazon's a9 simply isn't used by that many people. In contrast, Google is by various metrics still the most popular search engine. What it does in terms of search impacts many more people.

In fact, that's one reason why Google got singled out for privacy concerns back in 2002 and 2003 relating directly to search. It was so big that privacy advocates figured it deserved the most attention.

In contrast, I think search is seen largely as a transient thing to many people. They really don't stop to think much about the very personal things they look up. They've also had no real experience with this information actively being recorded in a way they can use, unlike with email.

Last year, I looked at search privacy issues in my Search Privacy At Google & Other Search Engines article. I explained then how search was largely ignored as a privacy concern compared to things like cookies because search features themselves had no "memory" to them. Now that search memory tools have arrived big time, I'm sure we'll see search privacy concerns grow as such tools become used.

Posted by Danny Sullivan at 10:17 AM | Permalink | Comments (0)

September 15, 2004

Gmail Invites as Phishing Bait

Scammers use Gmail invite as phishing hook Source: News.com

Looks like some scam artists are using the allure of Gmail as bait in a phishing scheme.

Posted by Gary Price at 5:49 PM | Permalink | Comments (0)

September 12, 2004

Survey: Consumers Want More Personalized Online Ads, But Don't Want Identities Known

Survey: Consumers Want More Personalized Online Ads, But Don't Want Identities Known Source: Media Post

A study from the Ponemon Institute reports that personalized ads are fine with consumers only if advertisers do not gather enough info to personally identify them.

Posted by Gary Price at 4:30 PM | Permalink | Comments (0)
