
August 20, 2009

The Number One Google Killer: Google

Have you noticed lately that Google wants to keep connecting all of your logins on various products together? You're not alone. Have you wanted to keep them separate but Google won't let you? You're still not alone.

I've been talking extensively lately with my former boss, in-house SEM Al Scilitani, about this very problem. It seems that Google keeps forcing him to combine all of his Google accounts. That means he would need to use the same login for Google AdWords, which he needs for professional purposes, as he would to access his portfolio on Google Finance.

Google tells him that if he wants to keep them separate, he needs to create a separate email account. I tell Al not to trust that. Here's why:

Recently, I created a new YouTube account. I wanted to have two: one personal and one professional. I used separate Google email accounts. But every time I sign into the professional one, it automatically redirects me to the personal account. Not cool.

Everyone likes to speculate about whether Facebook or Bing or Twitter is the Google killer. But with actions like these, who needs competition? If Google keeps messing with our trust and privacy, they might turn out to be their own worst enemy.

Posted by Nathania Johnson at 2:04 PM | Permalink | Comments (11)

June 18, 2009

Privacy Roundup: Google Street View, French Law, and Congressional Hearings

Privacy concerns are ever-present, but the past week has generated significant buzz on the matter.

First up, Google has been asked by the European Article 29 Working Party to keep "unblurred" photos for Street View for as little time as possible. The conundrum is this. Google sends its cars out to photograph countries. They use technology to blur things like license plates. However, sometimes the tech goes awry and blurs things that don't need blurring. When Google finds out about it, they use the original, unblurred photo to correct the image and then add it back into the system. Clearly, that raises a privacy issue. Google says it's working with the Article 29 Working Party to determine the amount of time they should keep the photos, but no solid timeframe has been given yet.

Speaking of Europe, France is putting the pressure on social networking sites like Facebook when it comes to privacy issues. The matter at hand is the trend towards being "open" so that third-party developers can build applications using APIs. French politicians are concerned about these third parties gaining access to private information. There are two things the politicians should consider. One is that most APIs are restrictive. In other words, you don't get access to all of a social network's functionality just because there's an API. Secondly, most of the networks provide an option for users to opt-out of their information being shared.

Last but not least, the United States Congress is having yet another round of hearings on web advertising and privacy. Yahoo! Vice President of Policy and Head of Privacy Anne Toth testified today at the House Energy & Commerce Committee subcommittee hearing "Behavioral Advertising: Industry Practices and Consumer Expectations." Toth explained the benefits of relevant advertising but also touted the Yahoo! Privacy Center.

Google Deputy General Counsel Nicole Wong also appeared at the hearing. Wong spoke about Google's recent launch of interest-based advertising. The benefits of ad relevancy were a talking point for Wong as well.

Posted by Nathania Johnson at 3:59 PM | Permalink | Comments (0)

January 28, 2009

Data Privacy Day Exhibits Differences in Approach from Google and Yahoo

When it comes to privacy - and overall service/product offering - there is a big difference in how Google and Yahoo relate to searchers and customers. Google often strikes me as user-centered while Yahoo should seriously consider checking out Bryan Eisenberg's We-We.

Today is Data Privacy Day and the way Google and Yahoo are commemorating the awareness campaign brings those differences to light even more.

With Yahoo, it seems like they're for the user. They blogged a bunch of tips that users can do to protect their own data. But Google inserts a different tone: What *they* are doing for users.

Sure, Google shares tips for users, too, but they mix it with showing the great lengths they are going to, to ensure privacy on their end.

Yahoo's tone, of course, is symptomatic of a greater problem of not putting the user first. They're lashing out at bloggers for their own rule change in their search marketing service and overestimated themselves in the past year during the Microsoft acquisition attempt.

Of course, there's a new sheriff in town in Carol Bartz, the new CEO. Let's hope for Yahoo's sake that she can create a new tone and remind the purple people that the success of her company depends on putting the user first.

Posted by Nathania Johnson at 12:51 PM | Permalink | Comments (0)

July 7, 2008

Google Adds Privacy Link in Wake of Viacom Ruling; YouTube Addresses Privacy Issues

Recently, Google has been resisting calls to add a privacy link to their home page, saying searchers can simply type "Google privacy policy" in the search box to find the info. Plus, they didn't want to mess up that beautiful front page - well, except for links to advertising and business solutions that will bring them money.

But the search giant has finally caved and added the seven-letter word to its page with a link to the policy. And as John Paczkowski points out at AllThingsD, the link happened to go up just after a judge ruled that Google has to hand over YouTube user logs in a suit brought against it by Viacom.

Meanwhile, YouTube addressed the ruling on its blog. While they're planning on complying with the ruling, they are working with Viacom lawyers to remove at least some of the information they'll be handing over:

Of course, we have to follow legal process. But since IP addresses and usernames aren't necessary to determine general viewing practices, our lawyers have asked their lawyers to let us remove that information before we hand over the data they're seeking. (You should know, IP addresses identify a computer, not the person using it. It's not possible to determine your identity solely based on your IP address. Rather, an IP address can reveal what geographic area you're connecting from, or which Internet service provider you're using.)

What do you think of Google's move to put the privacy link on the homepage? How about YouTube's decision to comply with the law? Fire off in the comments!

Related Reading:

  • If You Give Google a Cookie
  • Google: A Clear & Present Danger to Corporate Data Privacy
  • Google Privacy Practices Under Attack
  • Google Defends Data-Retention Practices

Posted by Nathania Johnson at 11:09 AM | Permalink | Comments (2)

March 5, 2008

Google: The Spy Who Loved Me

Dr. Hal Varian, Google's chief economist and occasional Freakonomics Blog guest blogger, posted "Why data matters" on the official Google blog, cross-posted on the Google Public Policy Blog.

Varian explains that Web search algorithms are improved by the "wisdom of the crowds" drawn from the "logs of billions of previous search queries." That makes the general public - and government officials - nervous about privacy.

Varian tutors us in PageRank simplified and discusses link building in an ideal world - one where The New York Times and The Wall St. Journal, for example, would link to other sites generously:

"If I have six links pointing to me from sites such as the Wall Street Journal, New York Times, and the House of Representatives, that carries more weight than 20 links from my old college buddies who happen to have web pages."

The House of Representatives? Sounds more like Charlie Wilson's War.

SEOs, contact your local Congressional Representative for paid links - paid for with your hard-earned tax dollars.

The reality: when Dr. Varian was interviewed, The New York Times Freakonomics Blog linked to Google.org, Google green energy, Dr. Varian's position auction paper (pdf); BBC News on Moore's Law; Paul Seabright (Professor of Economics, University of Toulouse, France); Dr. Varian's NY Times energy article; another Freakonomics blog post; WebMD, Revolution Health, and Paul Anderson, Professor of Security Engineering, University of Cambridge.

That's the way major media outlets and journalists typically link: to each other; to corporate sites; to universities. It's an elite, exclusive club. Nick Carr's "digital elite."

That isn't to say Dr. Varian can't tell a good story. He reveals how Larry and Sergey tried to license their PageRank algorithm to "some of the newly formed web search engines."

No names named. None of the nascent search engines were interested. Since they couldn't sell their algorithm, Brin and Page decided to start a search engine themselves. (Note to VCs: Don't try this business model at home.)

Google has since added more than 200 additional "signals" to the algorithms that determine the relevance of websites to a user's query. We are the signals.

All the background info leads to one conclusion: Google needs your data. Google wants you to take a leap of faith. Google must store and analyze search logs. They want us to believe, "Nobody does it better."

Reminds me of Radiohead via Carly Simon:

"But like heaven above me, the spy who loved me/Is keeping all my secrets safe tonight. And nobody does it better/Sometimes I wish someone would/Nobody does it quite the way you do/Why'd you have to be so good."

Dr. Varian suggests readers "Watch our videos to see exactly what data we store in our logs."

Not everyone has time - or the inclination - to watch Google videos on YouTube.

What worries me: Google doesn't understand us any better than we understand the mathematical formulas of search engine algorithms.

Search Engine WarGames won't be fought between humans and machines.

Nick Carr put it best: "The erosion of the middle class may well accelerate, as the divide widens between a relatively small group of extraordinarily wealthy people - the digital elite - and a very large set of people who face eroding fortunes and a persistent struggle to make ends meet. In the YouTube economy, everyone is free to play, but only a few reap the rewards."

Posted by Kevin Heisler at 12:20 AM | Permalink

December 27, 2007

Google Misses the Mark with Reader Shared Items

This might make the folks at Facebook feel better about the whole Beacon privacy fiasco. It appears that even Google can make a mistake, as they did this month when they made shared items in Google Reader accessible to all Google Talk friends. Without asking. And without an easy way to opt out, short of deleting contacts or not sharing anything.

I don't know if I'd go so far as some, who claim that the move by Google ruined Christmas, but it was an unnecessarily foolish move by Google, which could have been avoided by making the sharing an opt-in decision, instead of an opt-out one.

This week (being a slow news week and all), many bloggers took offense to the move. Some complained that Google is invading their privacy by sharing items with people who they didn't intend to share with. Others blame users for not understanding what "shared" means.

Last night, the product team replied on the Google Reader blog to the "helpful feedback" it received from bloggers. The sharing feature is still automatic and opt-out, but now users can quickly create a new tag for all shared items and then decide which contacts to share those items with.

And a link is presented at sign-in to a page that explains the process in the Reader Help Center: "If for any reason you'd like to start your sharing afresh, you can always remove all your previously shared items. Just go to the Friends Settings and click Move or Clear Shared Items. You will be given an option to select or create a tag and move your shared items to that tag, or clear your shared items. The items will remain in their original feeds along with any tags you've given them, but will no longer be in your shared items feed."

Posted by Kevin Newcomb at 5:28 AM | Permalink

September 28, 2007

Google Hack Gets At Personal Data

Philipp Lenssen has discovered a cross-site scripting (XSS) hack against Google that allows access to personal data, according to Blogoscoped today.

The tests he ran with co-editor Tony Ruscoe show that it is possible to get access to subject line information and the first few words of emails from Gmail, statistical information from Google Analytics, as well as see what Google Gadgets are being used.

The glitch is specific to Internet Explorer, the pair reported, and uses a cross-site scripting attack.

The post comes with detailed pics of what is happening. Well worth the read.

Posted by Frank Watson at 1:18 PM | Permalink

August 15, 2007

SEW Experts: Google vs. the World

In today's Searching for Meaning column, "Google vs. the World," Kevin Ryan is here to tell you that privacy is dead and your future lies in everyone else's hands.

Posted by Kevin Newcomb at 12:00 AM | Permalink

June 19, 2007

Google Sweet Google

Google Maps didn't photograph my cats, although my living room window is clearly visible in their shot of my building.

Rather, they immortalized me (as well as a neighbor) leaving our building.

The friend who called this to my attention notes, "Looks like it was taken in April...the facade on the new vitamin shop hasn't changed yet" (he strolled down the street to check).

It's a weird feeling, all right. But blurry enough so I don't feel violated or anything.

My friend added: "Now, of course, you need to Google your place of employ to see if they have you walking out of that."

I don't think I want to.

Posted by Rebecca Lieb at 12:39 PM | Permalink

June 12, 2007

Google Defends Data-Retention Practices

In response to an E.U. Article 29 Working Party investigation, Google has changed its data retention policies again. Instead of the 18-24 months that it announced in March as the cut-off for keeping server logs, Google will now anonymize its search server logs after 18 months, according to a post on the Google Blog by Peter Fleischer, Google's global privacy counsel.

The Working Party raised concerns over the length of time Google kept server data, as well as the length of time before its cookies expire. It also questioned the need to keep data or use cookies at all. Fleischer defends Google's policies, while making concessions on the length of time server data is kept and promising to reconsider the expiration dates on Google's cookies.

Danny Sullivan has a complete run-down of the convoluted saga at Search Engine Land.

Posted by Kevin Newcomb at 11:24 AM | Permalink

June 11, 2007

Google Privacy Practices Under Attack

Privacy International, a London-based privacy watchdog group, has issued a report citing Google's privacy practices as the worst among large online destinations.

None of the sites reviewed scored a "privacy friendly" ranking. Several were labeled "privacy aware" but needing improvements or generally aware but with "notable lapses." Sites like AOL, Facebook, Yahoo, and Microsoft Windows Live Spaces were labeled a "substantial threat." But Google was the worst offender of the bunch, according to the report, getting the only "hostile to privacy" label of the group.

The report didn't center on Google, but called out several players for their records on privacy:

While there may be a temptation to focus criticism on Google's privacy performance, it is important to note that not one of the ranked organizations achieved a "green" status. Overall, the privacy standard of the key Internet players is appalling, with some companies demonstrating either willful or a mindless disregard for the privacy rights of their customers. Even the better performing companies create lapses of privacy that are avoidable. With minimal effort most organizations can improve their privacy performance by at least one grade.

Privacy International spoke with AP, and followed the news coverage with an Open Letter to Google criticizing some of Google's responses to the media.

Yesterday, Danny Sullivan took Privacy International to task at Search Engine Land in "Google Bad On Privacy? Maybe It's Privacy International's Report That Sucks." Sullivan criticizes the lack of firsthand information used in the report, and points to several examples where Google seems to have been judged more harshly than other companies in the study for similar track records.

Google engineer Matt Cutts weighs in today with "Why I disagree with Privacy International."

Posted by Kevin Newcomb at 12:20 PM | Permalink

March 15, 2007

Google to Anonymize Server Logs

In an effort to put users at ease and eliminate some privacy concerns, Google will begin anonymizing server log data after 18-24 months.

"By anonymizing our server logs after 18-24 months, we think we're striking the right balance between two goals: continuing to improve Google's services for you, while providing more transparency and certainty about our retention practices," writes Nicole Wong, Google's deputy general counsel.

After that, the data Google retains cannot be used to trace information back to an individual user. Danny Sullivan runs down the details at Search Engine Land.

Posted by Kevin Newcomb at 8:43 AM | Permalink

November 8, 2006

Eric Schmidt At Web 2.0 On YouTube & Other Issues

John Battelle spoke with Eric Schmidt at Web 2.0 yesterday. What have we got? YouTube's growth made it a necessary purchase. No, money's not set aside to cover YouTube legal claims. Yes, you can have your data if you want it, users. No, Google's not trying to take out Microsoft Office. Plus some more below.

Google CEO Eric Schmidt: We would never trap user data from ZDNet has coverage that has Schmidt saying:

  • Google bought YouTube because it was growing faster than Google Video, and video was a "fundamental data type" to Google.
  • Google's still figuring out ways to compensate content owners with video, a complex area.
  • Google would support exporting personal data (search history, email, etc) to other providers, if it can be authenticated.
  • Google's office products are "casual" and not aimed at Microsoft.

Google CEO denies rumor of YouTube legal reserve from Reuters quotes Schmidt as saying "not true" to a rumor that $500 million of the YouTube sale price was set aside for legal claims.

@ Web 2.0: Day One Highlights: Ad 2.0; Google CEO; Skype Content from PaidContent covers Schmidt but also touches on IAC's Barry Diller saying in a separate interview that he doesn't expect Google will become a media monopoly or dominant player.

Web 2.0 Con: Liveblogging the "Conversation with Eric Schmidt" from Valleywag has a nice minute-by-minute rundown of the interview, for those who want more -- and covers that if Schmidt and one of the cofounders, Larry Page or Sergey Brin, don't agree on something, the cofounder wins. "I'm the one with the experience who's late. Left to their own devices they'd be early and right, but too early."

Posted by Danny Sullivan at 5:34 AM | Permalink

October 2, 2006

Reading Other People's Gmail Via Bloglines

Using Bloglines to snoop on people's private Gmail from Martin Belam looks at how he accidentally stumbled upon email feeds that individuals are posting to Bloglines. To be fair, it's an issue that could happen to any "private" feed that someone unknowingly shares to the public.

Gmail allows people to get a feed of their email, as covered in these help pages. That lets you see the subject of your emails along with short descriptions. But even this small amount of information might be too embarrassing for some people to have made public.

How would those summaries get made public at all? In the case Martin looks at, people are adding their Gmail feeds to Bloglines but leaving those feeds public for others to view. That's how he stumbled upon them.

Google does warn about this, but he thinks the warning could be more visible. Perhaps -- but it's also worth keeping in mind that using an online news reader means you need to carefully consider ANY feed you take and whether those settings are public or not.

Postscript From Bloglines:

Bloglines is committed to online privacy and we take our role in this effort seriously. I'd like to help correct some of the misconceptions and explain how Bloglines privacy works in regards to both search and feeds as well as how to use Bloglines properly to generate secure feeds.

The main issue at hand is the appearance of Gmail accounts in Bloglines and a user's ability to subscribe to these feeds (or search for posts from these feeds).

The examples displayed were actually Gmail accounts registered through a third party (Feedburner) and then subscribed to within Bloglines.

Bloglines actually provides HTTP authentication for secure feeds. When this method is used, Bloglines secures the feed so that it can not be searched on or subscribed to except by the owner of the feed.

However, when the user generates their feed through a third party like Feedburner, the authentication portion has been removed from Bloglines' control and we have no way to identify and secure the feed. As a result the feed and its previously secure data become public. Clearly this is a problem and we are in contact with Feedburner and other third parties to help them better inform and protect their users.

The other issue is the definition and understanding of "private" feeds within Bloglines. Marking a feed as private in Bloglines only hides the feed from your public blogroll and your identity from the feed's list of subscribers. We try to make this clear to Bloglines users by prominently displaying the following note during the feed subscription process:

"Private subscriptions don't show up in blogrolls and you will not be listed as a public subscriber. However, the feed and all its posts will remain available to the public via Bloglines and Ask.com Blog & Feed Search. Exceptions are Bloglines email subscriptions and feeds that require http authentication. In both cases, the feed and its posts will not be included in search results."

This issue has reminded us that there is still some confusion about privacy in the world of feeds. We recognize that a better system of limiting access to feeds is needed as more content becomes syndicated or syndicatable. We have been leading the effort to build new safeguards into syndication standards and are hopeful that some type of Feed Access Standard will provide further security for users and their feeds.

Posted by Danny Sullivan at 8:36 AM | Permalink

June 22, 2006

Google Updates Toolbar Privacy Policy

It appears to me that Google updated the Google Toolbar Privacy Policy yesterday. I know the dates do not reflect that on the page, but if you take a look at the current version and compare it to the cached version from Jun 16, 2006 you will notice a lot of changes. Below are some of the larger changes to the privacy policy.

+ Removed a bullet that read:

We do not associate any of the information that Toolbar sends with other personal information about you. However, it is possible that a URL or other page information sent to Google may itself contain personal information. For information about how some web sites embed personal information in web requests, click here.

+ Added/Changed Significantly the following bullets:

(1) Toolbar Features that give you access to other Google services such as Blogger and Gmail are subject to the separate Privacy Policies of those products. Features that require use of a Google Account, like Bookmarks, store information with your Account as explained in the main Google Privacy Policy. Other features, like SMS This, that let you transmit data from the Toolbar may log that data transmission, as explained in the FAQ.

(2) Third party site custom buttons send information such as search queries to sites that are not operated by Google or covered by Google's Privacy Policy.

(3) If you have Google Toolbar Version 4.0 or above, your copy of Google Toolbar includes a unique application number. When you install Google Toolbar, this number and a message indicating whether the installation succeeded are sent back to Google. Also, when Google Toolbar automatically checks to see if a new version is available, the current version number and the unique application number are sent to Google. The unique application number is required for Google Toolbar to work and cannot be disabled.

(4) Except for information sent through Toolbar for use with a separate Account-based service such as Gmail, we do not associate any of the information that Toolbar sends with other personal information about you. However, it is possible that a URL or other page information sent to Google may itself contain personal information. For information about how this may happen, click here.

Those are the changes I noticed.

Posted by Barry Schwartz at 9:03 AM | Permalink

March 8, 2006

Google Filings Against DOJ Request -- Including Declaration From Matt Cutts

I'm planning a deeper look at Google's rejection of the Department Of Justice search records request, which happened last week when I was on vacation. But a quick heads-up. Many of you may have seen Google's blog post on the subject here, which in turn leads to their formal filing here (PDF). But that wasn't the only filing. Catching up on my feeds this morning, I saw that Gary compiled a full list of Google filings over here (PDF). My eyebrows shot up when I saw Google's Matt Cutts had a long declaration as part of that package. I was planning to help spread the word more about this as part of an overall summary of what's in the various filings, but Matt himself beat me to it with this blog post. So happy reading! I'll still be working on that general summary of everything, hopefully for later this week.

NOTE: This was originally written on Feb. 22, but I've only just seen that it was left as a "draft" and never published. Sorry about that!

Posted by Danny Sullivan at 2:55 PM | Permalink

February 8, 2006

Google Introduces Marked Up Version Of Privacy Policy Changes

Google Brilliantly Updates Privacy Policy from Nathan at InsideGoogle notes that for Google Talk's privacy policy, you can now view a previous version where changes are highlighted. Nice. Other privacy policies at Google don't seem to have this yet. I'm guessing this will happen as each of them (such as toolbar or Gmail) is updated going forward. Most that I looked at were changed as part of a big privacy update Google did last October. Still, the Google personalized home page policy is dated as of January 2006, so it probably has changed since the October wave but has no guide to past versions. Prior versions of the general privacy policy can be found here.

Posted by Danny Sullivan at 12:34 PM | Permalink

January 24, 2006

Google Not Installing Third Party Cookies -- It's Firefox Prefetching

John Battelle spotted a post from Chris Marino at Tumbling Duke that has the worrisome suggestion that Google is allowing third parties to set cookies based on searches people do. But I dropped an IM to Dave Naylor, who immediately spotted this being due to Firefox prefetching.

If you use Firefox, Google will automatically preload the pages showing in the top search results. They made this change back in March. As they warned back then:

With prefetching enabled, you may end up with cookies and web pages in your web browser's cache from web sites that you did not click on since prefetching happens automatically when you view Google search results pages. You can delete these files by clearing your browser's cache and cookies.

So in Chris's case, he writes about how he searched for cars, Amazon and Walmart and got cookies from Cars.com, Amazon.com and Walmart. He assumed this is all related to AdWords in some way.

AdWords isn't the issue. It's because for a search on cars, Cars.com was the first site listed and so that page was preloaded -- and that meant a cookie from Cars.com came with it. The same was true for Amazon and Walmart in searches on their names.

Posted by Danny Sullivan at 12:40 PM | Permalink

January 19, 2006

Court Documents & Summary Of United States Versus Google Over Search Data

Earlier we reported in Bush Administration Demands Search Data; Google Says No, Yahoo & MSN Said Yes that the US Government seeks to force Google to hand over search data. That story explains more about the situation, and there have been a number of postscripts from when it was first written. Along with that, we've been able to obtain copies of the three court documents filed in the case. Below you'll find links to each document, along with a summary of what's in each of them.

Alberto Gonzales, as Attorney General of the United States vs. Google Notice of Motion to Compel Compliance (PDF File)

Two quick points. First, remember that this brief was filed by the Government and does not include Google's response to its claims. I'm sure that will be coming. Second, I'm not an attorney and haven't played one on TV. My purpose was to summarize what was presented in the document.
  • The motion requests that Google comply with a subpoena filed by the Attorney General and "produce" for inspection and copying the materials the Government is asking for.
  • After the lead government attorney conferred with Google, Google has chosen not to comply with the subpoena.
  • The Government is asking the court to make Google comply.
  • The filing then goes into a background explanation about the Child Online Protection Act (COPA) and how the government is developing its defense of the constitutionality of COPA. They believe that COPA is "more effective than filtering software in protecting from harmful exposure to harmful material on the Internet."
  • In preparation of the case, subpoenas were issued to Google and "other entities" that operate search engines to produce two sets of materials.
  • First, the subpoena asks Google to produce an electronic file containing "[a]ll URL's that are available to be located on your company's search engine as of July 31, 2005."
  • However, after "lengthy negotiation" the government changed and "narrowed" their request and asked for a "multi-stage random sample of one million URLs from Google's database, i.e., a random selection of the various databases in which those URL's are stored, and a random sample of the URL's held in those selected databases."
  • Second, Google was asked to "produce an electronic file containing [a]ll queries entered into the Google engine between July 1 and July 31 inclusive."
  • Again, after lengthy negotiations the government changed their request and asked for an electronic file "containing the text of any search string entered into Google's search engine for a one week period (absent any personal information identifying the person who entered the query)."
  • Google has still refused to comply with these requests in any way.  
  • The Government says that access to this information would be of "significant significance" in the preparation of their case.
  • Specifically why?  
  • "The production set of queries entered into Google's search engine would assist the Government in its efforts to understand the behavior of current web users, to estimate how often web users encounter harmful-to-minors material in the course of their searches, and to measure the effectiveness of filtering in screening that material."  
  • This information would also help the Government understand what "web sites people find through the use of search engines, to determine the character of those sites, to estimate the prevalence of harmful-to-minors material on those sites, and to measure the effectiveness of filtering software on that harmful to minors material."
  • The document continues into a discussion with plenty of legalese and citations and again points out that Google has failed to comply and lists some of the reasons Google objects to this.
  • Google first objects to this on the grounds of relevancy.  
  • Google also objects on the grounds that if they would provide what the government asks for, they would be required to produce information identifying the users of its search engines.  
  • The Government claims that this is "illusory" since they have specifically asked for a random sample containing no personally identifying information to any search string.  
  • The Government said that it has received compliance from search entities with files containing no personally identifying information.  
  • Google also contends that the information they're being asked to produce is "redundant" since the Government has asked other engines to produce similar files. The Government argues that this "misunderstands" what's being requested. "The production set of queries from Google's database, in combination with similar productions from other search engine operators will assist the Government in developing a sample of the overall universe of search engines queries, while accounting for the potential of any variations in the type of queries that are entered into different search engines."  
  • The Government says that since Google is the market leader, its response "would be of value" in developing the Government's overall sample of queries.  
  • Google says that complying would also force it to share trade secrets, because the total number of queries it receives in a day is a trade secret. The Government adds that if this were the case, a district court has said that these numbers would not be disclosed.  
  • Finally, according to the filing, Google says that it will be subject to an "undue burden" in complying. The Government claims that this is not the case whatsoever. The Government adds that it would be "willing to work" with Google to specify a multistage sample. It is also willing to compensate Google for its work in complying with the subpoena.  
  • The filing ends with the Government saying that, "This court should require Google to comply with the subpoena on the same terms its competitors have."
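
The "multi-stage random sample" described in the filing can be pictured as a two-step draw: pick some databases at random, then pick URLs at random from within the ones selected. Below is a minimal sketch of that idea only. The database names, the toy URLs and the `multistage_sample` helper are all hypothetical, and nothing here reflects how Google stores URLs or the sampling design the Government actually proposed.

```python
import random

def multistage_sample(databases, n_databases, n_urls_per_db, seed=None):
    """Two-stage sample: random databases first, then random URLs within each."""
    rng = random.Random(seed)
    # Stage 1: choose which databases to look in.
    chosen = rng.sample(sorted(databases), min(n_databases, len(databases)))
    sample = []
    for name in chosen:
        urls = databases[name]
        # Stage 2: choose URLs at random from each selected database.
        sample.extend(rng.sample(urls, min(n_urls_per_db, len(urls))))
    return chosen, sample

# Hypothetical toy data: three "databases" of made-up URLs.
toy = {
    "db_a": [f"http://example.com/a/{i}" for i in range(100)],
    "db_b": [f"http://example.com/b/{i}" for i in range(50)],
    "db_c": [f"http://example.com/c/{i}" for i in range(10)],
}
chosen, urls = multistage_sample(toy, n_databases=2, n_urls_per_db=5, seed=1)
print(chosen, len(urls))
```

Note that drawing the same number of URLs from each selected database does not give every URL an equal chance of selection when the databases differ in size; a real design would need to weight for that.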

Declaration Of Joel McElvain (PDF File)

The second filing is a declaration by Government attorney Joel McElvain, who I believe is the lead attorney for the U.S. Department of Justice in this matter. It also helps produce a timeline of events to this point. It includes:
  • A copy of the original subpoena, originally signed on August 25, 2005
  • Detailed info and definitions about what Google was to submit to the Government.
  • A several-page letter, dated October 25, 2005, from Ashok Ramani, Commercial Litigation Counsel at Google, to Joel McElvain with his objections to the subpoena. THIS IS A MUST READ!!!
  • Key Quotes and Passages from the Letter

  • "It is against Google's competitive interest to be viewed as reflecting the whole world wide web."
  • Worth noting that Google says that the government tried to use Archive.org/Wayback Machine and found the results unsatisfactory. From the letter, "...given the www.archive.org's stated purpose, one would expect them -- with an appropriate consulting relationship to create the results the DEFENDANT wanted."
  • The Government's request is seen as redundant because it already has URLs from at least one other engine.
  • From the letter, "Though the search engines doubtlessly have some differences in the URLs they store, what distinguishes Google from its competitors is the sophistication of Google's search engine in locating and ordering relevant results."
  • On the burden to Google: "Google would have to spend a disproportionate amount of engineering time and resources to (i) number (even in rough terms) in real time the URLs contained in its search database and (ii) extract based on that initial numbering the URLs selected by Professor Stark."
  • Google also objects because it could "endanger" its "crown-jewel trade secrets." Specifically, it would have to disclose the approximate number of URLs in its database and "some" details on how it crawls URLs, "such as the number of servers, server distribution, and how often Google crawls the World Wide Web."
  • More objections. "Google objects to the Defendant's view of Google's highly proprietary queries database as a free resource that Defendant can use, some levels removed, to formulate its own defense."
  • "Moreover, Google's acceding to the Request would suggest that it is willing to reveal information about those who use its services. This is not a perception Google is willing to accept. And one can envision scenarios where queries alone could reveal identifying information about a specific Google user, which is another outcome we cannot accept."
  • Next, we find another letter. This time it's from DOJ's McElvain to Google's Ramani. This letter is dated December 23, 2005.
  • The letter discusses how the Government is willing to narrow what's asked for in the subpoena.
  • This is summarized in the Alberto Gonzalez, as Attorney General of the United States vs. Google section of this post.
  • McElvain discusses how Google asked for and was granted two extensions to serve its objections to the subpoena until October 10, 2005. He then writes, "In our several discussions prior to the service of those objections we had offered to limit the scope of the requests for production, and you had indicated Google's willingness to consider compliance with the subpoena along with the narrowed terms that we had suggested. Your written objection also reiterated your hope to reach a resolution regarding Google's compliance with the subpoena. However, shortly after the service of your objections, you telephoned me to inform me that Google would decline to comply with the subpoena."
  • More conversations between the Government and Google take place on December 12th and December 21st to discuss the technical aspects of the request. Finally, on December 21st, McElvain was informed that Google would not comply with the subpoena.
  • The final document is a protective order in the ACLU v. U.S. case.

Declaration Of Philip B Stark (PDF File)

This document is a declaration by Philip Stark, Ph.D., who is the person working on the project. Dr. Stark is a Professor of Statistics at the University of California, Berkeley.
  • Stark explains how he has had conversations with the USDOJ, Google and other search providers, "to develop practical approaches to sampling their databases or URLs and search queries."
  • He adds that he has started to analyze the samples produced by search providers other than Google.
  • He writes, "Reviewing user queries to search engines will help us understand the search behavior of current web users, to estimate how often web users encounter HTM [harmful-to-minors] materials through searches, and to measure the effectiveness of filters in screening those materials."
Stark goes on to add more about his approach and about why including Google's results is directly relevant.

Posted by Gary Price at 4:18 PM | Permalink

Bush Administration Demands Search Data; Google Says No; AOL, MSN & Yahoo Said Yes

NOTE: We're continuing to update this news through postscripts below the original story.

Via John Battelle and Good Morning Silicon Valley, the San Jose Mercury News article "Feds want Google search records" covers the Bush administration demanding last year that Google and other search engines turn over aggregate search information to help revive a child protection law. Google has refused to comply with the subpoena. A motion has been filed this week by the US Department Of Justice to force Google to hand over the data.

In particular, the Bush administration wanted one million random web addresses and records of all Google searches for a one week period. The government apparently wants to estimate how much pornography shows up in the searches that children do.

Here's a thought. If you want to measure how much porn is showing up in searches, try searching for it yourself rather than issuing privacy alarm sounding subpoenas. It would certainly be more accurate.

Getting a list of all searches in one week definitely would let the US federal government dig deep into the long tail of porn searches. But then again, the sheer amount of data would be overwhelming. Do you know every variation of a term someone might use, that you're going to dig out of the hundreds of millions of searches you'd get? Oh, and be sure you filter out all the automated queries coming in from rank checking tools, while you're at it. They won't skew the data at all, nope.

Moreover, since the data is divorced from user info, you have no idea what searches are being done by children or not. In the end, you've asked for a lot of data that's not really going to help you estimate anything at all.

Far better would be to do some searches that you think children and teens are actually doing, such as by doing a survey of them. Then just go start searching on Google and the other search engines yourselves. See what actually comes up, especially when the filtering protection each service offers is enabled. That would give you plenty of data, plus it would be useful for everyone to have someone rigorously test the filtering systems that are offered. Serving subpoenas to get the data isn't necessary.

It's important to note that from what I read, the requests do not involve user data at all. Shutting off your cookies or purging your personalized search data wouldn't protect you with this request, because the request wasn't going after personal data. To stress again:

  • According to the report, they wanted a list of one million web addresses. Not who went to the web pages and when, just a list of URLs picked randomly.  
  • They wanted searches for one week. I haven't seen the court documents, but I'm guessing Google could have handed over a list of searches that were entirely unassociated with IP addresses, times, cookies and registration information. Nothing suggests that they wanted to know who did the searches in any way.
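
To make the distinction concrete: a query log stripped of its who-and-when fields is just a list of strings. Here is a minimal sketch, assuming a hypothetical log record layout. The field names `ip`, `cookie`, `timestamp` and `query` are invented for illustration and are not taken from any search engine's real log format.

```python
def strip_identifiers(records):
    """Keep only the query text, dropping every field that points at a user."""
    return [rec["query"] for rec in records]

# Hypothetical log records; the IP addresses come from a documentation-only range.
log = [
    {"ip": "192.0.2.1", "cookie": "abc", "timestamp": "2005-07-01T10:00:00",
     "query": "cartoon characters"},
    {"ip": "192.0.2.7", "cookie": "xyz", "timestamp": "2005-07-01T10:00:05",
     "query": "movie times"},
]
queries = strip_identifiers(log)
print(queries)
```

The caveat is that the query text itself can still contain a name or an address, which is the scenario Google's letter raises when it says queries alone could reveal identifying information.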

Having said this, such a move absolutely should breed some paranoia. They didn't ask for data this time, but next time, they might. Of course, it bears reminding that this type of data is easily obtainable from ISPs. So even if the search engines refuse to comply, your own ISP could be giving up your data -- or selling it.

Overall, I say kudos to Google for declaring the request overreaching and refusing to comply. I'm checking with the other major search engines to see if they handed over data.

I've spoken and written a bit about the idea that the search engines need to consider creating a clear "Search Privacy Bill Of Rights," spelling out clearly what protections they'll pledge you'll always have with your data and exactly how it will be used, destroyed and so on. I want to move ahead with more explorations of this -- and perhaps we need a similar one enacted by governments to spell out what they will and will not do with our highly private search data.

Moving Past Google Privacy Fears & Toward An Industry Solution from me last year gives you a lot of background on search privacy issues from over the years. There's an extensive reading list at the bottom.

After I put that out, I also created a thread at our Search Engine Watch Forums, How Should Search Engines Protect Privacy?. Unfortunately, that thread -- while it got lots of discussion -- never generated as many concrete ideas and suggestions about what should go in a Search Privacy Bill Of Rights as I hoped for. So I'm trying again. Got thoughts, comments, suggestions? Please visit our new thread, A Search Privacy Bill Of Rights.

Meanwhile, want to talk about this particular move by the Bush Administration? I have a different thread for that, Bush Administration Demands Search Records.

Postscript 1: I have queries out to AOL, Ask Jeeves, MSN and Yahoo to find out if they provided data. I'll note answers here or in a new post.

Postscript 2: I said above that a more accurate way for the government to assess how often children might encounter porn through search engines would be to conduct their own research. Indeed, they have. Government Report Says MSN Search Adult Filter Most Effective from the SEW Blog back in June covers this report (PDF format) that the US Government Accountability Office did back in June. From what I can see, it measured how often children might encounter porn through image search. To do the assessment, no subpoenas were required. From what I posted in our active Bush Administration Demands Search Records discussion at the Search Engine Watch Forums on today's news:

FYI, back to the idea of child filters on search engines, the US government has tested this, as Government Report Says MSN Search Adult Filter Most Effective covers. Note that to do this, they said:

We performed unfiltered 5-minute searches for six keywords: three keywords known to be associated with pornography and three innocuous terms that juveniles would likely use (a popular teenage singer/actress, a popular cartoon, and a popular movie character).

They managed to do this assessment (the US Government Accountability Office) without issuing a subpoena to anyone. Moreover, it has stats they say they want already produced and ready to go. Pages 48 and 67 have details. The caveat is that this seems to have been a test of image search results (Yahoo was 92 percent non porn, MSN 76 percent, Google 64 percent). But you could do the same thing to measure web search.

Postscript 3: Here's the official Google statement from Nicole Wong, associate general counsel with Google. It's what they already told the San Jose Mercury News and are telling other publications:

Google is not a party to this lawsuit and their demand for information overreaches. We had lengthy discussions with them to try to resolve this, but were not able to and we intend to resist their motion vigorously.

Postscript 4: MSN statement is below. It doesn't really answer the question, which was if they complied with a subpoena to hand over data similar to what Google's being sued over. Since it's not a denial, I'm reading this as a tentative yes, that they got a request and passed the data along. I've asked for clarification. The statement:

MSN works closely with law enforcement officials worldwide to assist them when requested. Microsoft fully complies with the Electronic Communications Privacy Act and United States Law as well as Microsoft's terms of use and privacy policies in working with law enforcement. It is our policy to respond to legal requests in a very responsive and timely manner in full compliance with applicable law. MSN takes the safety of its customers very seriously and is committed to providing a safe experience for consumers. As stated in MSN's Terms of Use and Subscription Agreements, Microsoft will comply with applicable law to edit, refuse to post, or to remove any information or materials, in whole or in part, in Microsoft's sole discretion.

Postscript 5: It's important to note this case is not about stopping child porn. It's about trying to get a law passed that would help the government shut down sites that allow children themselves to access porn. To prove a need for the law, the US government wants to show how much porn children might encounter through searches. It's easy to confuse these two completely different things. I did originally, and corrected the first draft of my story, but I still had a section stressing the child porn angle. I've removed that from the story above. Here's what I pulled out, for those who care about such edits:

Getting a list of all searches in one week definitely would let the US federal government dig deep into the long tail of porn searches. But then again, the sheer amount of data would be overwhelming. Do you know every variation of a term someone might use, that you're going to dig out of the hundreds of millions of searches you'd get? Oh, and be sure you filter out all the automated queries coming in from rank checking tools, while you're at it. They won't skew the data at all, nope.

If you do, from talking with the head of a child porn fighting group in the UK, my understanding is that many euphemisms and code words are used that won't immediately register as child porn terms.

I can assume the Bush administration probably has investigators smart enough to know the euphemisms and other terms that those after child porn might seek. If you've got that list, just go start searching on Google and the other search engines yourselves. See what actually comes up, especially when the filtering protection each service offers is enabled. That would give you plenty of data, plus it would be useful for everyone to have someone rigorously test the filtering systems that are offered.

There are plenty of other ways to get samplings of non-porn searches that are done, to measure whether porn is showing up in response to these. Serving subpoenas to get the data isn't necessary.

Postscript 6: Ask Jeeves did not provide data, as they were not asked. Statement:

Ask Jeeves has not received requests for search data from the Department of Justice in this matter.

Postscript 7: Yahoo got a request, and I'm guessing complied. Guessing? The statement is below. At first, you'd think they didn't give any information. But that's not what it says. It says they gave no "personal information." That's easy enough, since as I noted above, the government didn't request any personal information. The aggregate data they wanted wasn't personal. Therefore, Yahoo may have handed that over. I'm following up. Statement from spokesperson Mary Osako:

We are rigorous defenders of our users' privacy. We did not provide any personal information in response to the Department of Justice's subpoena. In our opinion, this is not a privacy issue.

Postscript 8: New statement came in about a minute after I posted above, making it clear Yahoo did comply:

We are rigorous defenders of our users' privacy. We did not provide any personal information in response to the Department of Justice's subpoena. In our opinion, this is not a privacy issue. We complied on a limited basis and did not provide any personally identifiable information.

Want to comment or discuss? Visit our SEW Forums thread, Bush Administration Demands Search Records.

Postscript 9: In fairness to Yahoo, which handed over information -- and MSN which likely did the same -- it is important to note that it is not just spin that no privacy issues were involved with this particular data. As I explained in the story, the information is completely divorced from any personally identifiable data.

Let me especially stress this. Want 1 million random web sites? There's no privacy issue in that. The government didn't ask for the "bad" sites or sites that were linked with any particular activity. They just wanted a list of sites, probably so they could do a survey.

It's a stupid request, of course. It's sort of like the government asking a major car dealership to give you a list of random license plate numbers rather than the Department Of Motor Vehicles. Surely the government can generate its own list without forcing a private company to do this.

How about those search requests? They are a list of searches with no user data associated with them. If that's a user privacy issue, then live displays such as listed here are a long-standing one.

Here's a better example. Infospace -- which owns the Dogpile meta search engine -- has sold raw search data to Wordtracker for years. I have never heard of anyone concerned about the privacy implications in that. This is because there aren't any. You can't see who did a search, IP addresses, cookies, etc. It's just a big long list of words.

To hammer home the point, look at this:

That's the live (and warning, unfiltered) search display from Dogpile as I wrote this postscript. See anything linking any individuals to those searches? No, and that's all the US government would have gotten, a raw list of millions of searches.

So why the hoopla? Why not give in? Two reasons:

  • Competitive: Why give out even raw search data that might fall into the hands of competitors? Even then, the lists from each major search engine will be pretty similar, so it's not that much of a worry.  
  • Trust: The data, as I've written, isn't going to help the government at all in what they say it will do. Heck, if they really need that list, they could buy the data from Wordtracker. But by handing it over, the search engine loses the perception of trust with its users. They may not understand that it is not personal. They will understand the government made a wide-ranging request for information and that the search company didn't push back. That type of trust is worth defending in the face of an ill-advised, useless government action.

Postscript 10: MSN says they aren't providing more specifics beyond the statement they gave above. Since that statement does NOT deny that they provided information, I can only assume that they did. Unfair assumption? Well:

  • If they didn't get a request, as with Ask Jeeves, they'd say so (and probably breathe a sigh of relief that they didn't get one).  
  • If they did get a request and refused to comply, I'd expect we'd have seen a court case by now, as we are with Google.

That only leaves that they got a request, and that they replied. If I'm wrong, I'll happily post a correction and new statement, if MSN provides one.

Postscript 11: Seth Finkelstein sent me a link to his Free porn, Google, spam, Internet censorship, and the Supreme Court post, which highlights something Gary and I have written about for ages. You can't trust search engine counts to prove anything. While counts themselves haven't been shown to be an issue in this case, Seth's post shows that they might be something the Department Of Justice is considering. From the Boston Globe article he points at:

Ordinarily, US Solicitor General Theodore B. Olson prepares for an appearance before the Supreme Court by acting out his argument before a pretend court. This time, for a case about the Internet, he added a new twist: searching online for free porn.

At his home last weekend, Olson told the justices yesterday, he typed in those two words in a search engine, and found that "there were 6,230,000 sites available."

The top lawyer who represents the Bush administration before the Supreme Court said the search's results illustrate how pornography on websites "is increasing enormously every day," a central point in his argument for saving an antipornography law that was enacted six years ago but has yet to go into effect.

Hmm. Six million porn sites available? OK, let me do it now on Google. Now I get a figure of 26,900,000. How porn has grown. Ah, but how many pages (the count is for pages, not web sites) do we have in all? Google doesn't report a figure. But if I search for -kfdjkkdjdkfjdkjdk9d09d09d0jdkfdkjkf, a word that doesn't exist, I get a count of 9.7 billion pages. I know that the count is much higher than this (read this to understand more), but let's swing with that figure:

26.9 million / 9.7 billion = 0.28% of the web equals free porn
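
The back-of-the-envelope share is easy to reproduce. Here is a quick sketch using the two counts quoted above, the 26,900,000 result count and the 9.7 billion page count; both are Google's estimated counts, so treat the output as illustrative only.

```python
porn_pages = 26_900_000        # count Google reported for the query
total_pages = 9_700_000_000    # count for the nonsense-word exclusion query

share = porn_pages / total_pages
print(f"{share:.2%}")          # share of all counted pages, as a percentage
```

Either way, it lands at a bit under 0.3 percent.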

You want to take that figure to court to show there's a lot of porn? Please. But that figure still doesn't mean anything. A search for online porn at Google only shows you pages that have those two words on them. They could be pages writing about the evils of online porn, how to avoid online porn, why online porn should be banned. Consider this:

That's a heck of a lot of pages with "no free porn" on them!

Fox News & Danger Of Citing Search Counts over at our Search Engine Watch Forums is another example of the fallacy of citing search counts to prove points. For more deconstructing of the Olson proof, be sure to read Seth's send-up.

Postscript 12: Court documents we've obtained so far are now up. Gary's also working very hard to summarize what's in them. See them over at his Court Documents & Summary Of United States Versus Google Over Search Data post.

Postscript 13: AOL appears to have been asked and complied, at least according to the ACLU. I'm still waiting to hear back from AOL. Via Google Blogoscoped, Feds take porn fight to Google from News.com summarizes the court documents. The ACLU challenged the law the US government seeks to revive, the Child Online Protection Act. An ACLU attorney told News.com that Microsoft, Yahoo and AOL all chose to comply.

AOL disputes what the ACLU says -- but from what I read, that dispute is the same as Yahoo's original statement that they didn't give any personal information (Postscript 7 versus Postscript 8, above). Since the government didn't ask for any personal data, of course AOL didn't hand any over. But AOL says it did hand over search queries from a roughly one day period.

Postscript 14: Xeni Jardin over at Boing Boing has confirmation that AOL, MSN and Yahoo all received requests from the Department Of Justice along with Google. Google did not comply, hence the legal action.

Postscript 15: AOL sends a statement now saying they didn't comply, though it still looks like they did in part, as I explained in Postscript 13. To say they handed over no personal data is a non-issue. The Department Of Justice demanded no personal data. It did demand a list of search terms, and AOL appears to have given some amount of these to the DOJ. The statement:

We did not -- and would not -- comply with such a subpoena. We gave the DOJ a generic list of aggregate and anonymous search terms. This did not include search results, nor any personally-identifiable information, and therefore there were absolutely no privacy implications.

Postscript 16: MSN sends a statement today (Friday, Jan. 20) saying they complied with the subpoena:

Microsoft typically does not comment on specific government inquiries. That said, as you may have heard from the DOJ they did contact us in this case. We take the privacy of our customers very seriously. We did comply with their request for data in this case in a way that ensured we also protected the privacy of our customers. We were able to share aggregated query data (not search results) that did not include any personally identifiable information.

Postscript 17: Xeni Jardin over at Boing Boing has AOL saying they did not comply with the subpoena. It's hair splitting time on which way to go on this. As I explained in Postscript 15, the argument that AOL gave no personal data is a non-issue. No personal data was requested. They did give a list of aggregate and anonymous search terms. That's exactly what the subpoena requested. The amount they gave is uncertain. Google was asked to give search queries for all of July 2005, which was later negotiated down to a request for a week's worth of data. AOL probably gave less than originally requested but still likely a big chunk of information. No mention of whether any URLs were handed over. I still see this as complying, but I'll follow up more with AOL about it.

Postscript 18: See also The Day After: Points In The Search Trust Sweepstakes from me. It reflects back on some of the bigger issue points raised from the situation.

Want to comment or discuss? Visit our SEW Forums thread, Bush Administration Demands Search Records.

Posted by Danny Sullivan at 6:03 AM | Permalink

November 16, 2005

Matt Cutts Has a Cup of Coffee at PubCon Conference

Like he does at many of the Search Engine Strategies conferences, RustyBrick (Barry Schwartz) is blogging from the WebmasterWorld PubCon currently underway in Las Vegas. This post: Coffee Talk with Senior Google Engineer: Matt Cutts, offers a great Q&A style review (not an official transcript) of today's hour long session. Kudos to Barry for making it available.

Here are two of the more interesting Q&A's from the audience:

Q: CSS positioning? How does it affect ranking? A: Good question, I don't know. If you're doing an include, it probably won't matter either way. In his mind, positioning text at top or bottom is overrated. But try it.

Q: Google Analytics, can you confirm that Google will be using that data in the search engine? A: He can't confirm, but he can deny it. :) Matt, as a Web spam team member, does not have access to this data. He won't even ask for it. If it becomes a concern, he will post it on his blog. People will always be concerned, so don't use it.

Postscript: Aaron Wall was also at the Matt Cutts session and he shares his overview here.

Posted by Gary Price at 4:52 PM | Permalink

October 15, 2005

Google Updates Privacy Policy

If you're looking for some weekend reading, Google has just updated their privacy policy.

I used HTML Match to create a comparison of the two documents (dated October 14, 2005 and July 1, 2004) and posted a screen cap of the differences here.

Aaron Swartz's useful web-based HTML Diff program also offers a comparison of the old and the new and is available here.

Of course, if you would like to review the actual documents:

A new privacy "highlights" document is also available along with a Google Privacy Policy FAQ. Policies for other Google services are linked in the left column of the FAQ. Most of these documents are dated October 14th.

Posted by Gary Price at 2:11 AM | Permalink
