SES Chicago - December 7-11, 2009

September 15, 2009

Australian Newspaper Publishers: Search Engines Break Into Homes, Steal Content

Newspaper publishers are reaching such new lows with their arguments against search engines that I wonder how they stay in business at all. I mean, doesn't journalism require gathering facts and analyzing them?

The latest low is an Australian newspaper publisher who says that search engines indexing newspaper sites is essentially breaking and entering.

WRONG.

If anything, your newspaper is like a dance club, and you can deny entrance to the search engines if they're not dressed up enough for your taste. Just slap a Disallow rule in your robots.txt file (or a noindex meta tag on your pages) and it's like hiring the best bouncer in town.

Because, let's face it, you want eyeballs at your website. Otherwise, why have one? People find a ton of content through search, but if you're not liking the engines, just block them. Simple as pie.
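
For anyone who wants to see the bouncer at work, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the news.example domain, the /archive/ path and the ExampleBot crawler name are all made up for illustration; only the Disallow mechanism itself comes from the robots.txt convention.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a newspaper site: shut one crawler out
# entirely and keep every other crawler out of a made-up /archive/ section.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /archive/
"""


def can_crawl(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("Googlebot", "https://news.example/story.html"),
        ("ExampleBot", "https://news.example/story.html"),
        ("ExampleBot", "https://news.example/archive/2006.html"),
    ]
    for agent, url in checks:
        verdict = "allowed" if can_crawl(ROBOTS_TXT, agent, url) else "blocked"
        print(agent, url, verdict)
```

A well-behaved crawler runs a check like this before fetching a page, which is why a two-line file is enough to opt out.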

Posted by Nathania Johnson at 2:44 PM | Permalink | Comments (1)

August 27, 2009

Italian Regulators' Investigation Against Google Essentially Proves Google Right

Italian regulators are investigating Google over complaints from newspaper publishers. The complaint is that Google bans a newspaper from all of Google if it chooses to opt out of Google News.

Google's Italian offices were searched by financial police, according to Bloomberg.

It seems that Italian newspaper publishers who are putting forth the complaint are shooting themselves in the foot (with apologies to Plaxico Burress).

On the one hand, they say it's a copyright violation to be found in Google News. On the other hand, they're complaining that they aren't found at all in Google.

So let me get this straight. A link in Google News is copyright violation. A link in Google web search results is not? I think you just proved the other side right, Italian newspaper publishers and regulators.

In this case, it's likely wise to "follow the money." Newspapers need advertising revenue from their websites, but if they're not getting as much traffic now that they've opted out of Google News, advertising revenues have probably dropped. Instead of actually working on their business and recognizing the value of inclusion in Google News results, they simply want to make Google pay legally and likely financially as well.

Hopefully, regulators will wise up and realize the jig is up.

Posted by Nathania Johnson at 2:39 PM | Permalink | Comments (0)

November 27, 2006

Q&A On Google's Belgium News Agreements

This week, news emerged about an agreement between Google and two Belgian author groups that were suing it over copyright issues. Below, a short Q&A on what this means for Google. Highlights: The case goes on with three other groups taking part, but large damages seem unlikely. The new deal especially seems to give Google photo rights. Google says it is not doing an about-face on opt-out in Denmark. More about these and other issues is covered below, based on a talk with Google spokesperson Jessica Powell. Plus, some bonus stats on how much traffic newspapers get from search engines.

Q. The case was originally filed against Google by Copiepresse. What are the other groups that joined and when did they come on?

A. In mid-October, Sofam, Scam, SAJ and Assucopie all joined the case after Google posted the Belgian court ruling in late September.

Q. Who remains as part of the case?

A. Copiepresse, SAJ and Assucopie.

Q. Has Google paid any fines in the case so far?

A. Despite rumors, Google reiterated again today that it has not been asked to pay any fines.

Q. If Google loses the case, will it have to pay any damages?

A. Google says it hasn't been asked to pay any fines.

Q. What do the new agreements with the author groups Sofam and Scam allow?

A. Sofam represents Belgian photographers while Scam covers mainly audio/video content. Exact uses are being worked out. As with the AP deal, Google highlighted this as providing new uses rather than a solution to the legal challenges over spidering and thumbnail image use. "It's a way for us to use their content in new ways beyond what copyright law currently allows us without the permission of the authors," Powell said.

Q. Was there a financial aspect to the agreement?

A. Google's not commenting. Google is definitely paying the Associated Press to use some of its content, as the AP itself has reported. However, the exact terms, mechanisms or amounts have never been disclosed. Google wouldn't get into specifics on the financial details on the two Belgian deals other than to say these were deals that will allow the search engine to use the content in new ways.

Q. Is Google talking with the other parties to the suit?

A. Google said it won't comment on discussions but that it's always open to dialogue.

Q. Did Google reverse course and go opt-in for Google News Denmark?

A. Google says it chose to only launch in Sweden and Norway and that going forward it is not planning on an opt-in model in Denmark or elsewhere. The reason, says Powell, is that the company believes Google News complies with copyright law. "If publishers don't want their websites to appear in search engines, robots.txt enables them to automatically prevent their content from being indexed. And we even go beyond that: if a newspaper doesn't want to be a part of Google News, they only need to ask, and we remove them."

Between The Lines Time

The use of news images is one of the touchiest areas for Google to deal with, as I covered more in my Search Engines, Permissions & Moving Forward In Copyright Battles article.

The Sofam deal might help solve some of Google's legal issues in Belgium. The group represents the rights of nearly 4,000 photographers in Belgium, Google said. Google did NOT say how this might translate into usage at Google News. However, potentially this means Google can have photos in Google News even from publications that it had to remove from Google Belgium by court order. The Sofam deal might provide legal cover there. Of course, if those publications are the only source of certain photos -- and they block use through systems like robots.txt -- that would still keep the content out of Google. I'm also following up more on this particular issue.

The deals do not restore access for Google to list textual news stories it finds. That means it has to remain hopeful that the legal case will go its way, if it wants to prevent some type of negotiations with the publishers that have opted-out.

If the case goes against Google, it doesn't appear to be facing major damages. If these were to be levied, that should have happened when it lost the first time. Instead, the publishers will remain out of Google, making Google News Belgium less useful than it would be. However, they also deny themselves traffic from Google. Possibly Google might negotiate a payment-based system to include them. Equally possible, it might also decide to hold its ground and focus attention on other countries, to see if it can wait the publishers out.

If the case goes for Google, then it regains content that will help enhance Google News Belgium, unless those publishers decide to specifically block spidering, which Google would almost certainly honor.

Overall, the action in Belgium -- as with Denmark -- underscores that in smaller markets, Google (and other search engines) may come under increasing pressure to negotiate deals to list material. The players are fewer and have more power concentrated among them. Whether these will be lucrative deals remains to be seen. In smaller markets, Google might decide it's simply not worth figuring out some type of financial arrangement -- especially for Google News, which carries no ads and so generates no direct revenue. That might bring about more non-financial arrangements where the publishers cooperate for the benefit of getting traffic and also being dealt with personally by Google, rather than impersonally through automated permissions systems like robots.txt.

Traffic To Search Engines

As an aside, I got a request from another reporter trying to understand how much traffic newspapers get from search engines. My response:

There's no specific answer to this. It will vary from paper to paper. Places like the New York Times will likely get a lot, because they specifically work to generate search traffic. Papers such as those suing Google in Belgium are getting probably nil, since they were removed by court order from Google.

In general, surveys have found sites getting anywhere from 8 to 13 percent of traffic from search engines. That might not sound like much, but often the first visit leads to repeat visits.

I also included two people on my response who I thought might have some better stats. Marshall Simmonds, chief search strategist for the New York Times Company, came back with this:

The one stat I can report is the NYT gets approximately 22% of its traffic from search engines. This number is very actively growing.

Bill Tancer, over at Hitwise, reported this:

Hitwise tracks 800,000 sites divided into 170 industry categories. One of those categories is our News & Media – Print category which covers Newspaper and Magazine websites (3,180 sites total). For the week ending 11/18/06 (based on our U.S. sample), Google was the #1 site sending traffic to the category at 13.66%, Search Engines as a whole were responsible for 22.44% of traffic for that same week.

That's a lot of traffic, however you slice it. There's no doubt things like Google News help build Google up as a company. But at the same time, Google News drives a ton of traffic to newspapers that are seeing the web as a new revenue source that might save them as print subscriptions dry up.
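
To put the two Hitwise figures side by side, here is a quick back-of-the-envelope calculation. It only rearranges the numbers quoted above (13.66% from Google, 22.44% from all search engines, one week of U.S. data); the function name is my own and nothing here is a Hitwise method.

```python
def share_of_search_traffic(engine_pct, all_search_pct):
    """Percentage of search-referred traffic that came from one engine."""
    return engine_pct / all_search_pct * 100


# Figures quoted from Hitwise for the week ending 11/18/06 (U.S. sample).
GOOGLE_PCT = 13.66
ALL_SEARCH_PCT = 22.44

google_share = share_of_search_traffic(GOOGLE_PCT, ALL_SEARCH_PCT)
other_engines_pct = ALL_SEARCH_PCT - GOOGLE_PCT

print(f"Google's share of search referrals: {google_share:.1f}%")
print(f"Traffic from all other search engines: {other_engines_pct:.2f}%")
```

On those figures, Google alone accounted for roughly three-fifths of the search traffic sent to the category that week.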

Posted by Danny Sullivan at 12:35 PM | Permalink

November 26, 2006

Google Settles With Some Belgian Publishers Over Belgium News Inclusion

Via Techmeme, news that Google has settled with two Belgian publishing groups involved in a lawsuit against it over content included in Google News Belgium. This comes a day after Google's legal case was reheard in an appeal. The settlement, following what seems a similar settlement with AP earlier this year, seems to signal that Google is going to continue making such appeasements rather than fight cases in court.

Bloomberg reports that Google struck an agreement with Sofam -- which represents Belgian photographers -- and Scam, which represents Belgian journalists. The agreement allows for Google to use content from these groups (or from their members). Whether they are being paid for this, what content or how it will be used is not explained:

"We reached an agreement with Sofam and Scam that will help us make extensive use of their content," Jessica Powell, a spokeswoman for Google, said in a phone interview yesterday. She declined to give details of the agreement or say whether it involved paying the groups for the content, and declined to say whether Google, based in Mountain View, Calif., was considering similar accords with the newspapers.

In September, Google lost a copyright case filed against it by another Belgian publishing group, Copiepresse. Google later had to post the ruling against it on Google Belgium. However, Google was granted an appeal for the case to be reheard, as it hadn't been represented in court the first time. The stories below provide more background on all of this:

At some point, Sofam and Scam joined in the case. I see one reference to this back in October. Two other groups also apparently joined, since the Bloomberg report speaks to the settlement being with two of five total parties to the suit.

Those parties, led by Copiepresse, continue on in their action against Google. That action, as I've covered in my Google's Belgium Fight: Show Me The Money, Not The Opt-Out, Say Publishers article, is far more about trying to pressure Google into a financial arrangement to use Belgian news content than keeping that content out of Google itself. If it was just to keep content out of Google, the publishers could have easily done this through methods such as using robots.txt files.

Copiepresse seems confident of a legal victory:

Speaking on the phone from Brussels after the hearing, Margaret Boribon, the Copiepresse secretary-general, said she felt very happy with how things proceeded today. "I can't see how the judge could change his opinion,'' she said, certain that the court will uphold the September ruling.

Perhaps that legal victory will come when the ruling is issued, which is expected in late December or January. If so, it may not help Copiepresse in the real aim of a financial deal. Google may have enough content to make Google Belgium viable without the participation of the papers Copiepresse represents. They'd then be left in a situation of asking Google for reinclusion or going without the substantial traffic Google News can send web sites.

On the other hand, Google's settlement with the groups following on an agreement earlier this year with the Associated Press seems likely to fuel further publishing groups pushing for such arrangements, especially in smaller markets where key content is put out by a small set of publishers. Banding together and sticking with exclusion, they can severely hamper a news search service.

Norway Upset With Google News Over Copyright Laws covers how Google is being challenged in Norway. That hasn't developed into a legal case yet, but it's hard to see how Google's going to be able to say no to some type of agreement there. Pandia also covers how in Denmark, publisher opposition apparently created the unprecedented case of Google asking for permission to index news sites, rather than the normal case of spidering and requesting an opt-out.

Search Engines, Permissions & Moving Forward In Copyright Battles from me covers how, in particular, Google's use of images for its news area complicates issues and is making it harder for search engines in general to defend opt-out spidering, which I support. That article calls on Google to stop the inclusion of news images, as well as to pull back on showing cached pages and scanning in-copyright works without permission.

However, asking for permission to spider textual content for news search is likely to be as slippery a slope as cutting deals with publishers. It weakens the core legal position Google has argued over gathering textual content from the web, most recently against suggested copyright changes in Australia that it said might make search engines unworkable.

As a reminder, Microsoft was also challenged in Belgium. Microsoft Removes Belgian Content Without Court Order covers this more and how Microsoft's reaction was to drop those publications. So far, it hasn't apparently cut a deal for reincluding them and perhaps may not feel a market need to do so.

Judge Gives AFP Case Against Google More Time covers how a copyright case against Google by Agence France-Presse over news inclusion is still ongoing.

I plan to follow up with Google Monday and see what further details I can gather on the case. I don't expect terms to be disclosed, but it would be good to know if a financial arrangement of some type was reached. That happened in the AP case, though Google was adamant the agreement there was not to allow it to solve a legal problem with spidering.

Many saw this as spin. There are other things the agreement would give Google aside from the right to spider, as my Google-AP Deal Not Pay-Per-Click & Some Further Details covers in more detail. However, it also conveniently solved the spidering issues for Google.

Postscript: See Q&A On Google's Belgium News Agreements for more on this story since it was written.

Posted by Danny Sullivan at 5:04 PM | Permalink

November 22, 2006

Baidu Wins Copyright Case Against Music Companies

Melanie Colburn writes that Music Labels Lose Copyright Suit Against Baidu, which started back when Five Music Companies Sue Baidu in September of 2005. Baidu was previously ordered to stop these music downloads but it appears the ruling was overturned because all Baidu is providing are links to 3rd party sites that facilitate the music downloads, whereas Baidu does not participate in the downloads themselves. More details at the BBC News.

Posted by Barry Schwartz at 9:37 AM | Permalink

November 12, 2006

Articles On Google's Copyright Conflicts From Me & New York Times

A Struggle Over Dominance and Definition is a good New York Times article out today that looks at Google and whether it is a media company that conflicts with other media owners, especially in terms of using content from others without permission. It also sparked me to finally finish a long piece I've been meaning to do on Google, search engines and copyright issues. Search Engines, Permissions & Moving Forward In Copyright Battles is now up over at my personal blog Daggle, covering the important difference between indexing and reprinting, how robots.txt already provides a permissions system, why Google should stop scanning in-copyright books and also be a leader in dropping cached pages.

Posted by Danny Sullivan at 11:58 PM | Permalink

October 25, 2006

Microsoft Removes Belgian Content Without Court Order

The Register writes Microsoft dodges court in Belgian copyright battle, reporting that Microsoft decided not to go to court over Belgian newspapers' request that it remove their content from its index. Google was ordered to remove the content by a Belgian court and then later lost an appeal on the same case. Microsoft simply did not want to fight them and decided to just grant the wishes of the cease and desist letter sent to it.

Posted by Barry Schwartz at 9:08 AM | Permalink

October 17, 2006

MSN Belgium Drops Some Newspapers, Negotiating On Inclusion After Cease & Desist

Earlier, we touched on the fact that Copiepresse was threatening to go after MSN for carrying Belgian newspapers in the way it went after Google. Via PaidContent.org, Update: MSN is latest target of Belgian copyright complaint from InfoWorld covers how Copiepresse is now negotiating with MSN Belgium after sending a cease-and-desist letter to MSN. Copiepresse hopes to gain a share of advertising revenue.

Meanwhile, MSN Belgium has removed some newspapers. Removed from where isn't clear. MSN Belgium does have a dedicated news area, so it might be from there. However, sites may also have been removed from web search results similar to what Google did. I tried a search for site:lesoir.be, and the main news site seems to have been removed.

InfoWorld also notes:

The group, which represents some of Belgium's best known newspapers, including Le Soir and Le Libre, has been gathering more support for its cause. It was joined this week by separate groups that represent Belgian photographers, journalists, scientific authors and multimedia publishers, who plan to back its efforts.

It will be interesting to see how many more groups they rally in support against the search engines, and how the search engines react. I think there's a big difference between search engines deciding they might pay to include relatively small amounts of content in specialized news search engines versus a frankly insane idea that they're going to negotiate deals for inclusion in regular web search results.

Ultimately, the good people of Belgium might find themselves without the ability to search the web, should Copiepresse succeed in its argument that getting permission via robots.txt should be illegal.

I have much more to say on this subject -- I'm working on a piece I hope to post later this week. For some related material from me, see:

Posted by Danny Sullivan at 8:56 AM | Permalink

October 12, 2006

New Interview on Belgian Press vs. Google News (Microsoft Next?)

Sean Daly, from Groklaw, interviewed Margaret Boribon of Copiepresse on September 28th about their copyright lawsuit against Google, which targets the use of Belgian news in Google News, and cached copies of those articles. He has posted their discussion, in English and French, as well as some commentary and analysis of the litigation, including some late breaking news involving demands made by Copiepresse for MSN, and a potential new plaintiff.

I've written a brief synopsis of some of the points she raises in the interview at SEO by the Sea. Danny also talked with Margaret Boribon earlier in September.

Posted by Bill Slawski at 12:32 PM | Permalink

Ballmer: YouTube Overvalued & Google Transferring Wealth From Content Owners

The Web According to Ballmer from BusinessWeek has Microsoft CEO Steve Ballmer questioning the value of the Google-YouTube deal and oddly warning that Google is transferring wealth away from rights holders. It's an odd statement, since that's what Microsoft wants to do as well.

First the questioning of the YouTube value:

[You've got to ask] could Google do whatever it is they're hoping to buy without paying $1.6 billion? Is YouTube really some permanent, long-term thing, or is it a fashion?....Right now, there's no business model for YouTube that would justify $1.6 billion.

Though strangely, when BusinessWeek tries to pin down what seems a clear statement that Google overpaid, Ballmer says:

I'm not saying it is overvalued. I'm not trying to say that. It depends on a set of factors. I'm not saying I wouldn't write a check for that amount of money. I might.

And back to the controversial statement about Google's relations with content:

And what about the rights holders? At the end of the day, a lot of the content that's up there is owned by somebody else.

The truth is what Google is doing now is transferring the wealth out of the hands of rights holders into Google. So media companies around the world are all threatened by Google. Why? Because basically Google is telling you how much of your ad revenue you get to keep. They better get some competition. Us. Yahoo! (YHOO). Somebody better break through or you can short all media stocks right now. As long as there are two, you can hold onto media stocks. Google understands that. And that's one reason why they're willing to lose money up front.

Microsoft has its own video sharing service up, Soapbox. It has a question answering service, Q&A. It has an entire search engine that crawls the web like Google, Windows Live. Microsoft has plans for contextual placement of ads on pages, similar to AdSense. It's specific to MSN content now, but that will inevitably change. All of these things leverage the content of others in order to make money for Microsoft. So if these actions leverage wealth away from content owners, Microsoft is just as guilty of it as Google.

Frankly, all Ballmer seems to be saying is content owners would be better off if Microsoft was a strong third participant in the ad game. Sure -- but let's not kid ourselves. Microsoft gets a lot better off by that as well, and it didn't jump into the game out of some desire to counter-balance the power of Google. It's in it to make as much money as it can, as well.

Posted by Danny Sullivan at 7:42 AM | Permalink

October 2, 2006

Copiepresse Upset Ruling On Google Wasn't Visible Enough

Last week, Google complied with a Belgian court order and posted the ruling against it in a copyright suit on the home page of Google Belgium and Google News Belgium, along with many other places including many search results pages. Now via Google Blogoscoped, news that the plaintiff in the case Copiepresse thinks the ruling should have gone at the top of the Google News Belgium page, rather than the bottom.

An article about the issue in Dutch is here. I don't speak Dutch, sadly, consigning me to AltaVista Babelfish, which translated a key part as:

That happened also, but on the start page of Google news, the topicality part of the site, stands the sentence entirely below. And that does not like Copiepresse.

Anyone hitting Google Belgium couldn't have failed to notice the beginning of the very long ruling, as the illustration above shows. But over at Google News Belgium, that ruling wouldn't have been seen unless you scrolled to the bottom of the page, past all the stories. That's what Copiepresse seems to be upset about.

The order did require that:

The defendant to publish, in a visible and clear manner and without any commentary from her part

Copiepresse might well be able to argue that on Google News Belgium, the ruling there wasn't clear and visible by being at the bottom of the page.

Of course, putting the long ruling at the top of the page would have been unworkable. The ruling itself didn't allow Google to put anything on the page directing people to see the notice at the bottom since that might have been deemed "commentary" about the ruling.

What next? If Copiepresse presses for more and wins, perhaps Google might have to run the ruling in a column alongside news content.

Frankly, Copiepresse comes across as petty in complaining here. Google already had a good argument that publishing the ruling was unnecessary given the wide press coverage the ruling had gained, though the court was not convinced and required the ruling to go up anyway. After that happened, coverage of Google's loss was only magnified. The point was made very publicly.

Posted by Danny Sullivan at 9:22 AM | Permalink

September 27, 2006

Google Talks On Its Approach To Content & Copyright

Our approach to content at the Official Google Blog has Google explaining to the world how it works with content owners and its desire to respect their rights.

In terms of copyright, Google stresses that it generally sticks to what's known as fair use, though the post doesn't use those words. The idea is that it shows very short summaries of stories and pages and thumbnails of images but doesn't reprint this material, requiring people to click through to the actual material from places like Google News.

Of course, in the case of cached pages, many including myself would argue that Google goes beyond fair use. Cached pages are an example where content can be viewed without clicking through to the original site, and the opt-out approach for that doesn't feel appropriate at all.

Google also notes there are cases when it wants to go beyond fair use, to make broader use of content where permission would be required. The deal with the Associated Press is cited as one of several examples here.

To me, this is also a way for Google to help defuse the idea that some publications have, such as the Belgian newspapers recently, that Google can be bought off to avoid lawsuits. To me, this is Google stressing that it will do content deals in some cases, but that these content deals aren't necessarily being done to avoid lawsuits, especially when it feels it is acting within fair use guidelines. That's my speculation and take on this, of course. Google didn't comment when I asked if this was the reason for raising the AP deals.

Moving past Google saying it respects copyright, it then stresses that it allows people to opt out, even if it feels it has fair use rights. In general, I agree with this method, which Google, along with the other major search engines, generally follows. Trying to get permission from each web site to index it would be an impossible task, and one that's not necessarily even legally required. Opt-out through things like robots.txt is an effective way to protect rights holders plus benefit the public as a whole. I do hope they'll change cached pages to opt-in, however.

Google talked with me about the post shortly before it went live yesterday, to see if I had any questions. The main thing in my mind was if this was in response to the Belgian lawsuit. No, I was told. The post has been in the works for some time, apparently. Google's hoping it will help people better understand their approach to content.

Posted by Danny Sullivan at 7:56 AM | Permalink

September 26, 2006

Some Google Belgium Follow-Ups

Just a quick note that Google's posted on its official blog about the Google Belgian news issue that I've been covering, while William Slawski has a nice translation in the works on the ruling itself.

About the Google News case in Belgium from the Official Google Blog doesn't really provide much new information that you haven't already gotten in reports from me and others. What should it provide? How about answers to:

  • Exactly how did Google fail to react to the legal action before it went to trial? Information was sent to Google's headquarters in Belgium. If it had been acted upon, Google might have won in the first round of the case by actually presenting a defense, rather than being absent.  
  • Why did Google initially refuse to post the ruling on the Google web sites in Belgium after last Friday's decision, then change its mind?

The post does stress that there are ways for publishers to easily stay out of Google. Those ways don't appear to have been presented to the court itself. Writes William Slawski in Belgian Copyright Ruling Against Google News:

I'm surprised by the lack of mentions of the use of a noarchive meta tag or noindex meta tags or by the use of robots.txt to disallow Google from indexing or archiving the pages of the newspapers in question.

While the Court does note that the onus of keeping copyright from being infringed falls upon the owner of the technology used to take text from the newspapers in question, this seems like an omission worth noting.

Regardless of how the Court may have felt about those options, I think that they should have been addressed in some manner. The failure to do so makes it appear that they either weren't provided information about those by their expert, or didn't understand them, or may not have addressed those issues on purpose.

A simple noarchive tag would have kept information on those pages from being cached by Google. A noindex tag or disallow directive should have kept their pages from being indexed at all by Google. Were they using these and Google ignored them? I suspect that they weren't.
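
For readers who haven't seen these tags, here is a minimal sketch of how one might check a page for them, using Python's standard html.parser module. The sample page is invented for illustration, and the class and function names are my own; only the noindex and noarchive directive names come from the passage above. This is one way to read the tags, not a description of how Google processes them.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> (or "googlebot") tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser already lowercases attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() not in ("robots", "googlebot"):
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical article page that opts out of both indexing and caching.
SAMPLE_PAGE = """
<html><head>
<title>Example story</title>
<meta name="robots" content="noindex, noarchive">
</head><body>Story text here.</body></html>
"""

if __name__ == "__main__":
    found = robots_directives(SAMPLE_PAGE)
    print("indexable:", "noindex" not in found)
    print("cacheable:", "noarchive" not in found)
```

A publisher that only objected to cached copies could drop noindex and keep noarchive, staying in search results while keeping its pages out of the cache.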

After some more analysis, including an important argument over whether Google is a portal competing with newspapers or a search engine (answer, in my view, probably both depending on whether you keyword search Google News or read by browsing), he provides a long and what seems fairly complete English translation of the French-language ruling.

For more background on the case, see my prior posts:

Posted by Danny Sullivan at 9:03 AM | Permalink

September 25, 2006

Google Changes Mind, Posts Belgian Ruling

Google has now posted the text of a Belgian ruling finding it violated copyright on the Google Belgium home page. The ruling has also been posted to the home pages of Google Images Belgium and Google News Belgium, but not Google Groups Belgium.

Last week, a court ruled Google had violated the copyright of several Belgian newspapers by listing them within Google News. The court ordered the removal of those papers from Google, which the company quickly complied with.

The court also ordered Google to post the ruling on its Belgian web site within 10 days or face a heavy fine. Google appealed that punishment, but it was upheld last Friday.

Despite losing its appeal, Google looked ready to defy the order to post the ruling and take the fines, until a second appeal could be heard in November. Now, the company has reversed course. The ruling went up on Saturday. The company gave no reason for the reversal to Reuters:

A spokesperson for Google declined to elaborate on the reasons that made the company change its mind but said it would seek to cancel the ruling.

"We are pleased that a judge has given Google the opportunity to appeal the substance of this case. This will be heard in November," the spokesperson said.

From Dow Jones newswire:

Google spokeswoman Rachel Whetstone told Dow Jones Newswires the company had agreed to publish the ruling on its Web site after studying the court judgment.

Technically, Google never failed to comply with the court ruling. It had 10 days from receipt of the ruling to act, and it has done so within that time, saving it exposure to fines. As noted, a second appeal on the ruling will happen in November.

Past coverage is below:

Also, I note that Microsoft's Windows Live is now operating illegally under Belgian law. For example, site:www.lesoir.be shows that Le Soir -- one of the publications involved in the lawsuit against Google -- has pages listed in Windows Live, as well as cached pages. In fact, here's an example of an article from Le Soir about the Belgian ruling against Google that I can read at Windows Live through its cached copy. To date, no news that Microsoft is about to be sued.

Finally, over at Threadwatch, an interesting comment points out that Google might have been OK in Belgium if it didn't show cached copies of pages:

The truly critical essence of this Belgian court ruling concerns Google's caching functionality. Here, protected content is being displayed a) in modified form; b) more often than not in its entirety (i.e. not restricted to mere snippets); and c) without copyright holders' permission. In most countries this would be viewed as a flagrant violation of copyright law - and obviously this is the stance the Belgian court has adopted. (And yes, there's been a contrary ruling by a US court, but that specific case seems to be rather more complicated on closer view; also, there's some indication that it was decided on arguably faulty assumptions, but that's another story.)

It is interesting to note that the Belgian ruling specifically acknowledges Google's right to store third party content (no mean concession, that, and far from self-evident) for search purposes only. But displaying it in the cache for everyone to see constitutes an act of re-publication which, like it or not, demands copyright holders' express permission.

This is a very important point. Search engines make copies of pages in order to make content searchable, as my Indexing Versus Caching & How Google Print Doesn't Reprint article explains in more detail. It's very difficult to argue this type of copying harms a site owner, especially when opting out is so easy.

Showing these actual copies through cached pages has long been disturbing for many people. While it's easy to opt-out of such display, it feels a step beyond what a content owner should have to do. With cached pages, content is literally being reprinted rather than made searchable. It seems absurd for the content owners to opt-out in that instance.

Within the US, cached copies have so far been upheld, something I disagree with. But if Google were to eliminate them -- along with picture thumbnails -- it sounds like it might have a better chance of winning in Belgium.

Posted by Danny Sullivan at 5:51 AM | Permalink

September 20, 2006

Google's Belgium Fight: Show Me The Money, Not The Opt-Out, Say Publishers

I've had a long talk with the group that so far has successfully sued Google in Belgium over indexing, a talk that leaves me thinking they don't fully understand how search engines work and why their arguments over copyright infringement will ultimately fail. Then again, the case is really about trying to convince Google it should pay to carry their news content. A closer look at all this in the story below, as well as an update on the situation in general, including an appeal for Google that's been granted.

Let's go back to the beginning. In March, Copiepresse tells me it started legal proceedings against Google over its inclusion of Belgian news sources without explicit permission. The organization represents a number of publishers that were concerned over being indexed.

Information about the case, including a summons, was all sent to Google in the United States, according to Copiepresse. A hearing was held in Belgium on September 5th, then the ruling came out last Friday, September 15. Google didn't take part in the hearings, for reasons it says it is still investigating.

The ruling required that Google do two main things within 10 days of receipt:

  1. Remove French and German-language content from the publishers from Google Belgium's web sites or pay a fine of €1 million per day  
  2. Publish the ruling on Google Belgium and Google News Belgium or pay a fine of €500,000 per day

Over this past weekend, Google says it complied with the first part. It removed links to at least these news sources, Google told me:

dhnet.be grenzecho.be lacapitale.be lalibre.be lameuse.be lanouvellegazette.be laprovince.be lecho.be lequotidiendenamur.be lesoir.be pressbanking.com votrejournal.be

It's been noted that Google did more than remove these sites from Google News Belgium. They were removed from Google Belgium entirely. Here are a couple of searches that demonstrate this:

site:dhnet.be site:grenzecho.be site:lacapitale.be site:lalibre.be site:lameuse.be site:lanouvellegazette.be site:laprovince.be site:lecho.be site:lequotidiendenamur.be site:lesoir.be site:pressbanking.com site:votrejournal.be

Some have thought this is an example of Google getting revenge, robbing these publishers of regular traffic they probably assumed was safe in a fight over Google News indexing. For its part, Google said its reading of the ruling meant that the sites had to be dropped entirely from Google Belgium:

Order the defendant to withdraw the articles, photographs and graphic representations of Belgian publishers of the French - and German-speaking daily press, represented by the plaintiff, from all their sites (Google News and "cache" Google or any other name within 10 days of the notification of the intervening order, under penalty of a daily fine of 1,000,000.- € per day of delay;

I've bolded the key part. Google says it interpreted "all their sites" as being all sites that it views the court having jurisdiction over, anything using the Google.be domain. In addition, Google has removed the sites from Google News worldwide, saying it is treating the ruling as it would any request to be removed from Google News. In those cases, you're dropped entirely, not on a country-by-country basis.

The sites do still appear in searches via Google.com or other Google editions not aimed at Belgium. While these sites can still be reached from Belgium, Google considers them outside Belgian jurisdiction.

That view is sort of laughable, though I understand the reasoning well. It's unlikely that Google Belgium is actually being served up out of Belgium, so artificially pretending that Google.com and other Google sites are somehow "outside" Belgian jurisdiction makes no sense. However, this type of pretending isn't that unusual. It's a nice way for search engines to act like they are following the ruling of a particular country by making changes on "that country's Google." It's also a convenient way for particular courts to feel they've exerted jurisdiction over sites that they might really not be able to control.

Overall, Google has complied with the first part of the ruling. As for the second, it hasn't posted the required notices and says it will wait for a ruling due out Friday specifically about that issue. It argued yesterday in a hearing for appeal that posting the notice on the home pages wasn't necessary given all the publicity the case has now received.

An appeal for the case overall was granted. It will be heard on November 24, and the entire matter is largely in limbo until then. I hesitate to consider the case a victory for Copiepresse given that the first hearing -- for whatever reason -- had no defense from Google at all.

This leads me to Copiepresse's complaint with Google. In the group's view, Google has illegally copied material without permission. It feels that in some way, Google should get permission before indexing.

Indexing, of course, is not copying. Search engines do read pages in to make them searchable, as my Indexing Versus Caching & How Google Print Doesn't Reprint article explains in more detail. But indexing isn't reprinting pages, in the way some arguments try to make it. Google does show cached copies, something raised in the case. But cached copies aren't shown within Google News search, which was the main focus of this case (as an aside, one US court has ruled cached copies aren't an infringement, something I disagree with but something also easily rectified through no caching mechanisms).

I had a very long conversation about the permissions issue with Margaret Boribon, secretary general of Copiepresse, to try and better understand how they wanted Google to operate. Why not use commonly understood and effective mechanisms such as robots.txt files or meta robots tags to prevent indexing?

"If you do so, you admit that Google does what they want, and if you don't agree, you have to contact them. This is not the legal framework of copyright," Boribon said.

This is an age-old issue in the search engine world. By default, search engines assume that permission is granted to index a document, in order to make it searchable. Technically, shouldn't they get explicit permission? Legally, that might make things safer. Logistically, it would never work. Many sites don't have clear contact details. Some domains themselves contain multiple sites. Moreover, there are millions of sites across the web. Contacting them all beforehand simply wouldn't work well.

I asked Boribon about this, how her group would propose search engines undertake such a task.

"I'm sure they can find a very easy system to send an email or a document to alert the site and ask for permission or maybe a system of opt-in or opt-out," she said.

Would it be OK for such a system to work automatically, I asked? Yes, that would be fine. A machine-to-machine connection would be OK, she said. So then, I asked, why not use the existing robots.txt or meta robots systems?

Both mechanisms are easy, automatic ways for publishers to declare if they grant indexing permission or not. In fact, I'd argue that both are a way for search engines to ask beforehand for the very permission that Copiepresse wants them to seek. Major search engines -- not just Google -- all request or check these blocking mechanisms.
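To make the "asking beforehand" point concrete, here is a minimal sketch of the check a well-behaved crawler runs before fetching a page, using Python's standard robots.txt parser. The robots.txt contents, the www.example.be address and the "OtherBot" name are made up for illustration. They are not taken from any of the publishers in this case, and this is not a description of how any particular search engine implements the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve: keep Googlebot out
# entirely, and keep every other crawler out of the archives only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /archives/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "http://www.example.be/news/story.html"
    old_story = "http://www.example.be/archives/old.html"
    for agent, url in [("Googlebot", story), ("OtherBot", story), ("OtherBot", old_story)]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

The publisher writes the file once, and every crawler that honors the convention reads the answer without anyone sending an email.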

Boribon rejected the existing solutions. One issue she had was that they weren't legally endorsed. That's true, but that's also something I think will change over time. In the US, we've had one case recently where opt-out solutions like tags have been accepted.

Outside the US, there have been some scattered cases, such as this one from 1997 in the UK involving news indexing. But none of these cases have seemed to stop the search engines.

The Belgium case could be different. What happens in one country isn't applicable to others. It may be that Copiepresse will prove its point that permission should be sought in advance. Alternatively, a court could endorse existing blocking mechanisms as having legal force.

That's what I think should happen. These systems pose an easy way for anyone who doesn't want to be in a search engine to stay out. If the issue with Copiepresse was really about not being indexed, all of the publications it represents could easily stay out through those solutions. Google -- like other major search engines -- doesn't index sites against their wills.

There's more at work here, of course. The publications DO want to be in Google. The action is simply an effort to force Google to the bargaining table and get paid for inclusion, from what I can see.

"Our purpose is not to be excluded. Of course, we want to be in the system, but on a legal basis," said Boribon. "We want to be remunerated."

Her group's view -- as is the view of the World Association Of Newspapers that she also referenced several times -- is that Google is exploiting sites. It is making money off these sites and giving them little or nothing in return.

Most search marketers hearing this have to stifle laughter or disbelief. That's because most search marketers want all the search traffic they can get. It's free, easy and converts well. They understand that search engines give them plenty of value and complain most when something happens to take that traffic away, as was the case with the Google Florida Update of 2003.

I'm not going to spin out the argument that search engines generate far more benefits from the indexing they do than harm. For one thing, I think this is self-evident given the sheer amount of concern of getting into search engines, rather than out of them. If you must have more argument, see my past post, Search Engines As Leeches, The Difference Between Paid & Free Listings & Keyword Price Rises.

The difference between most publishers on the web and those of Boribon -- or book publishers also suing over Google's scanning program -- is that they think they are special, in my opinion. They think they have content that is more important than other content on the web, content that is either entitled to more protection or that warrants payment for being included.

Several times, Boribon stressed that those who spent a lot of time and money on their works deserved to be compensated by Google. My response was that I don't care if content is worth €1 or €1,000,000. It is entitled to the same protections. To be fair, Boribon agreed when I made that point. Yet our talk still continued to be riddled with her references to the high value of some content or the concept that only some content had protected status.

I've been through this before. Why Don't Book Publishers Object To Web Indexing? covers how one book group, while admitting that copyright law should apply the same regardless of whether works are in digital or book form, still suggested that online works were somehow different:

I think the issue is much more acute where the content is not made freely available by its copyright owner - which is, of course, the case for all the in-copyright content Google are planning to digitise from libraries.

Skipping past copyright law, let's focus on payment for inclusion. Boribon said that Google had made special arrangements with Le Monde to include it in Google News, explaining that was one of many examples of Google targeting the most important sources for special treatment.

My response was Google has special arrangements with lots of publishers that have content that can't easily be indexed. If Le Monde required user registrations, Google couldn't spider the site without contacting them and being allowed in. Indeed, it's the same thing Google has done for the New York Times, as we've covered. It's something Google (and other search engines) does for even non-news sites, if they have important content that it thinks should be gathered.

Google is not paying Le Monde or the New York Times for these arrangements, however -- something that Boribon seemed to believe was the case, and no doubt other publications do as well. Google confirmed with me it has no payment system like this with Le Monde. But such a belief highlights the huge education challenge Google faces, trying to help these publications that have mistaken notions of how it -- and all search engines -- operate.

Of course, Google does have one paid relationship with a news source that came to attention recently, the Associated Press. Google still hasn't explained exactly whether this was a relationship it did to prevent an AP lawsuit over being in Google News or a separate agreement to pick up some of AP's content for reuse.

Fair to say, AP's content is important enough and helpful enough to Google that it did decide to enter into an agreement to make use of it in some way. Boribon's group feels their content is important enough that it should obtain some type of agreement as well.

This is also an old story, in some ways. Tom Mohr in Editor & Publisher earlier this month was only the latest of those within the newspaper industry sounding a call for newspapers to band together to deny content in hopes of getting paid:

But what if 2/3 or more of the U.S. newspaper industry sits on one platform, managed by Switzerland Inc.? What if Switzerland Inc. decides to deny Yahoo! and perhaps Google access to newspaper industry content for three months, followed by a negotiation for better terms? That's the power of a network.

The World Association Of Newspapers had a similar call earlier this year:

Web search engines, such as Google and Yahoo, collect headlines and photos for their users without compensating the publishers a cent, according to the World Association of Newspapers (WAN), which announced Tuesday that it intends to "challenge the exploitation of content" by the Googles and MSNs of the Web.

The Belgian lawsuit is simply another step forward in pushing for that payment, exactly what Google CEO Eric Schmidt described as "negotiation being done in a courtroom" when I spoke with him last month:

Because of our scale and because of the amounts of money that we have, Google has to be more careful with respect to launching products that may violate other people's notion of their rights. But also, frankly, we find ourselves in litigation and the litigation was expensive, and diverts the management team, etcetera, from our mission. In the cases that you describe, most of the litigation in my judgment was really a business negotiation being done in a courtroom. And I hate to say that, but that is my personal opinion. And in most cases a change in our policy or a financial change would in fact address many of the issues.

In the end, I want honesty. If the Copiepresse or the AFP (also suing Google) feel Google doesn't have permission to index their content, then just use the easily implemented mechanisms to get out and stay out. Don't file unnecessary court cases, nor just single out Google as the whipping boy when Yahoo and Microsoft, to name only two search engines, operate the same way.

Is it about getting paid? Is it that these publishers think they are so important they should get money for being included, since links alone to their web sites make search engines more comprehensive? That's fine, but you don't need a court case for that either. Just opt-out. If you're worth it, Google and the others will come running to the negotiating table. If you're not, well, no one's going to miss you -- but you'll miss the search engine traffic, as the Belgian publications almost certainly are discovering to their horror now.

I don't want lawsuits that seriously threaten web search itself. Boribon's ruling potentially applies to all content, not just news content, in Belgium. Anyone could sue Google and other search engines saying that robots.txt blocking isn't explicit enough. If that happens, Boribon's organization is going to find searching the web from Belgium is difficult, since there won't be any content in Google, Yahoo or other services at all.

That would be ironic, given that Boribon says she's a regular Google user. She's routinely using a service where virtually none of the content listed is there because of some explicit approval process. That's hypocritical, given her group's lawsuit. If they don't believe opt-out mechanisms are sufficient, then none of these member publications should be using Google or any search engine as part of their daily routines.

Postscript: V7N points at WAN to combat 'search engine spiders', which has the World Association Of Newspapers suggesting incorrectly that search engines have no technological solution to spider only some content. They absolutely do. Content can be flagged on a page-by-page basis, if that's what a content owner wants to do.
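For those wondering what page-by-page flagging looks like in practice, here is a rough sketch of how a crawler might read a meta robots tag. The "noindex" and "noarchive" values are the commonly used directives for staying out of the index and out of cached copies. The class and function names, and the sample page, are my own illustrations and not any search engine's actual code.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma separated, e.g. "noindex, noarchive".
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def page_permissions(html):
    """Report whether a page may be indexed and whether it may be cached."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return {
        "index": "noindex" not in parser.directives,
        "cache": "noarchive" not in parser.directives,
    }


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noarchive"></head><body>Story</body></html>'
    print(page_permissions(page))
```

A page flagged this way stays searchable but is not offered as a cached copy, which is exactly the kind of selective control the WAN article suggests doesn't exist.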

Posted by Danny Sullivan at 3:23 PM | Permalink

August 2, 2006

Google-AP Deal Not Pay-Per-Click & Some Further Details

As it happens, I was at Google yesterday when the story came out about the financial agreement between Google and the Associated Press over the use of AP content. That story raised a number of questions, and here are some answers I can share so far from Google.

First, this is not a pay per click deal. Yesterday's Mercury News article talks about some agreements in general being this way:

It's a common perception, but it's false. Google and Yahoo, along with dozens of other Internet companies, have been quietly agreeing to deals that compensate some of the country's top news organizations for their content and help drive more traffic to their Web sites.

Recently completed deals, which include arrangements in which media organizations such as the Associated Press will be compensated on a pay-per-click basis, could herald a major shift in the relationship between the old media and new Internet gatekeepers.

The article doesn't say that the Google deal specifically is pay per click, but some people might wonder if that's the case. Google now clarifies that it is not.

Is this an agreement to keep Google from being sued by the AP, as it is by the AFP? Google wouldn't answer directly but said:

Google News is fully consistent with fair use and always has been.

Note that paidContent has reported how the AP only a few months ago said:

Let me say more clearly: we're not suing them.

So I tend to think it's safe to say this wasn't being driven out of legal fears.

What's the agreement cover? No more real details than you've already read before:

The license in this agreement provides for new uses of original AP content for features and products we will introduce in the future. We are very excited about the innovative new products we will build with full access to this content.

But note that this specifically talks about new uses -- not current uses. IE, I read this as Google saying again that what it has been doing to index AP content is not something it feels it needed an agreement to do.

Also this tidbit:

This is not the first time we've had a financial arrangement with a news organization.

Coincidentally, I'm at news search site Topix today, literally borrowing a conference room to do some email and blogging catch-up. I had a catch-up meeting with them earlier, and the issue of deals with the AP and newspapers in general came up.

Topix noted they signed an agreement with the AP earlier this year, which is part of an overall trend where they've seen news organizations eager to come up with new ways to work with news search sites.

Was this prompted by a legal fear? No. It was part of figuring out a way of dealing with syndicated news content that helps treat the AP's member publications fairly online.

AP stories can originate from one of thousands of member publications. Any of those thousands of member publications might also republish an AP story. Which story is the originating one? That's useful for a search engine to know, if you don't want your results to get overwhelmed by having duplicates of all the same content.

In terms of fairness, Topix uses the agreement to get a rich data feed of content from the AP (along with many other things). This helps them better understand if an AP story originated from a particular member publication and, if so, to link over to the publication that deserves the credit.
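To illustrate the duplicate problem, here is a small sketch of one naive way a news search engine could group republished copies of the same wire story: fingerprint the normalized text and bucket stories that match. This is purely my own illustration, with made-up source names and story text. It is not how Topix or Google does it, and it cannot tell you which copy is the original, which is where a data feed from the AP would help.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """Hash the story text after lowercasing and collapsing punctuation and spacing."""
    normalized = re.sub(r"\W+", " ", text.lower()).strip()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def group_duplicates(stories):
    """Group story sources whose bodies share a fingerprint.

    stories is a list of dicts with hypothetical 'source' and 'body' keys.
    Returns a list of lists of sources, in first-seen order.
    """
    groups = defaultdict(list)
    for story in stories:
        groups[fingerprint(story["body"])].append(story["source"])
    return list(groups.values())


if __name__ == "__main__":
    stories = [
        {"source": "paper-a.example", "body": "Storm hits the coast. Thousands lose power."},
        {"source": "paper-b.example", "body": "STORM HITS THE COAST -- thousands lose power"},
        {"source": "paper-c.example", "body": "City council approves new budget."},
    ]
    print(group_duplicates(stories))
```

Exact matching like this breaks as soon as a paper trims a paragraph or rewrites a headline, which is part of why a richer feed from the source is attractive.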

The agreement also allows Topix to put AP-originated national and international stories on its own site, rather than having to guess at which of many different news sites to point at.

For example, if the AP runs some international story that an AP reporter has written, how should Topix decide which newspaper to point at? Just pick some random newspaper that had nothing to do with creating it? And if so, what about registration or payment issues that might be in place at that random paper?

Hosting AP national and international stories helps solve this problem. Of course, hosting AP stories that come from the AP directly also means Topix -- and indirectly the AP -- can earn from ad revenue.

Understanding what Topix does with the AP sheds some light on possible Google motivations in working with the AP. Perhaps we'll see hosted stories as Topix is doing -- and as Yahoo also does -- for some of the reasons explained above. And perhaps the deal also is to give Google better news search capabilities as I've also outlined, something that's hard to do without a deeper relationship.

Postscript: Google, AP Disclose News Payment Deal from, ironically, the Associated Press suggests that a legal dispute was behind the deal. From the lead:

Google Inc. is paying The Associated Press for stories and photographs, settling a dispute with a major provider of the copyright news that the online search engine finds and displays on its popular Web site.

But further into the story, I don't see anything explicitly supporting that statement. There's this:

While AFP sued to protect its rights, the AP chose to negotiate terms with Google, which, after just seven years of existence, is nearly 10 times larger than the 160-year-old news cooperative in terms of revenue. The AP, a not-for-profit organization owned by U.S. news companies, had revenues of $654 million in 2005. Google, a publicly owned company, reported $6.1 billion in revenue last year and is on a pace to exceed $9 billion this year.

By agreeing to pay AP for content, Google falls in line with the owners of other popular news sites like Yahoo Inc., Microsoft Corp. and Time Warner Inc.'s AOL, which have been anteing up for years.

"We are happy to be dealing with Google as we are with all the major superpowers on the Internet," Seagrave [Jane Seagrave, the AP's vice president of new media markets] .said. "We are always looking for new ways to innovate."

But there's no one from the AP explicitly attributed in the story as saying that the AP was going to sue unless this agreement was reached. Still, I know the story author Michael Liedtke well, and I can't see him saying there was a dispute unless someone was saying that was what this was about. I assume that would have been Jane Seagrave.

Posted by Danny Sullivan at 8:46 PM | Permalink

June 26, 2006

Follow-Up: School Couldn't Reach Google Until Injunction Filed

Catawba County Schools in North Carolina obtained an injunction to remove private material from Google because it had no luck getting action from the search engine after trying other routes, the district tells me. The school district also stressed that it didn't claim that Google had somehow hacked into its servers. Here's what Catawba County School's chief technology officer Judith Ray emailed me about the situation:

We asserted that Google had somehow bypassed our login information, not that they had hacked their way into the system. Hacking, to me assumes malicious intent and we never intended to imply that Google was doing anything other than spidering all the web sites available.

There is also miscommunication about "all users" being required to log in. The DocuShare server is a repository for both public and private information with logins being required for users who are authorized to view the restricted information. There are hundreds of pages of information that we share from DocuShare with users around the state. These are completely open and are not supposed to [be] password protected.

We did troubleshoot this situation by searching for the students' information at Yahoo, Dogpile, and AltaVista. We did not find any information on these three search engine returns and we attempted the searches over a three-day period.

We acted so aggressively with Google because, until the media got involved, we could not get beyond an operator at Google. We could not get operators to connect us with technical support, the legal department, or to anyone higher up in the organization. We were only given an email address to which we could submit a complain - which we did but got no response. Google has a link to submit an emergency request [see here] but on both Thursday and Friday of last week, the link took you to a dead page. Only when the news media submitted its own inquiry to Google did we get a call regarding the situation. And [Google] has been most helpful in working through this situation with us.

Of course, none of us who are employed with Catawba County Schools at the current time were involved when Xerox set up this server. We are trying to ascertain if the server was incorrectly setup/protected or if the appropriate include meta tags or strings were not included.

Google Blamed For Indexing Student Test Scores & Social Security Numbers from us earlier has more background on the injunction plus how I was finding pages from what the district said was a password protected area to still be available through Yahoo. As clarified above, some of these pages indeed didn't require a login to view.

Our story originally was headlined "Google Blamed For Hacking & Indexing Students Test Scores & Social Security Numbers" and said in one part, "the school [district] blames Google for some how breaking into a password protected area and indexing the content."

As stated above, the school district itself never appears to have said anything about being hacked, only that Google somehow got into information it believed was password protected, as it says on the home page of the district site:

We do not know how Google was able to access the secure, password-protected site. Once Google does access a site, it places a copy of the data on its own server. We immediately called and emailed Google, requesting the urgent removal of the link and site data. We have eliminated the link from our end and it appears that as of Friday night, June 23, 2006, Google eliminated the site from their end.

The hacking reference seems to come from the "Google 'hacked our website'" story at The Inquirer, which we linked to in our original story. While the headline says "hacked" in quotes, the story itself doesn't have anyone from the school district saying this.

Digg also has a School claimed google hacked it's private servers and then posted that data article. Again, the school district isn't alleging hacking, only that Google somehow got into information it believed was restricted. How that happened is still being investigated.

As for the reference to Xerox in the school district's explanation, in doing some investigating in our original piece, I noted that the server seemed to be managed by Xerox and shared by other companies as well, with material for those companies appearing to be hosted on the school district's domain. As noted, the school district doesn't know why this was happening, and it remains something they are looking at.

Finally, Google's had problems with the automated page removal tool before, though not that it was down but instead allowing people to remove pages from sites they didn't own. More on that in our 2004 story, Google Confirms Automated Page Removal Bug.

Posted by Danny Sullivan at 1:35 PM | Permalink

Google Blamed For Indexing Student Test Scores & Social Security Numbers

Google "hacked our website" from The Inquirer points to Blame game from the Hickory Record, a story about how the Catawba County Schools in North Carolina has gained a temporary injunction for "Google to remove any information pertaining to Catawba County Schools Board of Education from its server and index and alleges conversion and trespass against the corporation." The school blames Google for somehow getting into a password protected area and indexing the content.

Let me make this clear: Google cannot submit forms or type in usernames and passwords. Someone at the school must have left an opening for Google. The security hole possibly came from someone publishing the content publicly, somehow, or by letting down the security or by posting a hyper-linked URL with an embedded password in the URL.

I agree, Google should remove this sensitive information, which they did on Friday after the judge issued the temporary injunction. But Google should not be blamed for this.

Postscript From Danny: As Barry notes, this isn't a case of Google deserving blame. It cannot guess at a protected server's usernames or passwords, nor is it configured to try and hack its way in. If this information got into Google, that's almost certainly because it was left unprotected somehow despite the school's "very secure site."

Since the school says all personal information has now been removed and is protected, I'll explain more about what I guess happened.

The story mentions that somehow, information from the site's supposedly protected DocuShare server got onto the web. OK, where is that server? The story doesn't say, but this search over at Yahoo gives the likely location:

docushare catawba

Fifth down is this:

DocuShare Authorization Error Not Authorized. You are currently listed as Guest, which means you are not logged in. ... Password: Domain: DocuShare Catawba County. Copyright © 1996-2003 Xerox Corporation ... docucentre.catawba.k12.nc.us/docushare/dsweb/View/Collection-1546 - 6k - Cached - More from this site - Save

That shows you that Yahoo tried to access a protected page on the DocuShare server at docucentre.catawba.k12.nc.us. Is this the secure server that Google somehow managed to penetrate? Probably, given that this search shows nothing at Google now:

site:docucentre.catawba.k12.nc.us

That search comes up with no matches. That's probably because Google responded to the complaint last Friday to remove all pages from this domain. But since no one contacted Yahoo, there's a good chance pages from the domain still show over there. And in fact, that search at Yahoo currently shows 13,500 matches.

Are any of these pages the ones with sensitive information? I did some searches that I felt should bring up whatever the page was that Google was finding and had no luck. This means:

  • Yahoo didn't have it, because it didn't crawl as deep
  • Yahoo didn't have it, because Google really did somehow manage to get past a password barrier
  • Yahoo didn't have it, because I'm not guessing at the right words in the document

Yahoo clearly has some information that runs counter to what the school district itself says:

This site was a DocuShare password-protected site that required all users to log-in

No, not all users had to log-in. If that was the case, you wouldn't see any cached documents at all, such as this one. Clearly, some content was accessible without being logged in -- which makes it possible that some content wasn't properly placed behind password protection.

Postscript 2: See our follow-up, Follow-Up: School Couldn't Reach Google Until Injunction Filed

Posted by Barry Schwartz at 8:51 AM | Permalink

April 10, 2006

Cairo Closes Doors

Just wanted to follow up to Danny's post last month about ShopLocal winning its crawling case against Cairo. Cairo has now closed the site and states at the bottom of the page, "for great deals and sales in your area please visit www.shoplocal.com."

Posted by Brian Smith at 1:17 PM | Permalink

March 22, 2006

ShopLocal Wins In Crawling Case Against Cairo

Apparently, ShopLocal.com wasn't happy with rival local shopping search service Cairo for hitting its listings. ShopLocal took Cairo to court, and Cairo has now agreed not to make further "robotic or other automated" visits to ShopLocal's site, ShopLocal says.

You'll find some background on Cairo from us here back in 2004. The following year, the Wall Street Journal found Cairo to be better than ShopLocal. If that was because Cairo had everything ShopLocal had from crawling the site and more from crawling others, then it's no surprise that ShopLocal decided to block access.

For more on the case, InternetCases.com has a few more details along with the case numbers.

Below is the statement ShopLocal sent with the news:

ShopLocal announced today that it has resolved the lawsuit it filed against Cairo, Inc. late last year in the United States District Court for the Northern District of Illinois. As part of the resolution of the litigation, Cairo agreed to an order of the court preventing it from making further robotic or other automated access to ShopLocal's computers or websites, and Cairo has acknowledged ShopLocal's proprietary rights in its content. The other terms of the settlement agreement are confidential. The current lawsuit follows an earlier California federal court case in which a court found that Cairo was bound by the terms of use on ShopLocal's website making Illinois the appropriate venue for the litigation. ShopLocal's chief executive officer Brian Hand said "We are very pleased we were able to reach this resolution as the protection of our proprietary content is of utmost importance to our company."

Posted by Danny Sullivan at 5:52 AM | Permalink

October 21, 2005

More On Why Craigslist Said No To Oodle

Craigslist targets Oodle for 'scraping' its listings at the San Jose Mercury News looks at why Craigslist asked Oodle to stop scraping its listings, which we wrote about earlier.

Craigslist said some in its own community seemed to resent the listings profiting something outside the community. Of course, since Oodle was actually sending traffic to Craigslist, Craigslist itself was profiting a bit off of Oodle -- in the same way it profits with attention from traffic any search engine sends it.

Craigslist also said Oodle's crawling was putting a resource-intensive burden on it and that Oodle made use of the Craigslist name in marketing and press releases. The story also hints that other sites might be on the Craigslist hit list, though Craigslist didn't name any.

It's also interesting to hear that in contrast to Craigslist, eBay's actually paying Oodle to carry its listings. That's ironic given that eBay fought a suit to keep Bidder's Edge from carrying its listings back in 2000, helping cause Bidder's Edge's closure the next year.

See also Why craigslist booted Oodle... and more to come? from the author of the Merc story, Matt Marshall, on the related SiliconBeat blog, with comments about scrapers/meta searchers having trouble with (or being unaware of) the Craigslist 100 listings maximum in its terms of use.

Postscript: Growing Pains in MetaVertical Search from Pamela Parker at ClickZ has further comments from Craigslist, including that regular search traffic referrals aren't that significant (only 1 to 5 percent). The story also revisits some other interesting meta search cases and disputes, such as AA versus FareChase.

Posted by Danny Sullivan at 10:41 AM | Permalink

October 20, 2005

Avoiding Legal Gotchas with Search Engines

It's still very early in the game when it comes to search engines and legal issues. Although a number of lawsuits have helped clarify things like appropriate content in meta tags and whether using trademarks is fair game, lots of other issues are still unclear and up in the air.

It's important to understand these issues if you're a search marketer, both to stay out of trouble and to know what recourse you have if someone poaches your intellectual content. A panel of legal experts discussed these issues on a recent Search Engine Strategies panel, and guest writer Grant Crowell caught the session, reporting on it in today's SearchDay article, Copyrights, Trademarks and Search Engines.

A longer version of this story for Search Engine Watch members goes into more detail about various methods to protect your intellectual content, including how to safeguard images from being copied, and a checklist for taking action if you've found that your content has been stolen and illicitly used elsewhere on the web.

Posted by Chris Sherman at 6:07 AM | Permalink

October 14, 2005

Craigslist Says No To Oodle's Scraping

Via John Battelle, news that classified ads search engine Oodle has been asked by Craigslist to stop scraping their listings. Oodle says most listings, 80 percent, don't come from Craigslist. Still, Craigslist has great data, and it's a blow for those to be gone. Looking over at Google, I see Craigslist seems to have no problem with Google indexing 12 million of its pages. Of course, there's a difference between indexing a page and scraping the content to be included in a more vertical service. Meanwhile, HousingMaps, a blend of Google Maps and Craigslist, seems still OK. John's seeking reaction from Craigslist, so keep an eye out over there.

Posted by Danny Sullivan at 11:57 AM | Permalink

May 25, 2005

Forget Google Print Copyright Infringement; Search Engines Already Infringe

Gary blogged earlier about the Association of American University Presses having concerns that Google Print's digital library program may be equal to widescale copyright infringement. But that complaint, if ultimately upheld in a court case, would go far beyond print digitization. It might impact the fact that search engines already do widespread copying of content to provide the core search services we take for granted.

Let me zero in on a key part of the complaint:

Google's claim that it is fair use to make copies of every copyrighted work in even one major library, let alone three of them, is completely unprecedented in scale; it is tantamount to saying that Google can make copies of every copyrighted work ever published, period.

It is not unprecedented at all. It is exactly what search engines have been doing over the past ten years, since they started crawling the web. They are making copies of copyrighted works all the time, billions and billions of them.

When a search engine indexes a web page, it makes a copy of that page. Furthermore, all publications (at least in the US) are protected by copyright, regardless of whether that copyright is formally registered. Registration just provides further legal protection and redress in case of infringement. The fact that a work isn't formally registered doesn't mean it's a free-for-all for anyone to use.

When search engines index content, they do not formally request permission to do such copying. They just do it. Don't want to be copied? Then you have to stick up a robots.txt file or use the meta robots tag to opt-out.
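For readers who haven't seen the opt-out in action, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page. It uses Python's standard `urllib.robotparser`. The robots.txt contents, the crawler names and the example.com URLs are all hypothetical, and no real search engine is claimed to work exactly this way.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one crawler is shut out completely,
# everyone else is only kept out of /private/.
robots_txt = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

def allowed(agent, url):
    """Return True if the given crawler may fetch the URL under these rules."""
    return parser.can_fetch(agent, url)

print(allowed("ExampleBot", "http://www.example.com/index.html"))
print(allowed("OtherBot", "http://www.example.com/index.html"))
print(allowed("OtherBot", "http://www.example.com/private/a.html"))
```

Note that the burden sits with the publisher: with no robots.txt rule in place, the crawler treats everything as fair game.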

If you don't opt-out, is that tantamount to granting permission? We don't know. The Bidder's Edge case didn't really answer it. Rather than copyright being the issue, it was found to be one of trespass.

The case involving image indexing between Les Kelly and Arriba Soft cuts closer to this. When I spoke with Kelly about his case years ago, he didn't feel he should be required to opt-out, though he did try to. A court later found that there were fair use elements involved with showing thumbnails of these images.

The association's letter highlights this case in its argument against what Google is doing:

The single case you have cited to support Google's fair use claim, Kelly v Arriba Soft, has a pattern of facts substantially different from those in Google Print for Libraries. Among many other important differences, Arriba Soft was making copies of images that had already been digitized and posted on the web by their copyright owners. Google is presuming the authority to digitize many works whose copyright owners have not taken that step, and given the ease with which digital files can be duplicated and further transmitted, may have good reason for deciding not to do so.

Additionally, the full resolution copies Arriba Soft made in order to create the low-resolution thumbnails were deleted from Arriba Soft's server after the thumbnails were made. Google claims the right to retain the digital copies it makes -- the full resolution copies, if you will -- even in those cases when a publisher asks them not to display any text from particular works.

It's a bad argument. They are suggesting that the act of publishing on the web, which by its nature requires digitization, may somehow imply that copyright issues are less valid.

They aren't. If it's a copyright violation to copy a print book, in order to index it and show summaries of what's contained, then it is going to be a copyright violation to copy a web page, index it and show copies of what's shown.

In fact, Google, Yahoo and MSN go even further than this by providing cached copies of pages, another possible copyright violation explored in this News.com article from 2003. All do provide an opt-out of caching, of course. But again, it requires the author to explicitly take away permission, rather than the search engine first asking for it.
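The caching opt-out is typically expressed through the meta robots tag, using a directive such as noarchive. As a rough sketch of how a crawler might read that tag, the snippet below parses a made-up page with Python's standard `html.parser`. The class name and the sample HTML are assumptions for illustration, not any search engine's actual code.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "index, noarchive".
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )

# Hypothetical page that allows indexing but opts out of cached copies.
page = (
    '<html><head><meta name="robots" content="index, noarchive">'
    "</head><body>Hello</body></html>"
)

meta = RobotsMetaParser()
meta.feed(page)

may_index = "noindex" not in meta.directives
may_cache = "noarchive" not in meta.directives
print("index:", may_index, "cache:", may_cache)
```

Again, the default runs in the search engine's favor: a page with no such tag is treated as open to both indexing and caching.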

When I've written on such issues in the past, my own view has been that ultimately, a court will likely rule the value of web search combined with opting-out does fall on the fair use side. In other words, they aren't going to require that permission be sought before indexing happens. You don't want to be in? It's easy to opt-out.

The Google Print project could change that, however. Should publishers win a ruling that opt-out is not allowed, online publishers might insist that they are entitled to the same rights.

Want to comment? Visit our forum thread, SEW should support the AAUP's position on Google.

Postscript: Scholarly journals' premier status diluted by Web from the Wall St. Journal looks at how scholarly journals are under threat by demands they should be open to everyone.

Posted by Danny Sullivan at 9:37 AM | Permalink

March 21, 2005

Full Text Court Filings: Agence France Presse v. Google

Last Friday, news broke of Agence France Presse filing suit against Google in U.S. District Court alleging copyright infringement. Here's the SEW Blog post with links to Reuters' and AFP's own stories about the lawsuit.

If you're interested in reading the actual court filings (to this point), here's the full text of AFP's complaint (filed 3/17/2005) along with the 5 exhibits referenced in the document. All documents are PDF files.

  • Main Document (Complaint), 19 pages
  • Exhibit A1, 12 pages
  • Exhibit A2, 8 pages
  • Exhibit A3, 9 pages
  • Exhibit 4, 10 pages
  • Exhibit B, 6 pages

Posted by Gary Price at 5:54 PM | Permalink

March 19, 2005

Agence France Presse Sues Google over News Content

Yes, it's another lawsuit that Google's lawyers will need to handle. This one was filed by Agence France Presse (AFP) (a global news agency that supplies material to many news sites) in U.S. District Court on Thursday.

AFP is suing Google for "at least $17.5 million" and "an order barring Google News from displaying AFP photographs, news headlines or story leads..." A Reuters article also says that AFP has asked Google to "cease and desist" from using its content but "Google has ignored such requests and as of the filing date of the lawsuit 'continues in an unabated manner to violate AFP's copyrights.'"

More in the articles:

  • Agence France Presse sues Google over news site from Reuters
  • Here's how AFP is covering the story via their approved feed from Yahoo News.

Posted by Gary Price at 11:26 AM | Permalink

March 15, 2005

Google AutoLink, Meet Butler, Which Enhances Google Results

Upset about Google AutoLink, the new Google Toolbar feature that adds links to web pages that it feels are appropriate? You might try a new tool created by Mark Pilgrim that inserts links on Google's own pages (NOTE: Updated below with comments from Mark Pilgrim). Via Boing Boing, news of his new Butler Firefox extension that among other things:

  • Removes ads from Google pages.
  • Inserts links that let you do web searches on competing search engines directly from Google's results.
  • Gets news results from news sources beyond Google.
  • Gets similar links to "alternative" sources for image and shopping.
  • Removes image copying restrictions from Google Print.

For example, in a search for cars, Butler inserts this at the top of the Google search results:

★ Try your search on Yahoo, Ask Jeeves, AlltheWeb, Teoma, MSN, Lycos, Technorati, Feedster, Daypop, Bloglines

And below news results listed, it says:

★ Find more news at Yahoo News, Ask Jeeves, AllTheWeb, MSN, Lycos, Technorati, Feedster, Daypop, Bloglines

To use it, you need to have the Greasemonkey Firefox extension. Once that's installed, you can then go back and install the Butler extension. Once activated, it can be disabled without actually having to uninstall it, should you want to play with the tool from time to time.

The usefulness of the tool is clear. It's very handy for the searcher to have. Given this, it would be hard for Google to object to the tool especially after Google's statement in my Google Toolbar's AutoLink & The Need For Opt-Out article on how they'd react to tools that added links or perhaps stripped ads from their search results:

"I think we'd need to look overall at the utility offered to the users. Can a good argument be made that those users understand what's going on?" said Marissa Mayer, Google's director of consumer web products at Google. "It would be hard for us to argue against user utility because those are the same metrics we're going to use in evaluating our feature set."

In that article, I wrote my view that when trying to balance the desires of users and the rights of publishers, tools that added links to pages went too far if they didn't provide a publisher opt-out. And that's my main issue with Butler. While it's giving Google a taste of its own medicine, by rights, it should be letting publishers also opt-out of having links added. And that means Google as a publisher should get that right to opt-out of Butler.

Will an opt-out be added? Would that be added if Google did the same for AutoLink? Pilgrim actually responds that his creation wasn't made as a way of pushing back at AutoLink. He emailed me:

I couldn't care less about the AutoLink hoopla, except that it gave me the idea for Butler. I think anything running on my computer should be under my complete control. I say this as someone who publishes content for money (although it's not my primary income).

Look, I run ZoneAlarm Pro with highest sensitivity and all advanced options enabled (including popup blocking). I run Proxomitron on top of that, and AdBlock and FlashBlock on top of that. These tools don't block ads by accident; they come pre-configured with specific knowledge of specific ad servers. Butler is just another ad blocker.

As for the "try your search on" feature, I am old enough to remember that Google used to offer this feature themselves. Back then it was "try your search on Altavista, Hotbot, Lycos, Excite, etc." All the popular search engines of the day. The point is, linking to competitors makes Google more valuable, not less. They seem to have changed their attitude about that as they've added more and more services of their own.

Google as a whole is becoming more and more of a walled garden, which is ironic, given that they started out in the business of sending people away. Now they take every opportunity to keep you within their walls. This might sound like a good idea in a Powerpoint slide deck, but it will kill them in the long run.

None of this answers your question about why I wrote it. Honestly, I wanted to teach myself Javascript and DOM scripting. I'm a geek, not an activist. I spend a lot of time using Google's services, and with the AutoLink faux-crisis still brewing, it seemed like an obvious choice of project.

As for a Google comment on the new tool, I've got a question in to them. In the meantime, some related reading:

  • Ok, Ok, I lied [I fired my butler]: From Jonas Luster's blog, this post against Google AutoLink follows the metaphor of AutoLink as a butler, but one that isn't necessarily acting in the interest of his employer. So Luster fires his butler, Google AutoLink.  
  • A New Butler For Jonas: While Pilgrim isn't positioning Butler as a slam against Google AutoLink, his colleague Sam Ruby does make that connection that this is an example of an open source push-back against Google's tool, one that anyone can potentially modify and change. From my view, the fact that it is open source doesn't make it any more acceptable to me as a publisher. I still want an opt-out. Be sure to read the comments below the original post for some interesting discussion.  
  • Want a line? Here's a line: This post from Yoz Grahame is referenced in the above Ruby post, and Grahame comments on that. At issue is his attempt to draw a line about when content-modification is acceptable. He argues that Google AutoLink is in the spirit of his definitions of being acceptable because users understand it, it isn't automatic, it can be limited by the user and it's in the spirit of the web.

My own view is that trying to come up with some type of universal guidelines for content modification tools isn't going to be successful. I think there's going to be a variety of lines that we draw over time, and those lines might even change over time. But for me, right now, adding links is a clear and simple line we can start with. If you make a tool that adds links to a page, you should give the publisher an ability to override that feature.

How could opt-out be done? SearchGuild -- which published the first widely-cited AutoLink killer -- is pushing a meta tag. No tool uses this tag right now, but they could. I'll expect to add the tags to Search Engine Watch soon just to show my support. More about the tag here: JavaScript to Kill Google Autolink.

All-in-all, Butler is just the latest example of the "mess" AutoLink created when it was released, as I wrote earlier. It came out, then we got an AutoLink-killing script, then a supposed way to kill that script, and now a tool some will use to fight back at Google, plus continuing heaps of bad PR for Google.

Two years ago, the company pulled, within 48 hours, the related searches feature that its own AdSense publishers hated. We don't need months more of testing AutoLink for Google to realize it needs to make some significant changes to please publishers and not just the usual noises of always considering feedback. Let's get on with an actual solution, starting with an immediate opt-out.

Posted by Danny Sullivan at 9:37 AM | Permalink

March 10, 2005

WSJ's Mossberg Against AutoLink

I've got an update coming on developments with Google AutoLink since I last wrote about it (see Google Toolbar's AutoLink & The Need For Opt-Out). There's a petition, meta tags against it and so on. But in the meantime, via Steve Rubel, news that Wall St. Journal tech columnist Walt Mossberg has come out against the system: Google Toolbar Inserts Links in Others' Sites, And That's a Bad Idea. Mossberg was instrumental in getting Microsoft's Smart Tags killed, as my earlier article explains more.

Posted by Danny Sullivan at 10:03 AM | Permalink

February 25, 2005

Google Toolbar's AutoLink & The Need For Opt-Out

AutoLink is a new feature in the new third version of Google's popular Google Toolbar that's raised controversy since it was released last week. Why are publishers upset? Can they block the feature that adds links to their web pages? Who rules over content, users or publishers? Why do I think Google should give publishers an opt-out for the feature? That, and other issues, we'll explore in this article. It's a long one, so the links below will let you jump to particular sections, if you prefer.

How AutoLink Works

Let's start by revisiting how the feature works. It's only available to those using the Google Toolbar 3 beta. Existing Google Toolbar users have not automatically had this feature added, so the number of people currently AutoLink-enabled is small. It will grow, of course, when the toolbar comes out of beta and takes over as the main one offered to the public, something likely to happen in the next few weeks.

Currently, AutoLink only reacts if it spots four types of information on a page:

  • Package Tracking Numbers (those currently supported in Search By Number for regular search results)
  • US Vehicle Identification Numbers (VINs)
  • US Addresses
  • Publication ISBN numbers

Below, I've inserted two examples in the article so that anyone with the AutoLink-enabled toolbar can see autolinking for themselves easily. The first is the book Web Search Garage by Tara Calishain with its ISBN number shown. The second is Google's address:

Web Search Garage Prentice-Hall, August 2004 ISBN 0131471481, $19.99

Google Headquarters 1600 Amphitheatre Parkway Mountain View, CA 94043
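As a rough idea of how a tool of this kind might spot one of these triggers, the sketch below looks for a ten-character ISBN in a line of text and confirms it with the standard ISBN-10 checksum (weights 10 down to 1, total divisible by 11). Google hasn't published how AutoLink does its matching, so the pattern and the function names here are purely illustrative assumptions.

```python
import re

# Assumed pattern: the word "ISBN", whitespace, then nine digits
# followed by a final digit or X (the check character).
ISBN10_PATTERN = re.compile(r"\bISBN\s+(\d{9}[\dXx])\b")

def isbn10_is_valid(isbn):
    """ISBN-10 check: weighted sum (weights 10..1) must be divisible by 11."""
    total = 0
    for weight, char in zip(range(10, 0, -1), isbn.upper()):
        value = 10 if char == "X" else int(char)
        total += weight * value
    return total % 11 == 0

def find_isbns(text):
    """Return every ISBN-10 in the text that passes the checksum."""
    return [m for m in ISBN10_PATTERN.findall(text) if isbn10_is_valid(m)]

sample = "Web Search Garage Prentice-Hall, August 2004 ISBN 0131471481, $19.99"
print(find_isbns(sample))
```

The checksum step matters because a bare ten-digit number could just as easily be a phone number or an order ID; a tool that linked every such number would annoy users far more than it helped them.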

If you have the AutoLink-version of the Google Toolbar installed and come to a page like this one with such "trigger" content on it, you'd hear a little "popping" sound familiar to anyone who uses the Google Toolbar currently, when it blocks a pop-up window from opening.

The AutoLink button in the toolbar also lights up or goes active, changing from "Not Active" to "Active" as shown in the illustration below:

When active, you can push directly on the button or use the little drop-down arrow next to it to get a menu, as shown with the "Drop Down Box" example.

Whether you push directly on the button or use the drop-down option, in both cases, links are also added to the page, making them look like this:

Web Search Garage Prentice-Hall, August 2004 ISBN 0131471481, $19.99

Google Headquarters 1600 Amphitheatre Parkway Mountain View, CA 94043

Click on the ISBN link, and you'll be routed via Google over to a page about the book at Amazon. Click on the address, and you'll be routed to that address shown in Google Maps.

Alternatively, use the drop-down box, select an option shown, and an entirely new window will open to display the AutoLink content. In contrast, with the links on the page, new windows aren't opened. Instead, the original window is replaced with the new content.

Don't like the links? Via the drop down box, you can use the Remove option to get rid of them or put them back using the Add option, if they have been removed.

By the way, earlier this week I found that using the drop-down box did NOT add links to the page. In fact, because I was using the drop-down box rather than pushing on the button, I at first didn't think links were actually added to the page at all. I talked with one other person who had the same thing happen to her. But in writing this article, that behavior changed for me.

Google says it's made no alteration to the toolbar behavior since it launched. Nothing has been changed on their end, the company says, and I should have always been seeing links added to a page whether I pushed directly on the button or chose the drop-down option. Given this -- and how corroded my IE installation has become over the past year or so (one reason I now use Firefox), I'll chalk it up to an oddity on my end.

The User Benefit

Google says feedback from users so far is that they like the feature. It's easy to see why. If you come across a page about a book without a link, as I showed above, it's very nice that you can get to another page with more information about it or the ability to buy it. Amazon fills that role nicely. I've often come across books mentioned on pages, then had to do the copy-and-paste routine over at Amazon, which AutoLink helps make unnecessary.

Similarly, if you see an address such as on a corporate web site and would like to get a map, this is a handy way not to have to cut-and-paste into a mapping program.

The Publisher Benefit & Fears

Fair to say, feedback so far from publishers isn't so rosy. Yes, some think the feature is nice, as prominent blogger Anil Dash has said. But from my review, he's in the minority. We've had other prominent bloggers such as Steve Rubel, Dan Gillmor and Dave Winer crying foul.

Closer to home for me, many search marketers who are also publishers clearly dislike the tool. At our Search Engine Watch Forums, the AutoLink & Google As Anti-Webmaster thread isn't finding many people in favor of it. The same is true for the New Google Toolbar Feature Rekindles the Old SmartTag Debate thread at WebmasterWorld.

Publishers do get a benefit from the tool. If they've failed to add useful links, those visiting their sites perhaps may come away happier that they were still able to leverage the information on the pages to get further information.

The publisher fear is far larger. Many publishers consciously decide what links they want to add. Having some tool come along and modify their content is simply unacceptable to them. That's especially so given how easy it would be for any tool to grow capabilities, such as making words into ad links that generate no revenue for them -- something that's happened in the past.

We've Been Here Before

There is a ton of hue and cry about how Google is trying to repeat a plan Microsoft abandoned after large outcry in 2001 called Smart Tags, which would have allowed words on pages to be turned into links. Which links and to where? That would have been determined by Microsoft.

By the way, a key developer of Smart Tags from Microsoft does now work for Google. However, rumors that he was involved with Google AutoLink aren't true. Google says he's involved in a completely different product.

Microsoft backed off from Smart Tags, but TopText from eZula went ahead later that year. It inserted yellow hyperlinks into pages -- paid links that earned eZula money but not the publisher. My Forget Smart Tags; TopText Is Doing What You Feared article from back then looks in depth at the system and the concern that arose over it. I'd strongly encourage reading it, because there are plenty of direct comparisons between what happened then and what's happening now.

eZula's still out there and apparently offering the same type of placement, but my impression is that the system didn't gain greater popularity due to search marketers who especially rallied around the late Jim Wilson's Scumware site to fight the program.

Why did search marketers care so much? They were footing the bill. Ads they placed with people like LookSmart got inserted into pages that they never actively chose. Many disliked this and made threats to their ad providers like LookSmart to stop partnering or lose them as customers.

Predating both the Smart Tags idea and TopText were Amazon's zBubbles and Flyswat, both from 1999. They came and went without any major outcry. Flyswat in particular inserted links on pages just as TopText did, Smart Tags would have and AutoLink now does.

I see now that some places like Symantec now class Flyswat as spyware, which sort of amazes me given that I thought the product long ago had died. I can't even reach the Flyswat site, but I suspect old installation copies are still floating around via download sites such as PC World (which offers it here, then offers an anti-spyware tool to get rid of it here). But at the time it was out there, Flyswat drew praise in many quarters as a great browser "helper."

Monopoly & Monetary Fears

Why was Flyswat largely acceptable, when only two years later, Smart Tags and TopText drew ire and today, Google AutoLink faces criticism?

With TopText, the answer is easy. Publishers didn't like the fact the system let competitors manage to insert themselves into their own content. Others who had purchased precisely targeted search ads weren't happy to discover that these ads were then in turn distributed to TopText for less precise contextual targeting.

With Smart Tags, it was the monopoly factor. Microsoft had such a dominant share of the browser market that letting it control how words would be linked was simply too frightening to many -- and this despite opt-outs the company decided just before the end that it would offer.

Enter Google. It, too, occupies a dominant role. We don't know exactly how many toolbar installations it has, but the company acknowledges millions of users. To be fair, Marissa Mayer, Google's director of consumer web products, told me that queries generated through the Google Toolbar are "by no means a majority of all Internet Explorer users" who access Google.

"With AutoLink versus Smart Tags, the toolbar is different is that its only installed by users [as opposed to automatically being part of the browser] and is by no means a majority," she explained further.

Even Microsoft blogvangelist Robert Scoble agrees here, arguing that Google can do things Microsoft can't because Microsoft still has a browser on 9 out of 10 desktops out there. Nevertheless, he was against Smart Tags and doesn't seem to favor the current Google implementation of AutoLink.

Monopoly or not, the toolbar clearly has many users. In addition, people like Winer fear that if Google is able to offer this type of feature, nothing prevents Microsoft and others from doing the same.

So with Google, there's a bit of the monopoly factor. I think there's also the TopText-like fear that AutoLinks could cost publishers money. If you have a page about a book, you might not want Google sending someone to Amazon to purchase it, especially without your own affiliate code.

As an aside, it's worth mentioning that there are other reasons why you might find advertising links inserted into editorial copy. Vibrant Media's been doing this for some time through its IntelliTXT service. However, the publisher-rights issue raised by Google AutoLink is not in question with this type of service. That's because the publishers themselves have chosen to add the links.

Instead, the issues are more about the practice from an editorial integrity standpoint, and yesterday's Ads Embedded in Online News Raise Questions article from the New York Times is just one of many articles to look at this.

Back to Google AutoLink, a remaining major concern for publishers is simply that they might not want Google sending anyone anywhere out of their sites via links that they didn't provide in the first place. There's a potential traffic loss people worry about, though Google doesn't see this as a serious problem.

"Are we really taking traffic away from them? Think about what they [users] have done. They've been looking at the page. They've decided there's a piece of information on the page. They had to get the idea that they wanted to get more information some way. They clicked a toolbar button, and then they clicked a link. That's a pretty determined series of user actions. It seems to me that that user is going elsewhere anyway," Mayer said.

Future Development

What about the idea that Google might put ad links on pages? That's not something it does now, nor does the company have any plans to in the immediate future, it said.

As for those Amazon links, Google said it gains nothing from them. Amazon was selected because it was seen as the best choice for book information.

"Obviously Amazon is a partner of ours, but there was no monetary exchanges as part of this development. We picked out what we thought was the best user experience for things we linked to," Mayer said.

Don't like that choice? When the tool emerges from beta in the near future, Google definitely plans to let people choose some of the content providers they want to tap into. If you want links to Barnes & Noble for ISBNs rather than Amazon, you'll almost certainly be able to do that or pick from others.

How about the tool expanding the range of what's auto-linked? That could happen. Google's not saying what may or may not change, because the tool is still in beta -- a traditional style beta that should only last a few months at most.

It's possible, Google said, that if users push the button, it might decide that the toolbar should always automatically show links rather than make this a page-by-page choice users initiate. Or not, depending on feedback.

New features could also be added or removed. The company is interested in link enabling anything that someone might have to cut-and-paste to get existing information from Google. For instance, enter a stock symbol into Google right now, and it links you to stock data. Potentially, stock symbols could be turned into AutoLinks.

Couldn't any word be made into a link? Sure, but that would be too much, Google says.

"That goes a little too far. We aren't interested in turning an entire page into hyperlinks. That's not particularly helpful to the user," Mayer said.

What's Acceptable & What's Not?

AutoLink also raises anew the philosophical debate of who ultimately controls content. "It's my content, hands off!," is a common theme that resonates with many publishers. What gives Google the right to start tampering with your page?

Google's response is that the users give them the right. The users want this tool. The users want to control how they view that content.

"It's important to recognize that the toolbar is installed by people who want Google-enhanced functionality," Mayer said. "I would argue that the user is adding the link to the page. Google just provides the tool."

That's a pretty forceful argument. We don't hear many objections to the fact that users can control font sizes as they like, for example. Google's open source program manager Chris DiBona goes through a litany of more things like this in his personal blog post on the issue, Oh, please.

It's easy to add more. I've heard plenty of praise for various Firefox browser plug-ins that can do special things to pages when they spot certain types of links or the ability to restyle entire pages with Firefox. Why is Firefox so praised for enabling users but Google suddenly seen as evil for doing the same?

Indeed, this isn't the first time Google has interacted with publisher content via its toolbar. The ability to highlight or jump to words on a page is widely praised. But more dramatic was the addition of a pop-up blocker in June 2003. That not only prevented some web sites from doing what they wanted to do, but it also arguably cost some publishers money through the blocking.

Widespread criticism? Hardly. I've seen a few grumblings from time-to-time that Google might be blocking commerce and publisher intent this way, but the praise over the pop-up blocking feature has been enormous -- and mimicked by other search toolbars. My guess is that publishers didn't fight back more against this because it was clear how hated pop-ups were by consumers.

Drawing The Line At Links

So where is that line when a tool gives a user too much control -- or better, when a user is given control that a publisher ought to be able to counter? I agree with many others that adding links crosses it. I don't care if the user thinks adding links to my pages will make things better for them. As a publisher, I want to be able to override a tool that tries this.

Legally, we don't know where publishers really stand on this, as the recent Google toolbar move raises online ire from News.com examines. But forget legal.

Instead, adding links is a line that I think any respectable software publisher shouldn't cross. Last year, Google introduced a set of software principles that are all about protecting the user experience. An addition to those principles should be made to protect the publisher experience, as well.

Provide An Opt-Out!

In this case, I think Google should provide an easy opt-out that publishers can implement to block AutoLink. Some others want AutoLink to be opt-in -- that Google shouldn't be able to do anything like this unless publishers explicitly say they should.

I think that's too far. Users do have rights. They have installed this software. Opt-out gives any publisher seriously concerned with the tool the ability to control it on their site. Many won't be concerned, so requiring an opt-in is overkill that does hurt the user experience.

It's also somewhat hypocritical to demand Google do an opt-in for this tool when virtually no one demands an opt-in about being crawled. Why that isn't demanded is pretty clear. People want to be in Google because of the traffic it will bring them. But being crawled is another form of messing with content.

For its part, Google doesn't want to do an opt-out. The fear is that it will hurt the user experience.

"If you had opt-in or opt-out, that's overall a lot less useful," Mayer said. "If the links sometimes won't show because there's a publisher opting-out, that's bad for the user experience."

Explaining further, she said:

"It's an interesting balance to strike, but we're going to weigh more heavily on the user side," Mayer said. "We think we struck the initial balance in a reasonable way. The publisher's page is seen as intended in the browser. It's a user-elected action that changes things. Beyond that, we aren't driving all traffic to Google."

Google also feels there's a form of an opt-out in that it won't overwrite any existing links. Worried that an ISBN code might get turned into a link by Google? Make it a link yourself, and it will be untouched.

Indeed, when Gary Price first wrote about the AutoLink feature in Search Engine Watch last week, he used an example of going to Barnes & Noble to show how unlinked ISBN codes there got auto-linked through the Google Toolbar to connect people to Amazon.

That made Barnes & Noble into a poster child for many publishers about why AutoLink was bad. Look at how it put links to a competitor on the Barnes & Noble site!

It took the company about a week, but an opt-out is effectively in place with Barnes & Noble. As I wrote yesterday, all ISBN numbers on the site now have links to Barnes & Noble's own content.

It was probably an easy move for them to make, having a database-driven site. But for others, it could involve a lot of hard-coding. In addition, if Google adds new content types for AutoLink, then publishers have to go back and make more changes. Adding your own links to block Google AutoLinks is simply not an effective form of opting-out for many to use.
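To make the "link it yourself" workaround concrete, here is a minimal sketch of what a database-driven site might do when rendering plain text. It is only an illustration: the URL template, the function name and the ISBN pattern are assumptions made for this example, and nothing here is confirmed as the pattern AutoLink actually recognizes.

```python
import re
from html import escape

# Assumed pattern: the word "ISBN" followed by digits and optional hyphens,
# optionally ending in X. What AutoLink really matches is not documented here.
ISBN_PATTERN = re.compile(r"\bISBN[:\s]*([0-9][0-9-]{8,15}[0-9Xx])\b")


def link_isbns(text, url_template="https://example.com/books/{isbn}"):
    """Wrap each ISBN in plain text in a link to the publisher's own page.

    `url_template` is a hypothetical URL scheme. The input is assumed to be
    plain text, not markup that already contains links.
    """
    def replace(match):
        digits = match.group(1).replace("-", "").upper()
        if len(digits) not in (10, 13):
            # Not a plausible ISBN length, so leave the text untouched.
            return match.group(0)
        url = url_template.format(isbn=digits)
        return '<a href="{}">{}</a>'.format(
            escape(url, quote=True), escape(match.group(0))
        )

    return ISBN_PATTERN.sub(replace, text)
```

A template-driven site can run something like this once in its page rendering code. A hand-coded site has no such single place to apply it, which is the burden described above.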

They're My Users Too

My response to the "protect the user experience" argument is pretty blunt. Too bad if it is harmed in this case, from Google's perspective.

They may be Google's users, but they are also my users as a publisher as well. If my visitors are upset that my site prevents them from using Google AutoLink, they can tell and lobby me directly. I don't need Google deciding for me what my users want on my web site.

Google would gain on the public relations front from offering an opt-out. Even better, I'd encourage them to lobby for a single standard type of opt-out that other publishers could support such as through a robots.txt file extension that works for everyone. That would be real leadership in the industry and in line with the software principles statement it started last year.
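As a rough idea of how simple such a standard could be for a toolbar to honor, the sketch below checks a robots.txt file for an opt-out rule. The `NoAutoLink:` directive is entirely hypothetical; no such extension exists, and the function name and matching rules are assumptions made for illustration. For brevity it also ignores `User-agent` grouping, which a real extension would need to respect.

```python
def autolink_blocked(robots_txt, path, directive="NoAutoLink"):
    """Return True if a hypothetical opt-out rule in robots.txt covers `path`.

    Rules are treated as simple path prefixes, in the style of `Disallow:`.
    """
    prefix = directive.lower() + ":"
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if line.lower().startswith(prefix):
            rule = line[len(prefix):].strip()
            if rule and path.startswith(rule):
                return True
    return False
```

A toolbar that found a matching rule could skip adding links and, as a courtesy, tell its user that the publisher has opted out on that page.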

Turning The Tables

How about turning the tables? How would Google feel about programs that modified its search results? It's not even theoretical. We have tools that will strip out ads from Google because the user may not want ads. We have software that will add links to Google's own results (for more, see our forum thread).

"I think we'd need to look overall at the utility offered to the users. Can a good argument be made that those users understand what's going on?" Mayer said. "It would be hard for us to argue against user utility because those are the same metrics we're going to use in evaluating our feature set."

It's a change from when Google was asked about this in 2001, on what it thought of TopText adding links to its results. At that time, it wasn't an issue of it being OK if it helped the user. Instead, Google wasn't concerned because there didn't appear to be much take up of TopText.

Still, things change -- and it's helpful to have a current view on where Google stands, especially if a competitor like Yahoo or Microsoft decides to add a feature to its toolbar that allows users to hit links inserted on Google pages to generate results from their search engines.

The Toolbar Area Itself Is Yours

I'd sweeten the pot a bit to encourage Google to give an opt-out. Personally, I only want it to prevent adding links to my pages. Want to display links via the toolbar? That's fine -- it's your toolbar, do what you want with it.

Wouldn't that mean Google might down the line start showing ads or content related to my pages in the toolbar? Yes, it might. But we've had tools do this sort of thing already (a new toolbar program from Searchfeed and EffectiveBrand just came out this week), plus free useful tools do need to be supported somehow.

I wouldn't necessarily like it, but if it's not interfering with my actual page -- popping things over my content, adding links -- but instead staying within the toolbar area, I'd live with it.

That's especially so as long as the user clearly knew what was happening in the toolbar. All the same arguments Google makes about the user having the right to do what they want, I heard the same from TopText way back when. But Google says its history of user disclosure on what the toolbar does is better, and I largely agree.

"You can just look at Google's track record as with the PageRank feature. We tell people it's not the 'usual yada yada' and we are very up front," Mayer said. "We make sure our users are really informed that something is going to happen, because we want to have the trust of our users."

In other words, no one gets tricked into downloading the Google Toolbar. And the links aren't automatically enabled. You do have to make the choice to turn them on.

Nevertheless, I still don't want links added to my pages. But if someone wants to consciously choose to click on a button that makes new windows pop-open, it's hard to object.

Similarly, we have a long history of other tools being tolerated for showing related content, such as Alexa. Heck, for ages both Internet Explorer and Netscape had built-in "related links" functionality powered by Alexa that few ever objected to.

Alt-Click Away!

Another option for Google is to provide Alt-Click functionality in the way that the GuruNet helper application (now Answers.com, also once called Atomica) has long allowed. In this case, people can select a word, hold the ALT key and click with their mouse, which in turn brings up a page with more information about what's described.

This doesn't add anything to a web page, easing concerns about content manipulation. Indeed, Wall St. Journal writer Walt Mossberg, who rallied against Smart Tags in 2001, nevertheless praised GuruNet for letting him Alt-Click on words in that same complaint against Smart Tags, and has continued to praise GuruNet's Alt-Click feature in 2003 and 2005.

In short, Alt-Click is an easy way to provide the user who wants to make a conscious choice to act upon ISBN numbers, addresses or other content that lacks links with AutoLink-like functionality -- just without having to use the actual links that are objectionable to some publishers.

Google did consider this option, but links were seen as more intuitive:

"We talked about whether we should make this work like that or something else. But we think that if you're going to create a link, the ability to get to another page, the web already has a paradigm for that. Right now, the link really does make sense," Mayer said.

Adding further, she said:

"The links that we add do look different. We work hard to help the user understand that this was a link added by the Google Toolbar, that it wasn't a native link. We do this through a mouse rollover that is visible when you mouse over the link."

From my end, the mouse rollover isn't enough: it amounts to little Google color "bubbles" or "balls" added to the hand icon, along with link pop-up text that says "Google Toolbar AutoLink." That's because before you hover, these links look identical to native links -- and some people are just going to click rather than hover for very long.

A different color or a double-underline or something would help. But while I certainly agree that links are far more intuitive, whether they look radically different from native links or not, they simply clash too much with publisher rights, in my view, and at this moment.

Here's An Opt-Out

You don't have to wait for Google to provide an opt-out, especially since it might never do so. Threadwatch describes a JavaScript blocking solution cooked up by Search Guild. Download the solution (instructions are provided) and insert it into your web pages. The same Threadwatch thread is also tracking any new solutions that come up -- some new server-side ones have just been posted.

Meanwhile, an anti-anti-AutoLink option appears to also be out there for users who want to override publishers trying to prevent AutoLink. I say appears because it seems like a clunky workaround that I can't really understand -- and looking at the comments posted, some others don't get it either.

I mention it mainly because it highlights how quickly things have become absurd. You have third-parties working to prevent AutoLink and potentially others working to prevent preventing AutoLink. It's a mess.

The user experience is hardly being protected by Google refusing to provide an opt-out. It would be much better for Google to provide an opt-out in a way that makes publishers happy but also lets Google report clearly to its own users if the publisher has blocked AutoLink from the site they are visiting.

After all, it's arguably bad for the user experience if they can't get cached copies of pages. Nevertheless, Google has long allowed web site owners the ability to opt-out of having pages cached, primarily it seems to avoid conflicts over copyright. Despite this opt-out, the cached pages feature has survived for years. AutoLink can survive opt-out black spots, as well.

Finally, just weeks ago, Google acknowledged that publishers should have MORE ability to control their links through the introduction of the nofollow link attribute. It's disconcerting to say the least to then have the same company assume a right to add links to publisher pages without permission.

Posted by Danny Sullivan at 10:43 AM | Permalink

January 18, 2005

Microsoft Tells Google To Cease and Desist

It's always interesting to review the collection of Cease and Desist Notices sent to Google (and others) via the Chilling Effects Clearinghouse. Today, a few new C&D letters were made available including several from Microsoft that request Google remove several posts on Blogger weblogs that are hosted by Blogspot. In addition to browsing the list of letters, you can search for all of the notices sent to Google (451 as of today) by using this search engine, and search for the term "Google" in the "Recipient" field. Some notices deal with Google Groups (Usenet archives) and Google Images.

Posted by Gary Price at 2:12 PM | Permalink

November 20, 2004

Adult Magazine and Web Site Files Law Suit Against Google

Red Herring reports that Perfect 10, an adult magazine and Web Site, has filed a lawsuit in a Los Angeles Court against Google. The suit alleges that the "search engine giant provided Internet users with at least 800,000 unauthorized links to images of Perfect 10’s nude models, stealing membership fees and advertising revenue from the Los Angeles publisher. The lawsuit is one of the first of its kind against Google."

According to the suit, because Google profits from the misdeeds of others on the web, it is legally and financially responsible for the alleged violations.

Update: John Palfrey from the Berkman Center at Harvard Law offers an analysis. He has also made available a redacted version of the complaint (removing some "graphic" images).

MediaPost also provides more coverage: Adult Publisher Sues Google For Copyright Infringement

Posted by Gary Price at 8:31 PM | Permalink

November 12, 2004

More On Google Images & Google News Images

Earlier, I blogged about how the blend of Google Images and images from Google News caused some to wonder if Google was censoring pictures of US soldiers involved in the Abu Ghraib prison torture scandal: Stale & Split Image Databases Fuel Google Conspiracy Theories.

Yesterday, Nathan from InsideGoogle spotted what seemed to be a new change to Google Images. Any images coming from Google News are identified at the top of the results, with an "Images from Google News" heading. You can currently see this in a search on arafat.

That's a good way to help people understand why some pictures might move in to, then out of, Google Images. But it also raises new questions about Google's explanation over the missing photos.

The implementation isn't new. Comments in Nathan's blog explain that the system has been going since April. You can see a screenshot from back then via Google Blogoscoped: Google Images Merges With News.

Given that the system's been in place for months, it should have been pretty clear to the person viewing the scandal photos in Google Images that they came out of Google News. However, there's no indication that this was the case. Instead, it sounds like they were just Google Images listings.

I'm checking again with Google for any further explanation.

Posted by Danny Sullivan at 8:32 AM | Permalink

November 8, 2004

Stale & Split Image Databases Fuel Google Conspiracy Theories

A post at AnandTech raises concerns that Google Images fails to find pictures of US soldiers involved in the Abu Ghraib prison torture scandal. A conspiracy? In reality, just a failure of freshness on Google's part.

After the post was discussed at Slashdot, Google cofounder Sergey Brin sent word via Slashdot that Google was embarrassed to say its image database just hadn't been updated recently.

John Battelle summarizes more in his Google Image Search: Updated Only Twice a Year? post. While accepting Google's explanation, he wondered how the images that were there before could go missing. Others at Slashdot wondered as well. So did I.

Sure, information comes and goes on the web all the time. But for there to be no pictures at all for someone like Lynndie England, as this Google Image search shows? That seems odd, especially when the same search at Yahoo Image Search has plenty of examples (be forewarned, the link brings up several graphic and shocking pictures).

OK, Yahoo Image Search just recently got updated, as we covered two weeks ago: Yahoo Announces Size Increase to Image Database. But even if Google isn't as fresh, as it now readily admits, that still doesn't explain how the pictures that were once in there are now gone.

Answer? Google News. Google tells me that Google News has its own image search database that flows into Google Images. So when you search on Google Images, you're searching both the Google Images picture database (the stale one, currently about six months out of date) and the Google News image database.

Google won't say how long images stay in that Google News image database. I'd guess something like one to two weeks. That helps explain why a Google Images search for Lynndie England or Abu Ghraib might have brought up different results two weeks ago.

In October, Lynndie England gave birth. News stories about that may have brought pictures of her into the Google News database -- and thus into Google Images. Similarly, sidebar stories on the Abu Ghraib tortures or any other stories like this that may have hit Google News would have done the same. But as that news became old, those pictures fell out of Google News -- and out of Google Images.
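The effect is easier to see in a toy model. The sketch below is not Google's implementation; the function name, the record layout and the 14-day retention window are all assumptions, the last one taken from my guess above. It simply merges a slow-moving main index with a news index whose entries expire.

```python
from datetime import date, timedelta


def search_images(query, main_index, news_index, today, news_retention_days=14):
    """Merge a stale main image index with a short-lived news image index.

    Each index is a list of dicts with "query", "url" and "added" keys
    (a hypothetical layout). News images older than the retention window
    are dropped, so a picture can be findable one week and gone the next.
    """
    cutoff = today - timedelta(days=news_retention_days)
    fresh_news = [
        img for img in news_index
        if img["query"] == query and img["added"] >= cutoff
    ]
    main = [img for img in main_index if img["query"] == query]
    return [img["url"] for img in fresh_news + main]
```

With a news picture added on October 20 and an empty main index for that query, a search on October 25 finds it while a search on November 8 finds nothing, even though nothing was deliberately removed.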

The Google Images staleness prompted Yahoo's Jeremy Zawodny to suggest Google might want to outsource image search results from Yahoo. That's unlikely, but Gary Price does give you some fresh image search alternatives here.

Posted by Danny Sullivan at 2:57 PM | Permalink | Comments (0)

September 29, 2004

Google News In Beta To Avoid Ad Lawsuits?

Nice article spotted via Dave Winer, Google News: Beta Not Make Money from Wired. The article theorizes that Google has kept the "beta" moniker on Google News for so long because it's afraid that removing it -- and adding ads -- would cause it to be subject to copyright lawsuits.

Maybe. But the same argument about lifting headlines and lead paragraphs is already applicable to the long-standing Google web search service. No one I know of has seriously sued Google over that.

The reason, of course, is that people want to be listed in Google's web search -- to the degree that they get upset over lost business if there's any major algorithm change, as happened big time last December (see my What Happened To My Site On Google? article).

Given that Google News drives traffic to many publisher web sites (some of which like the New York Times earn off Google's contextual ads program), biting the traffic hand that feeds them seems unlikely to me.

In fact, the New York Times makes another good example, given that the publisher has specifically worked with Google to be accessible in Google News despite its password-protection on some stories.

Still, the two key points remain. Isn't it well past time for Google News to come out of beta, and when will ads finally appear? I'll let you know what I hear.

By the way, I see hardly any ads at Yahoo News -- a banner here and there and no sponsored listings that I can find. In addition, a keyword search at Yahoo News such as for movies or world series doesn't bring up any sponsored listings, in contrast to the case with regular web searching.

It suggests that if Google has some reason not to monetize Google News, the same reasons may also apply to Yahoo News -- which in terms of keyword-driven searches, is as automated as Google. For more on that, see: Postscript On Google News & Bias.

Posted by Danny Sullivan at 7:33 AM | Permalink | Comments (0)
