When it comes to legally challenging tech giants, Aaron Greenspan is on a roll. In March, he won a small claims court suit against Google's AdSense program, which cut him off without warning and without paying him what his site had earned.
It turns out that Greenspan attended Harvard at the same time Facebook founder Mark Zuckerberg did. Greenspan developed a network for the Harvard community called houseSYSTEM. The network included a course scheduler, student marketplace, email service, automatic birthday reminder, message boards, photo album, digital flyer advertising, event calendar (with online RSVPs), map integration, job placement, and local business reviews. Greenspan thought about adding profiles, but at the time nixed them for security reasons.
houseSYSTEM included a section called 'The Universal Face Book.'
Later, Zuckerberg would add profiles when he started his social network in 2004. And, of course, he called it Facebook.
Naturally, this led to trademark disputes which have now been settled.
This isn't the first time a classmate of Zuckerberg wanted credit for their Harvard-era work. ConnectU's co-founders (and twins) Cameron and Tyler Winklevoss, along with co-founder Divya Narendra, sued Facebook for idea-stealing. That case was settled as well, with leaked reports purportedly showing the settlement money in the $65 million range.
Posted by Nathania Johnson at 12:05 PM | Permalink | Comments (0)
Walmart has sent a DMCA notice to TechCrunch and SearchAllDeals.com, a shopping search engine and deals aggregator. (Think of it as the Techmeme for deals on the web, with a Google custom search engine to boot.)
Both sites posted some information about "Black Friday" sales for discount giant Wal-mart. But Wal-mart is claiming copyright infringement. It's also saying the info wasn't supposed to be out before November 24th.
The problem is SearchAllDeals doesn't host content. It simply links to it. This amounts to free advertising for Wal-mart.
And since TechCrunch also has the info, Wal-mart has a leak problem, which is neither TechCrunch's nor SearchAllDeals' problem.
If I were a competitor such as Target or K-mart, I'd be stepping up to the plate and making the most of this "controversy" by freely offering up my own deals.
h/t TechDirt
Related Reading:
Judge Throws Out Copyright Infringement Suit Against Online Video Site Veoh
Pro Intellectual Property Act Passes House
Google Talks On Its Approach To Content & Copyright
Posted by Nathania Johnson at 4:18 PM | Permalink | Comments (1)
A California federal court judge, Judge Howard Lloyd, has thrown out a copyright infringement suit against online video site Veoh. The suit was brought by adult entertainment company IO Group.
The judge's reasoning was that Veoh is protected by the Digital Millennium Copyright Act's (DMCA) safe harbor provisions. Since Veoh takes quick action in light of copyright issues, they are not acting illegally.
“Veoh has a strong DMCA policy, takes active steps to limit incidents of infringement on its website and works diligently to keep unauthorized works off its site,” wrote Judge Lloyd.
Guess who loves this ruling? Google. It's currently facing its own copyright infringement suit brought against YouTube by Viacom. Because the Veoh case was heard in California, it doesn't set precedent for the YouTube case in New York. But Google hopes the Veoh ruling is still influential.
via NYT
Posted by Nathania Johnson at 9:16 AM | Permalink | Comments (0)
Remember when Google and Viacom were friends? Ah, those were the days. But not anymore. Over a year ago, Viacom filed suit against Google for the copyright infringement found on YouTube videos. In the latest plot point in the ongoing saga, U.S. District Judge Louis Stanton has ruled that Google can keep its source code secret, but must hand over user logs for the popular video sharing site.
Viacom says it wanted the code to prove that Google could use it to "purposely" find the content in question. Nice try, Viacom. Google's code, of course, is a trade secret. But it's almost a wonder the judge protected the code, because he ruled that Viacom can have access to the user logs. Data to be released includes user names, IP addresses, and videos watched.
Google has often defended its data collection, saying it's not a threat to privacy. It appears the argument worked a little too well on Judge Stanton.
For a history of the Google-Viacom battle, check out these links:
Google Fights Back in Viacom/YouTube Copyright Suit
Others Join YouTube, Google Copyright Lawsuit
Viacom Would Rather Not Sue, Chief Counsel Claims
Google to Viacom: Don't Turn YouTube into SueTube
Posted by Nathania Johnson at 10:52 AM | Permalink | Comments (0)
The U.S. House of Representatives passed the Prioritizing Resources and Organization for Intellectual Property Act despite opposition from the Department of Justice.
The act, sponsored by Reps. John Conyers (D-Mich.) and Lamar Smith (R-Texas), would allow for forfeiture of property such as computers and other equipment used by convicted copyright infringers.
While this is mainly aimed at music and movie piracy and is backed by the entertainment industry, it will be interesting to see whether it could be applied to website content theft. If so, this could create all sorts of interesting developments for the future of the web.
Scrapers and other thieves of copyrighted material could be risking a lot more than dropped Google listings.
Posted by Frank Watson at 8:11 PM | Permalink | Comments (1)
A few hours ago, before I went to bed, I blogged about someone who stole my SEW column content for their own online marketing blog within hours of its publication. I also commented on their blog and asked them to remove the offending content.
This morning, things look different:
The reply starts with an excuse: "my blog has very little content as I am only testing it at the moment" - as if the perpetrator's low readership makes their actions justifiable. It goes on to say that the blogger "didn't mean to leave your part copied article on the site". That's what a small child might be expected to say if caught doing something wrong. It further states "I was surprised to see that Google had indexed it". Isn't that why we blog in the first place? If the content was good enough to post on a public website to promote their business, there should not be any great surprise when Google picks it up.
I find all of this in keeping with the original theft of my content and do not view it to be a satisfying explanation.
However, by the end of the response the person had apologized, removed the stolen content, and promised not to do it again.
Here is what I have done for my part to edit my original post:
Posted by Tim Ash at 12:40 PM | Permalink
Globalization of Content Theft - A Personal Story
[This entry has been edited since its original posting. Please read the follow-up post after reading this one.]
I worked for many years to build up my expertise and reputation in online marketing. I considered my recent addition as a "By The Numbers" expert columnist on Search Engine Watch an honor. My first column on Landing Page Neglect just appeared yesterday and I wanted the whole world to see it.
Unfortunately at least one reader and member of the Global Village did more than that. An unscrupulous so-called online marketing expert in the U.K. stole my column and posted it on his own blog.
I have asked him to remove it. In case he does, here is a screenshot of the original entry and my comment asking him to rectify the situation.
So what are we to do in this friction-free Internet world? If stealing is as easy as cut-and-pasting, and there is no legal or financial leverage over a thief who is in another country or legal jurisdiction, then what recourse do we have?
I think that one answer may be to use the medium itself against the offenders.
It is easy to steal. But it is also easier than ever to detect such theft, and expose it. It seems like the days of public shaming are a quaint relic of yesteryear. But I vote to bring them back. We should not tolerate liars and cheats in our midst and should use the very medium that enabled the transgression to help rectify it.
Do not do business with [Name edited out] of [Location edited out] - he is not an honorable man.
Please see my follow-up post.
Posted by Tim Ash at 2:04 AM | Permalink
Nielsen will release a service enabling broadcasters and cable networks to control and make money from their online video distribution (per today's WSJ, subscription only). Through fingerprinting technology, the video may be blocked, permitted to load, or "perhaps load only if it is attached to a particular piece of advertising."
This announcement makes me wonder who holds the keys to video-related ads. With Nielsen acting as a neutral party, I would like to believe the largest rights holders keep control of their ad sales and sources.
However, we can't predict new moves from social networks, such as YouTube. What if the network itself starts to block copyrighted clips, but you want to show your clips and ads? What if the network begins showing ads that somehow interrupt yours? What if you prefer to use the network's ad inventory after all?
Regardless of these unknowns, the Nielsen announcement is interesting news. We'll see who gets real traction in this "video cop" marketplace, and how they charge for or otherwise monetize their services.
Posted by at 2:39 AM | Permalink
Search marketers will almost certainly run into copyright issues at some point in their careers. They may be the victim, finding their own optimized content duplicated without permission and showing up in targeted search results. Or they may be an infringer, stealing copyrighted content from others and finding themselves subject to penalties by the search engines and the courts.
Thankfully, most online copyright infringement issues can be handled with some simple legal procedures. In today's SearchDay, "Copyright Law: What Search Marketers Should Know (Part 1)," Grant Crowell outlines the basics of cease & desist letters, the Digital Millennium Copyright Act (DMCA), and other tactics to help search marketers protect their content.
Posted by Kevin Newcomb at 3:33 PM | Permalink
A Viacom spokesperson called me a few minutes ago, breaking this news, and sending along an official statement. Today, MTVNetworks and its parent company Viacom are issuing an ultimatum to Google/YouTube: remove unauthorized content or else...
MTVNetworks/Viacom says that more than 100,000 unauthorized clips of its video content, representing 1.2 billion video streams, appear within Google and YouTube and must be removed immediately.
The recent talk of adding short video ads ahead of content on YouTube may have been the last straw for MTV and Viacom, who clearly did not want Google to profit from showing unauthorized clips.
After months of ongoing discussions with YouTube and Google, it has become clear that YouTube is unwilling to come to a fair market agreement that would make Viacom content available to YouTube users. Filtering tools promised repeatedly by YouTube and Google have not been put in place, and they continue to host and stream vast amounts of unauthorized video. YouTube and Google retain all of the revenue generated from this practice, without extending fair compensation to the people who have expended all of the effort and cost to create it. The recent addition of YouTube-served content to Google Video Search simply compounds this issue. Virtually every other distributor has acknowledged the fair value of entertainment content and has taken deliberate steps to concluding agreements with content providers.
We have great respect for and loyalty to our audiences. We host more than 130 authorized web sites where millions of fans visit and interact with our content. Our internet portfolio has more visitors than any other entertainment company and we are always seeking distribution relationships to ensure that any of our products and services are easily accessible on every platform.
Our hope is that YouTube and Google will support a fair and authorized distribution model that allows consumers to continue to enjoy our very popular content now and in the future.
Posted by Elisabeth Osmeloski at 10:55 AM | Permalink | Comments (0)
Seems the courts have taken the same view of trademark use in online advertising as Google has, according to a report at TechDirt.
Eric Goldman's blog post, which TechDirt refers to, states: "The court holds that, as a matter of law, the use of keyword-triggered ads and keyword metatags cannot confuse consumers if the resulting ads/search results don't display the plaintiff's trademarks."
Posted by Frank Watson at 6:32 PM | Permalink
The NY Times reports that Yahoo has recently rejected Google's subpoena for help with the Google Book Search project legal woes. Reportedly, Yahoo turned down Google's request for reasons similar to those Amazon gave when it turned down the same request. If you are interested, I have posted the full court filing on my server as a PDF download.
Posted by Barry Schwartz at 9:11 AM | Permalink
This week, news emerged about an agreement between Google and two Belgian author groups that were suing it over copyright issues. Below, a short Q&A on what this means for Google. Highlights: The case goes on with three other groups taking part, but large damages seem unlikely. The new deal especially seems to give Google photo rights. Google says it is not doing an about-face on opt-out in Denmark. More about these and other issues is covered below, based on a talk with Google spokesperson Jessica Powell. Plus, some bonus stats on how much traffic newspapers get from search engines.
Q. The case was originally filed against Google by Copiepresse. What are the other groups that joined and when did they come on?
A. In mid-October, Sofam, Scam, SAJ and Assucopie all joined the case after Google posted the Belgian court ruling in late September.
Q. Who remains as part of the case?
A. Copiepresse, SAJ and Assucopie.
Q. Has Google paid any fines in the case so far?
A. Despite rumors, Google reiterated today that it has not been asked to pay any fines.
Q. If Google loses the case, will it have to pay any damages?
A. Google says it hasn't been asked to pay any fines.
Q. What do the new agreements with the author groups Sofam and Scam allow?
A. Sofam represents Belgian photographers while Scam covers mainly audio/video content. Exact uses are being worked out. As with the AP deal, Google highlighted this as providing new uses rather than a solution to the legal challenges over spidering and thumbnail image use. "It's a way for us to use their content in new ways beyond what copyright law currently allows us without the permission of the authors," Powell said.
Q. Was there a financial aspect to the agreement?
A. Google's not commenting. Google is definitely paying the Associated Press to use some of its content, as the AP itself has reported. However, the exact terms, mechanisms or amounts have never been disclosed. Google wouldn't get into specifics on the financial details on the two Belgian deals other than to say these were deals that will allow the search engine to use the content in new ways.
Q. Is Google talking with the other parties to the suit?
A. Google said it won't comment on discussions but that it's always open to dialogue.
Q. Did Google reverse course and go opt-in for Google News Denmark?
A. Google says it chose to only launch in Sweden and Norway and that going forward it is not planning on an opt-in model in Denmark or elsewhere. The reason, says Powell, is that the company believes Google News complies with copyright law. "If publishers don't want their websites to appear in search engines, robots.txt enables them to automatically prevent their content from being indexed. And we even go beyond that: if a newspaper doesn't want to be a part of Google News, they only need to ask, and we remove them."
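To make the opt-out mechanism Powell describes concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before indexing a page. It uses Python's standard `urllib.robotparser` module. The robots.txt contents, the `news.example.com` URLs, the "OtherBot" crawler name, and the helper names `build_parser` and `may_crawl` are all hypothetical, chosen for illustration only; this is not a description of how Google's own crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary newspaper site (an assumption,
# not taken from any real publisher). It shuts Googlebot out entirely and
# keeps every other crawler out of /archive/ only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /archive/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
url_story = "https://news.example.com/2006/11/story.html"
url_archive = "https://news.example.com/archive/old.html"

print(may_crawl(parser, "Googlebot", url_story))    # False: Googlebot is blocked site-wide
print(may_crawl(parser, "OtherBot", url_story))     # True: falls under the * rules
print(may_crawl(parser, "OtherBot", url_archive))   # False: /archive/ is off limits to all
```

The point of the sketch is that the publisher controls the file and the crawler is expected to check it before fetching, which is why Google describes the system as opt-out rather than opt-in.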
Between The Lines Time
The use of news images is one of the touchiest areas for Google to deal with, as I covered more in my Search Engines, Permissions & Moving Forward In Copyright Battles article.
The Sofam deal might help solve some of Google's legal issues in Belgium. The group represents the rights of nearly 4,000 photographers in Belgium, Google said. Google did NOT say how this might translate into usage at Google News. However, potentially this means Google can have photos in Google News even from publications that it had to remove from Google Belgium by court order. The Sofam deal might provide legal cover there. Of course, if those publications are the only source of certain photos -- and they block use through systems like robots.txt -- that would still keep the content out of Google. I'm also following up more on this particular issue.
The deals do not restore access for Google to list textual news stories it finds. That means it has to remain hopeful that the legal case will go its way, if it wants to prevent some type of negotiations with the publishers that have opted-out.
If the case goes against Google, it doesn't appear to be facing major damages. If these were to be levied, that should have happened when it lost the first time. Instead, the publishers will remain out of Google, making Google News Belgium less useful than it would be. However, they also deny themselves traffic from Google. Possibly Google might negotiate a payment-based system to include them. Equally possible, it might also decide to hold its ground and focus attention on other countries, to see if it can wait the publishers out.
If the case goes for Google, then it regains content that will help enhance Google News Belgium, unless those publishers decide to specifically block spidering, which Google would almost certainly honor.
Overall, the action in Belgium -- as with Denmark -- underscores that in smaller markets, Google (and other search engines) may come under increasing pressure to negotiate deals to list material. The players are fewer and have more power concentrated among them. Whether these will be lucrative deals remains to be seen. In smaller markets, Google might decide it's simply not worth figuring out some type of financial arrangement -- especially for Google News which carries no ads, so generates no direct revenue. That might bring about more non-financial arrangements where the publishers cooperate for the benefit of getting traffic and also being dealt with personally by Google, rather than impersonally through automated permissions systems like robots.txt.
Traffic To Search Engines
As an aside, I got a request from another reporter trying to understand how much traffic newspapers get from search engines. My response:
There's no specific answer to this. It will vary from paper to paper. Places like the New York Times will likely get a lot, because they specifically work to generate search traffic. Papers such as those suing Google in Belgium are getting probably nil, since they were removed by court order from Google.
In general, surveys have found sites getting anywhere from 8 to 13 percent of traffic from search engines. That might not sound like much, but often the first visit leads to repeat visits.
I also included two people on my response who I thought might have some better stats. Marshall Simmonds, chief search strategist for the New York Times Company, came back with this:
The one stat I can report is the NYT gets approximately 22% of its traffic from search engines. This number is very actively growing.
Bill Tancer, over at Hitwise, reported this:
Hitwise tracks 800,000 sites divided into 170 industry categories. One of those categories is our News & Media – Print category which covers Newspaper and Magazine websites (3,180 sites total). For the week ending 11/18/06 (based on our U.S. sample), Google was the #1 site sending traffic to the category at 13.66%, Search Engines as a whole were responsible for 22.44% of traffic for that same week.
That's a lot of traffic, however you slice it. There's no doubt things like Google News help build Google up as a company. But at the same time, Google News drives a ton of traffic to newspapers that are seeing the web as a new revenue source that might save them as print subscriptions dry up.
Posted by Danny Sullivan at 12:35 PM | Permalink
Via Techmeme, news that Google has settled with two Belgian publishing groups involved in a lawsuit against it over content included in Google News Belgium. This comes a day after Google's legal case was reheard in an appeal. The settlement, following what seems a similar settlement with AP earlier this year, seems to signal that Google is going to continue making such appeasements rather than fight cases in court.
Bloomberg reports that Google struck an agreement with Sofam -- which represents Belgian photographers -- and Scam, which represents Belgian journalists. The agreement allows for Google to use content from these groups (or from their members). Whether they are being paid for this, what content or how it will be used is not explained:
"We reached an agreement with Sofam and Scam that will help us make extensive use of their content," Jessica Powell, a spokeswoman for Google, said in a phone interview yesterday. She declined to give details of the agreement or say whether it involved paying the groups for the content, and declined to say whether Google, based in Mountain View, Calif., was considering similar accords with the newspapers.
In September, Google lost a copyright case filed against it by another Belgian publishing group, Copiepresse. Google later had to post the ruling against it on Google Belgium. However, Google was granted an appeal for the case to be reheard, as it hadn't been represented in court the first time. The stories below provide more background on all of this:
At some point, Sofam and Scam joined in the case. I see one reference to this back in October. Two other groups also apparently joined, since the Bloomberg report speaks to the settlement being with two of five total parties to the suit.
Those parties, led by Copiepresse, continue on in their action against Google. That action, as I've covered in my Google's Belgium Fight: Show Me The Money, Not The Opt-Out, Say Publishers article, is far more about trying to pressure Google into a financial arrangement to use Belgian news content than keeping that content out of Google itself. If it was just to keep content out of Google, the publishers could have easily done this through methods such as using robots.txt files.
Copiepresse seems confident of a legal victory:
Speaking on the phone from Brussels after the hearing, Margaret Boribon, the Copiepresse secretary-general, said she felt very happy with how things proceeded today. "I can't see how the judge could change his opinion,'' she said, certain that the court will uphold the September ruling.
Perhaps that legal victory will come when the ruling is issued, which is expected in late December or January. If so, it may not help Copiepresse in the real aim of a financial deal. Google may have enough content to make Google Belgium viable without the participation of the papers Copiepresse represents. They'd then be left in a situation of asking Google for reinclusion or going without the substantial traffic Google News can send web sites.
On the other hand, Google's settlement with the groups following on an agreement earlier this year with the Associated Press seems likely to fuel further publishing groups pushing for such arrangements, especially in smaller markets where key content is put out by a small set of publishers. Banding together and sticking with exclusion, they can severely hamper a news search service.
Norway Upset With Google News Over Copyright Laws covers how Google is being challenged in Norway. That hasn't developed into a legal case yet, but it's hard to see how Google's going to be able to say no to some type of agreement there. Pandia also covers how in Denmark, publisher opposition apparently created the unprecedented case of Google asking for permission to index news sites, rather than the normal case of spidering and requesting an opt-out.
Search Engines, Permissions & Moving Forward In Copyright Battles from me covers how, in particular, Google's use of images for its news area complicates issues and is making it harder for search engines in general to defend opt-out spidering, which I support. That article calls on Google to stop the inclusion of news images, as well as a pullback on showing cached pages and scanning of in-copyright works without permission.
However, asking for permission to spider textual content for news search is likely to be as slippery a slope as cutting deals with publishers. It weakens the core legal position Google has argued over gathering textual content from the web, most recently against suggested copyright changes in Australia that it said might make search engines unworkable.
As a reminder, Microsoft was also challenged in Belgium. Microsoft Removes Belgian Content Without Court Order covers this more and how Microsoft's reaction was to drop those publications. So far, it hasn't apparently cut a deal for reincluding them and perhaps may not feel a market need to do so.
Judge Gives AFP Case Against Google More Time covers how a copyright case against Google by Agence France-Presse over news inclusion is still ongoing.
I plan to follow up with Google Monday and see what further details I can gather on the case. I don't expect terms to be disclosed, but it would be good to know if a financial arrangement of some type was reached. That happened in the AP case, though Google was adamant the agreement there was not to allow it to solve a legal problem with spidering.
Many saw this as spin. There are other things the agreement would give Google aside from the right to spider, as my Google-AP Deal Not Pay-Per-Click & Some Further Details covers in more detail. However, it also conveniently solved the spidering issues for Google.
Postscript: See Q&A On Google's Belgium News Agreements for more on this story since it was written.
Posted by Danny Sullivan at 5:04 PM | Permalink
Reuters reports Google France was sued by Flach Film, a French film producer, for copyright infringement. They claim their video, "The World According to Bush," was published on Google Video France, and viewed more than 50,000 times, before Google removed the video. The French film producer estimates its damages at $648,700, but Google said "our terms and conditions specify that users (Internet surfers) don't have permission to use videos which they don't own the rights to."
Google has set aside $200M to cover copyright-related legal issues arising from the YouTube acquisition.
Posted by Barry Schwartz at 9:05 AM | Permalink
Google To Go To Belgium Court Finally
The AP reports that Google is finally going to show up in court to present its side of the case in the Belgium copyright suit. Google did not show up to fight the publishers and papers in Belgium the first time the case was heard.
Posted by Barry Schwartz at 8:58 AM | Permalink
Melanie Colburn writes that Music Labels Lose Copyright Suit Against Baidu, which started back when Five Music Companies Sue Baidu in September of 2005. Baidu was previously ordered to stop these music downloads but it appears the ruling was overturned because all Baidu is providing are links to 3rd party sites that facilitate the music downloads, whereas Baidu does not participate in the downloads themselves. More details at the BBC News.
Posted by Barry Schwartz at 9:37 AM | Permalink
Pandia reports that Google News is in trouble again over copyright laws overseas. Google News Norway was launched and publishers are upset that Google is placing copyrighted images in the Google News home page. Mediebedriftenes Landsforening, an association of Norwegian media companies, claims Google "cannot make use of photographs without a proper agreement." This form of syndication is in "violation with Norwegian copyright law," says Dagens Næringsliv.
Google is also in trouble over copyright issues in Belgium (also see here) and in Australia.
Posted by Barry Schwartz at 9:04 AM | Permalink
First Google was rumored to be keeping $500 million back from the YouTube sale to settle possible legal problems. Then Google CEO Eric Schmidt said they weren't. Today, turns out they are. Google holds back stock in YouTube deal from the Associated Press covers the details about keeping 12.5 percent of the stock swap for one year "to secure certain indemnification obligations." What Eric Schmidt Meant When He Said Google Wasn't Holding $500 Million From YouTube For Lawsuits: We're Holding $200 Million from TechDirt does a summary, plus gives you a funny headline about the entire thing.
Posted by Danny Sullivan at 10:12 AM | Permalink
A Struggle Over Dominance and Definition is a good New York Times article out today that looks at Google and whether it is a media company that conflicts with other media owners, especially in terms of using content from others without permission. It also sparked me to finally finish a long piece I've been meaning to do on Google, search engines and copyright issues. Search Engines, Permissions & Moving Forward In Copyright Battles is now up over at my personal blog Daggle, covering the important difference between indexing and reprinting, how robots.txt already provides a permissions system, why Google should stop scanning in-copyright books and also be a leader in dropping cached pages.
Posted by Danny Sullivan at 11:58 PM | Permalink
AFP reports that Google has warned Australia that passing a certain new copyright law would set the country back to "the pre-Internet era." Google's senior counsel, Andrew McLaughlin, told the Senate Legal and Constitutional Affairs Committee, "If such advanced permission was required [to index pages], the internet would promptly grind to a halt." I believe the issue here is that Australia wants Google to get copyright owners to opt in to having their content indexed, archived and cached, as opposed to opting out via a robots.txt file. Australia is not alone here; Belgian newspapers are fighting Google over similar copyright issues. This all just amazes me, seriously.
Postscript From Danny: See also my Google's Belgium Fight: Show Me The Money, Not The Opt-Out, Say Publishers piece that goes into great depth about how this is effectively already the law in Belgium, due to a court ruling there. The appeal on that case will happen later this month, but the threat alone also already caused Microsoft to back out of some indexing.
Posted by Barry Schwartz at 10:29 AM | Permalink
The Financial Times reports that Eric Schmidt's Google is running from media company to media company trying to offer upfront cash, in sums of "tens of millions of dollars," to slow and "halt" the threat they pose to YouTube. FT.com says that Schmidt met with CBS, Viacom, Time Warner, NBC Universal, News Corp and others recently. There are some more details over at paidContent.
Posted by Barry Schwartz at 9:27 AM | Permalink
Elinor Mills reports that Google has denied a report last week that it was fined $43 million for not removing all Belgian publishers' content from the engine's index and cache. Google spokesman Ricardo Reyes told Mills at News.com in an email, "Google has complied with the Copiepresse judgment and we are not aware of any fine. We believe this story to be completely untrue."
Posted by Barry Schwartz at 8:17 AM | Permalink
Gary Price points to a Poynter.org report showing that Google has been fined €34 million (about $43,231,000 USD) for not removing all of the Belgian publishers' content as required by a court ruling. Google claims it could not find all the publishers and asked the publishers for help in identifying the content that has to be removed.
Postscript: Google Says Belgium Did Not Receive $43.2M Fine.
Posted by Barry Schwartz at 9:42 AM | Permalink
More Details On YouTube & Google Acquisition
Blog Maverick has some intimate details on the Google YouTube Deal from a "trusted anonymous author" in a message board. Here are some of the excerpts:
The first request was a simple one and that was an agreement to look the other way for the next 6 months or so while copyright infringement continues to flourish. The second request was to pile some lawsuits on competitors to slow them down and lock in Youtube's position. Infringement lawsuits will be served on Youtube and the new proud parent Google in the coming months. Google will respond with two paths: an expensive legal fight or a quick and easy settlement with most choosing the latter.
Posted by Barry Schwartz at 9:26 AM | Permalink
The Register writes Microsoft dodges court in Belgian copyright battle, where they say Microsoft decided not to go to court over Belgian newspapers' request that it remove their content from its index. Google was ordered to remove the content by a Belgian court and then later lost an appeal on the same case. Microsoft simply did not want to fight them and decided to just grant the wishes of the cease and desist letter sent to it.
Posted by Barry Schwartz at 9:08 AM | Permalink
Amazon Turns Down Google's Request For Information On Book Search
Business Week reports that Amazon has turned down Google's request for information to help in its book scanning lawsuit. Amazon responded to Google's subpoena saying that it would make Amazon's trade secrets public and that it was "overly broad and unduly burdensome" on Amazon. In short, it is Amazon's way of telling Google to stop looking over its shoulder and work it out itself.
Posted by Barry Schwartz at 8:29 AM | Permalink
The NY Times has an extensive article today on Google and those who would challenge it in the courts. It offers a broad overview of the legal issues surrounding Google, including those coming with the YouTube acquisition, and the company's attitude toward litigation, which is typically to fight rather than settle.
In addition, Charles Cooper at CNET writes what can only be described as an angry column about Google and "Web 2.0," content and copyright infringement. The article is entitled "Web 2.0 as a metaphor for 'rip off'."
Posted by Greg Sterling at 12:22 PM | Permalink
Reuters reports that YouTube erased 29,549 films and media files after receiving a complaint from "Japanese media companies over copyright infringement." Around the same time, the NY Times informs us that Music Companies Grab a Share of the YouTube Sale. The article says that the $50 million earned from this deal "should help to shield Google from copyright-infringement lawsuits." Universal Music last week sued two smaller video sharing sites, but not YouTube, for distributing pirated music and videos. Techdirt feels that the last-minute deal with the music companies, struck before Google bought YouTube, was basically YouTube handing over to "the labels Google's cash before any official deal was completed."
Posted by Barry Schwartz at 8:18 AM | Permalink
MarketWatch reports that a judge has consolidated two different cases against Google to make the process quicker and more "streamlined." Book publishers and book authors have joined together to battle Google on the legal front over copyright infringement allegations concerning Google's Book Search Project.
Postscript: Steve Bryant at eWeek reports that the Authors Guild v. Google case is postponed six months to January 2008. Steve said, "Doesn't that mean that Google, in the meantime, will continue to operate Google Books as normal, which is exactly what the Authors Guild wants to prevent?"
Posted by Barry Schwartz at 2:15 PM | Permalink
Earlier, we touched on the fact that Copiepresse was threatening to go after MSN for carrying Belgian newspapers in the way it went after Google. Via PaidContent.org, Update: MSN is latest target of Belgian copyright complaint from InfoWorld covers how Copiepresse is now negotiating with MSN Belgium after sending a cease-and-desist letter to MSN. Copiepresse hopes to gain a share of advertising revenue.
Meanwhile, MSN Belgium has removed some newspapers. Removed from where isn't clear. MSN Belgium does have a dedicated news area, so it might be from there. However, sites may also have been removed from web search results similar to what Google did. I tried a search for site:lesoir.be, and the main news site seems to have been removed.
InfoWorld also notes:
The group, which represents some of Belgium's best known newspapers, including Le Soir and Le Libre, has been gathering more support for its cause. It was joined this week by separate groups that represent Belgian photographers, journalists, scientific authors and multimedia publishers, who plan to back its efforts.
It will be interesting to see how many more groups they rally in support against the search engines, and how the search engines react. I think there's a big difference between search engines deciding they might pay to include relatively small amounts of content in specialized news search engines versus a frankly insane idea that they're going to negotiate deals for inclusion in regular web search results.
Ultimately, the good people of Belgium might find themselves without the ability to search the web, should Copiepresse succeed in its quest to have the practice of getting permission via robots.txt declared illegal.
I have much more to say on this subject -- I'm working on a piece I hope to post later this week. For some related material from me, see:
Posted by Danny Sullivan at 8:56 AM | Permalink
Google faces copyright fight over YouTube from The Guardian covers how Time Warner chair and CEO Dick Parsons said his company plans to go after YouTube for copyright violations. It's still talk rather than legal action:
Mr Parsons told the Guardian: "You can assume we're in negotiations with YouTube and that those negotiations will be kicked up to the Google level in the hope that we can get to some acceptable position."
I'm sure it will get kicked up. And it shouldn't be hard to get the right people connected given that the AOL part of Time Warner already has an existing distribution deal with Google. Of course, if that fails, it should be interesting to see if Time Warner sues a company that has a five percent ownership stake in AOL. Related coverage and commentary can be found via Techmeme, here.
Posted by Danny Sullivan at 8:35 AM | Permalink
Sean Daly, from Groklaw, interviewed Margaret Boribon of Copiepresse on September 28th about their copyright lawsuit against Google, which targets the use of Belgian news in Google News, and cached copies of those articles. He has posted their discussion, in English and French, as well as some commentary and analysis of the litigation, including some late breaking news involving demands made by Copiepresse for MSN, and a potential new plaintiff.
I've written a brief synopsis of some of the points she raises in the interview at SEO by the Sea. Danny also talked with Margaret Boribon earlier in September.
Posted by Bill Slawski at 12:32 PM | Permalink
Ballmer: YouTube Overvalued & Google Transferring Wealth From Content Owners
The Web According to Ballmer from BusinessWeek has Microsoft CEO Steve Ballmer questioning the value of the Google-YouTube deal and oddly warning that Google is transferring wealth away from rights holders. It's an odd statement, since that's what Microsoft wants to do as well.
First the questioning of the YouTube value:
[You've got to ask] could Google do whatever it is they're hoping to buy without paying $1.6 billion? Is YouTube really some permanent, long-term thing, or is it a fashion?....Right now, there's no business model for YouTube that would justify $1.6 billion.
Though strangely, when BusinessWeek tries to pin down what seems a clear statement that Google overpaid, Ballmer says:
I'm not saying it is overvalued. I'm not trying to say that. It depends on a set of factors. I'm not saying I wouldn't write a check for that amount of money. I might.
And back to the controversial statement about Google's relations with content:
And what about the rights holders? At the end of the day, a lot of the content that's up there is owned by somebody else.
The truth is what Google is doing now is transferring the wealth out of the hands of rights holders into Google. So media companies around the world are all threatened by Google. Why? Because basically Google is telling you how much of your ad revenue you get to keep. They better get some competition. Us. Yahoo! (YHOO). Somebody better break through or you can short all media stocks right now. As long as there are two, you can hold onto media stocks. Google understands that. And that's one reason why they're willing to lose money up front.
Microsoft has its own video sharing service up, Soapbox. It has a question answering service, Q&A. It has an entire search engine that crawls the web like Google, Windows Live. Microsoft has plans for contextual placement of ads on pages, similar to AdSense. It's specific to MSN content now, but that will inevitably change. All of these things leverage the content of others in order to make money for Microsoft. So if these actions leverage wealth away from content owners, Microsoft is just as guilty of it as Google.
Frankly, all Ballmer seems to be saying is that content owners would be better off if Microsoft were a strong third participant in the ad game. Sure -- but let's not kid ourselves. Microsoft would be a lot better off as well, and it didn't jump into the game out of some desire to counter-balance the power of Google. It's in it to make as much money as it can.
Posted by Danny Sullivan at 7:42 AM | Permalink
Just in from Bloomberg, Google to Subpoena Yahoo, Microsoft on Book Scanning covers how Google hopes that gaining information from rival book scanning programs will help it defend itself in copyright lawsuits over its own scanning program. From the story:
Google, which doesn't disclose how many books it has scanned, also wants to know the title, authors and copyright status of books already offered through competitors' book projects, according to the documents.
The right to subpoena has been granted, but information is to be kept confidential and used only in the litigation.
Posted by Danny Sullivan at 8:09 PM | Permalink
News.com has another great article named Copyright tussles for Google. It reviews some of Google's copyright cases and how Google is trying hard to win some of those cases for their current and future projects. From the Google Cache, to Google Images, to web search, book search and other indexing projects - Google needs to keep redefining the law to continue to build out their search engine. But you have to agree with the highlighted quote, "One of the challenges is, 'This is Google. What would the world be without Google?' We don't want the world without Google. We want the world without Google infringing our copyrights."
Posted by Barry Schwartz at 9:02 AM | Permalink
Last week, Google complied with a Belgian court order and posted the ruling against it in a copyright suit on the home page of Google Belgium and Google News Belgium, along with many other places including many search results pages. Now via Google Blogoscoped, news that the plaintiff in the case Copiepresse thinks the ruling should have gone at the top of the Google News Belgium page, rather than the bottom.
An article about the issue in Dutch is here. I don't speak Dutch, sadly, consigning me to AltaVista Babelfish, which translated a key part as:
That happened also, but on the start page of Google news, the topicality part of the site, stands the sentence entirely below. And that does not like Copiepresse.
Anyone hitting Google Belgium couldn't have failed to notice the beginning of the very long ruling, as the illustration above shows. But over at Google News Belgium, that ruling wouldn't have been seen unless you scrolled to the bottom of the page, past all the stories. That's what Copiepresse seems to be upset about.
The order did require that:
The defendant to publish, in a visible and clear manner and without any commentary from her part
Copiepresse might well be able to argue that on Google News Belgium, the ruling there wasn't clear and visible by being at the bottom of the page.
Of course, putting the long ruling at the top of the page would have been unworkable. The ruling itself didn't allow Google to put anything on the page directing people to see the notice at the bottom since that might have been deemed "commentary" about the ruling.
What next? If Copiepresse presses for more and wins, perhaps Google might have to run the ruling in a column alongside news content.
Frankly, Copiepresse comes across as petty in complaining here. Google already had a good argument that publishing the ruling was unnecessary given the wide press coverage the ruling had gained, though the court was not convinced and required the ruling to go up anyway. After that happened, coverage of Google's loss was only magnified. The point was made very publicly.
Posted by Danny Sullivan at 9:22 AM | Permalink
Our approach to content at the Official Google Blog has Google explaining to the world how it works with content owners and its desire to respect their rights.
In terms of copyright, Google stresses that it generally sticks to what's known as fair use, though the post doesn't use those words. The idea is that it shows very short summaries of stories and pages, and thumbnails of images, but doesn't reprint this material, requiring people to click through to the actual material from places like Google News.
Of course, in the case of cached pages, many including myself would argue that Google goes beyond fair use. Cached pages are an example where content can be viewed without clicking through to the original site, and the opt-out approach for that doesn't feel appropriate at all.
Google also notes there are cases when it wants to go beyond fair use, to make broader use of content where permission would be required. The deal with the Associated Press is cited as one of several examples here.
To me, this is also a way for Google to help defuse the idea that some publications have, such as the Belgian newspapers recently, that Google can be bought off to avoid lawsuits. To me, this is Google stressing that it will do content deals in some cases, but that these content deals aren't necessarily being done to avoid lawsuits, especially when it feels it is acting within fair use guidelines. That's my speculation and take on this, of course. Google didn't comment when I asked if this was the reason for raising the AP deals.
Moving past Google saying it respects copyright, it then stresses that it allows people to opt-out, even if it feels it has fair use rights. In general, I agree with this method, which Google along with the other major search engines generally follow. Trying to get permission from each web site to index it would be an impossible task, and one that's not necessarily even legally required. Opt-out through things like robots.txt is an effective way to protect rights holders plus benefit the public as a whole. I do hope they'll change cached pages to opt-in, however.
Google talked with me about the post shortly before it went live yesterday, to see if I had any questions. The main thing in my mind was if this was in response to the Belgian lawsuit. No, I was told. The post has been in the works for some time, apparently. Google's hoping it will help people better understand their approach to content.
Posted by Danny Sullivan at 7:56 AM | Permalink
Just a quick note that Google's posted on its official blog about the Google Belgian news issue that I've been covering, while William Slawski has a nice translation in the works on the ruling itself.
About the Google News case in Belgium from the Official Google Blog doesn't really provide much new information that you haven't already gotten in reports from me and others. What should it provide? How about answers to:
The post does stress that there are ways for publishers to easily stay out of Google. Those ways don't appear to have been presented to the court itself. Writes William Slawski in Belgian Copyright Ruling Against Google News:
I'm surprised by the lack of mentions of the use of a noarchive meta tag or noindex meta tags or by the use of robots.txt to disallow Google from indexing or archiving the pages of the newspapers in question.
While the Court does note that the onus of keeping copyright from being infringed falls upon the owner of the technology used to take text from the newspapers in question, this seems like an omission worth noting.
Regardless of how the Court may have felt about those options, I think that they should have been addressed in some manner. The failure to do so makes it appear that they either weren't provided information about those by their expert, or didn't understand them, or may not have addressed those issues on purpose.
A simple noarchive tag would have kept information on those pages from being cached by Google. A noindex tag or disallow directive should have kept their pages from being indexed at all by Google. Were they using these and Google ignored them? I suspect that they weren't.
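To make the tags Slawski mentions concrete, here is a minimal Python sketch of how a crawler might read a page's meta robots tag and decide whether it may index or cache the page. The sample page and the helper names (`RobotsMetaParser`, `robots_directives`) are hypothetical and for illustration only; this is not how Google's own crawler is implemented.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def robots_directives(html):
    """Return the set of meta robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical newspaper page that opts out of cached copies only.
sample_page = """<html><head>
<meta name="robots" content="noarchive">
</head><body>Article text</body></html>"""

directives = robots_directives(sample_page)
may_index = "noindex" not in directives    # page may appear in results
may_cache = "noarchive" not in directives  # page may be shown as a cached copy
```

For this sample page, `may_index` is true and `may_cache` is false: the article stays searchable but no cached copy would be offered.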
After some more analysis, including an important argument over whether Google is a portal competing with newspapers or a search engine (answer, in my view, probably both depending on whether you keyword search Google News or read by browsing), he provides a long and what seems fairly complete English translation of the French-language ruling.
For more background on the case, see my prior posts:
Posted by Danny Sullivan at 9:03 AM | Permalink
Google has now posted the text of a Belgian ruling finding it violated copyright on the Google Belgium home page. The ruling has also been posted to the home pages of Google Images Belgium and Google News Belgium, but not Google Groups Belgium.
Last week, a court ruled Google had violated the copyright of several Belgian newspapers by listing them within Google News. The court ordered the removal of those papers from Google, which the company quickly complied with.
The court also ordered Google to post the ruling on its Belgian web site within 10 days or face a heavy fine. Google appealed that punishment, but it was upheld last Friday.
Despite losing its appeal, Google looked ready to defy the order to post the ruling and take the fines, until a second appeal could be heard in November. Now, the company has reversed course. The ruling went up on Saturday. The company gave no reason for the reversal to Reuters:
A spokesperson for Google declined to elaborate on the reasons that made the company change its mind but said it would seek to cancel the ruling.
"We are pleased that a judge has given Google the opportunity to appeal the substance of this case. This will be heard in November," the spokesperson said.
From Dow Jones newswire:
Google spokeswoman Rachel Whetstone told Dow Jones Newswires the company had agreed to publish the ruling on its Web site after studying the court judgment.
Technically, Google never failed to comply with the court ruling. It has 10 days from receipt of the ruling to act, and it has done so within that time, saving it exposure to fines. As noted, a second appeal on the ruling will happen in November.
Past coverage is below:
Also, I note that Microsoft's Windows Live is now operating illegally under Belgian law. For example, site:www.lesoir.be shows how Le Soir -- one of the publications involved in the lawsuit against Google -- has pages listed in Windows Live, as well as cached pages. In fact, here's an example of an article from Le Soir about the Belgian ruling against Google that I can read at Windows Live through its cached copy. To date, no news that Microsoft is about to be sued.
Finally, over at Threadwatch, an interesting comment points out that Google might have been OK in Belgium if it didn't show cached copies of pages:
The truly critical essence of this Belgian court ruling concerns Google's caching functionality. Here, protected content is being displayed a) in modified form; b) more often than not in its entirety (i.e. not restricted to mere snippets); and c) without copyright holders' permission. In most countries this would be viewed as a flagrant violation of copyright law - and obviously this is the stance the Belgian court has adopted. (And yes, there's been a contrary ruling by a US court, but that specific case seems to be rather more complicated on closer view; also, there's some indication that it was decided on arguably faulty assumptions, but that's another story.)
It is interesting to note that the Belgian ruling specifically acknowledges Google's right to store third party content (no mean concession, that, and far from self-evident) for search purposes only. But displaying it in the cache for everyone to see constitutes an act of re-publication which, like it or not, demands copyright holders' express permission.
This is a very important point. Search engines make copies of pages in order to make content searchable, as my Indexing Versus Caching & How Google Print Doesn't Reprint article explains in more detail. It's very difficult to argue this type of copying harms a site owner, especially when opting out is so easy.
Showing these actual copies through cached pages has long been disturbing for many people. While it's easy to opt-out of such display, it feels a step beyond what a content owner should have to do. With cached pages, content is literally being reprinted rather than made searchable. It seems absurd for the content owners to opt-out in that instance.
Within the US, cached copies have so far been upheld, something I disagree with. But if Google were to eliminate them -- along with picture thumbnails -- it sounds like it might have a better chance of winning in Belgium.
Posted by Danny Sullivan at 5:51 AM | Permalink
Google loses appeal on posting court ruling from Reuters covers Google losing an appeal that it should not be required to post the ruling of a Belgian court over a copyright infringement lawsuit on its Belgian web search and news sites. It now will be fined 500,000 euros per day for each day it fails to comply. Google has a further appeal on the entire case, including posting the ruling, that will be heard in November. My past article Google's Belgium Fight: Show Me The Money, Not The Opt-Out, Say Publishers has more about that and the entire case.
Posted by Danny Sullivan at 12:10 PM | Permalink
Publisher Groups To Test New Search Engine Rights Management System (Updated)
Several mostly print publisher groups say they are to test a new "Automated Content Access Protocol" that they feel will head off conflicts with search engines. A release with more information is below.
Exactly how the system will work, why it is different or better than existing systems like robots.txt or meta robots tags, isn't explained. More details are promised to be unveiled at the Frankfurt Book Fair on October 6.
I'm planning to talk with the World Association Of Newspapers to learn more about their plans next week, so I may have more before the formal unveiling. I've had a very informal talk already, and the view seems to be to find a way to make the existing systems work better. That's appreciated, and it's something the search marketing community has long wanted. But it's something I hope will involve more than just a group of publishers with mostly print interests.
My Google's Belgium Fight: Show Me The Money, Not The Opt-Out, Say Publishers article from earlier this week explains how, in my view, the entire issue that has erupted in Belgium is less about keeping content out of search engines and more about trying to force them to pay publishers for inclusion. Right now, any publisher that feels copyright is somehow infringed by being in a search engine has a very easy, very selective way to keep whatever they want out: robots.txt files or meta robots tags. These work on a web-wide basis, have the support of all the major search engines, and have been used by publishers of all types. They could definitely be improved -- but in the Belgium case in particular, using them would have solved the exact problem that was raised.
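For anyone who hasn't seen how selective robots.txt can be, here is a small sketch using Python's standard `urllib.robotparser` module. The robots.txt contents, the `www.example.com` host and the `OtherBot` crawler name are made up for illustration; they are not taken from any of the newspapers involved.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out entirely, and keep
# every other crawler out of the archives only.
robots_txt = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /archives/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

story = "http://www.example.com/news/story.html"
archive = "http://www.example.com/archives/2006.html"

googlebot_story = parser.can_fetch("Googlebot", story)   # blocked site-wide
otherbot_story = parser.can_fetch("OtherBot", story)     # allowed
otherbot_archive = parser.can_fetch("OtherBot", archive) # blocked path
```

A well-behaved crawler runs a check like this before fetching each URL, which is why a few lines in one text file are enough for a publisher to opt out per crawler and per section of a site.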
Here's the release:
GLOBAL PUBLISHERS HEAD OFF LEGAL CLASH WITH SEARCH ENGINES: NEW RIGHTS MANAGEMENT PILOT IMMINENT
In the week that the publishers of Le Soir and La Libre Belgique won their case in the Belgian Courts against Google for illegally publishing content on its news service without prior consent, the World Association of Newspapers (W.A.N.), the European Publishers Council (E.P.C.) the International Publishers Association (I.P.A.) and the European Newspapers Association (E.N.P.A), are preparing to launch a global industry pilot project that aims to avoid any future clash between search engines and newspaper, periodical, magazine and book publishers.
The new project, ACAP (Automated Content Access Protocol), is an automated enabling system by which the providers of content published on the World Wide Web can systematically grant permissions information (relating to access and use of their content) in a form that can be readily recognised and interpreted by a search engine “crawler”, so that the search engine operator (and ultimately, any other user) is enabled systematically to comply with such a policy or licence. Effectively, ACAP will be a technical solutions framework that will allow publishers worldwide to express use policies in a language that the search engine's robot “spiders” can be taught to understand.
Gavin O'Reilly, Chairman of the W.A.N., said: “This system is intended to remove completely any rights conflicts between publishers and search engines. Via ACAP, we look forward to fostering mutually beneficial relationships between publishers of original content and the search engine operators, in which the interests of both parties can be properly balanced. Importantly, ACAP is an enabling solution that will ensure that published content will be accessible to all and will encourage publication of increasing amounts of high-value content online. This industry-wide initiative positively answers the growing frustration of publishers, who continue to invest heavily in generating content for online dissemination and use.”
Francisco Pinto Balsemão, Chairman of the E.P.C., said: “ACAP will unambiguously express our preferred rights and terms and conditions. In doing so, it will facilitate greater access to our published content, making it more, not less available, to anyone wishing to use it, whilst avoiding copyright infringement and protecting search engines from future litigation.”
ACAP will be presented in more detail at the forthcoming Frankfurt Book Fair on 6th October and will be launched officially by the end of the year. W.A.N., the E.P.C. and I.P.A. will run the pilot for a period of up to 12 months and it will be managed by Rightscom Ltd.
===
The European Publishers Council is a high level group of Chairmen and CEOs of European media corporations actively involved in multimedia markets spanning newspaper, magazine and online database publishers. Many EPC members also have significant interests in commercial television and radio.
The World Association of Newspapers groups 72 national newspaper associations, individual newspaper executives in 100 nations, 13 news agencies, and nine regional press organizations, representing more than 18,000 publications in all international discussions on media issues, to defend both press freedom and the professional and business interests of the press. The International Publishers Association is a Non Governmental Organisation with consultative relations with the United Nations. Its constituency is of book and journal publishers world-wide, assembled into 78 publishers associations at national, regional and specialised level. The European Newspaper Publishers' Association is a non-profit association currently representing 5 100 national, regional and local newspapers. These daily, weekly and Sunday titles are published in 24 European countries where ENPA's members are operating in their national markets.
Postscript: I've just received this briefing paper that explains more. I've skimmed it and attached one note marked in bold. Basically, the existing robots.txt or meta robots systems can do a lot of what's already described here. What they cannot do is help search engines access content because the publisher allows this only through a licensing agreement, something the Belgian publishers seem to want. In addition, the pilot can do all it wants. Unless some major search engines agree to cooperate, the pilot will go nowhere. Again, I'll follow up more on this next week after talking with the groups involved.
ACAP Automated Content Access Protocol
A briefing paper for publishers on a project in planning
1 Executive summary
All sectors of publishing face a “search engine dilemma”. The value of search engines to users – and to those who publish on the network – is incontrovertible. However, search engine activities can be very damaging to specific online publishing models. The undifferentiated model of permissions management (essentially either allowing or forbidding search of content) is inadequate to support the diverse present and future internet strategies and business models of online publishers.
At the beginning of 2006, the major publishing trade associations established a Working Party, chaired by Gavin O'Reilly, Chairman of the World Association of Newspapers, to consider the issues that this has raised. As a result, the World Association of Newspapers and the European Publishers Council are planning a project which will develop and pilot a technical framework which will allow publishers to express access and use policies in a language which the search engine's robot “spiders” can be taught to understand. This will make it possible to establish mutually beneficial business relationships between publishers and search engine operators, in which the interests of both parties can be properly balanced.
The project is provisionally called ACAP (for Automated Content Access Protocol). ACAP will develop and pilot a system by which the owners of content published on the World Wide Web can provide permissions information (relating to access and use of their content) in a form in which it can be recognised and where necessary interpreted by a search engine “crawler”, so that the search engine operator (and perhaps, ultimately, any other user) is enabled systematically to comply with such a policy or licence.
This paper is intended to brief publishers on the outline of this project and to encourage their active support and participation when the project is launched in September 2006.
2 Background – the “search engine” problem
At the beginning of 2006, the major Europe-based publishing trade associations – including the World Association of Newspapers (WAN); the European Publishers Council (EPC); the European Newspaper Publishers Association (ENPA); the International Publishers Association (IPA); the European Federation of Magazine Publishers (FAEP); the Federation of European Publishers (FEP); the World Editors Forum (WEF); the International Federation of the Periodical Press (FIPP) and Agence France Presse – established a Working Party to consider the issues that are posed by search engines for publishers, and to look at ways in which mutually beneficial relationships can be established between publishers and search engine operators, in which the interests of both parties can be properly balanced.
All sectors of publishing have a “search engine dilemma” (even if we disregard the particular problems that book publishers have with mass digitisation programmes). Search engines are an unavoidable and valued port of call for anyone seeking an audience on the internet. Search engines sit between internet users and the content they are seeking out and have found brilliantly simple and effective ways to make money from that audience. They have become so dominant that no individual website owner is large enough to have any serious impact on their commercial fortunes.
The benefits of powerful search technology to both users and providers of content are well recognised by publishers – although even “mere” search functionality can have a negative impact on some publishing business models. At the same time, publishers are aware that search engines are, in following their business logic, inevitably and gradually moving into a publisher-like role, initially merely pointing, then caching and, finally, aggregating and “publishing” and perhaps even creating content themselves, while using publishers' content at will.
In the current state of technology, there can be none of the differentiation of terms of access and use which characterises copyright-based relationships in publishing environments, whether electronic or physical. The search engines can and do reasonably argue that, since their systems are completely automated, and they cannot possibly enter into and manage individual and different agreements with every website they encounter, there is no practical alternative to their current modus operandi.
Whether this (technological and political) gap is there by design or by accident, the search engines are able to make their own rules and decide for themselves whose interests are worth considering.
If publishers are to take the initiative in establishing orderly business relationships with the search engine operators, the response must be to help them to address the problem, both to fill the technical gap and ensure its political implementation. To paraphrase the former copyright adviser to the UK Publishers Association Charles Clark's famous claim that “the answer to the machine is in the machine”, the challenges that are created by technology are best resolved by technology. Since search engine operators rely on robotic “spiders” to manage their automated processes, publishers' web sites need to start speaking a language which the operators can teach their robots to understand. What is required is a standardised way of describing the permissions which apply to a website or webpage so that it can be decoded by a dumb machine without the help of an expensive lawyer.
In this way, one of the search engines' most reliable rationalisations of their “our way or no way” approach will have been removed, and a structure which embraces and supports the diverse present and future internet strategies and business models of online publishers will have been created.
As a result of the work of the Working Party, a proposal was made to develop a permissions based framework for online content. This would be a technical specification which would allow the publisher of a website or any piece of content to attach extra data which would specify what use by search engines was allowable for that piece of content or website. The aim will be for this to become a widely implemented standard, ultimately embedded into website and content creation software.
Following the commissioning of a brief feasibility study, WAN and EPC have taken the initiative to establish a project to develop and pilot this framework to express publishers' access and use policies. A detailed plan for this project – provisionally called ACAP (for Automated Content Access Protocol) – is currently in development.
This paper is intended to brief publishers on the outline of this project and to encourage their active support and participation when the project is launched in September 2006.
3 ACAP – the vision
ACAP will develop and pilot a system by which the owners of content published on the World Wide Web can provide permissions information (relating to access and use of their content) in a form in which it can be recognised and where necessary interpreted by a search engine “crawler”, so that the search engine operator (and perhaps, ultimately, any other user) is enabled systematically to comply with such a policy or licence. Permissions may be in the form of
• policy statements which require no formal agreement on the part of a user
• formal licences agreed between the content owner and the search engine operator.
There are two distinct levels of permissions which need to be managed within this framework:
• The permission given to the search engine operators for their own operations (access, copy and download, cache, index, make available for display)
• The delegation of rights given to the search engine operators to grant permissions of access and use to search engine users (search, access, view, copy, download, etc)
Although these can be managed within the same framework, it is important that the differences between them are recognised.
4 Use Cases
We include two informal Use Cases which are illustrative of the type of challenge that we seek to solve through ACAP.
4.1 USE CASE A: NEWSPAPERS
Newspaper publisher A would like all search engines to index his site, but only search engines X, Y and Z may display articles (because they have paid a royalty) on their news pages, and then only for 30 days. All images must be fully attributed as they are in the newspaper. The newspaper publisher uses articles syndicated by other newspapers and news agencies and cannot grant permission for those items, to the extent of the third party rights. Articles should not be permanently cached.
NOTE FROM DANNY: Using existing systems, publishers privileged enough to be included in news search engines don't have their articles displayed. They have links to those articles displayed, along with a description, something that people do all over the web and is generally accepted as fair use. Specific search engines can be blocked, if that's the desire. Specific images can also be blocked. Publishers can require those reprinting their content to install blocks as well.
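To make Use Case A concrete, here is a minimal sketch of how its terms might be written down so a machine can answer yes or no. The class name, field names and methods are all hypothetical, invented for illustration; this is not ACAP syntax, which the paper does not define. It also assumes one reading of the syndication clause, namely that the publisher grants nothing at all for third-party items.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class NewspaperPolicy:
    """Hypothetical encoding of Use Case A. This is not ACAP syntax."""

    display_licensees: frozenset                    # engines that have paid a royalty
    display_window: timedelta = timedelta(days=30)  # "only for 30 days"
    permanent_cache_allowed: bool = False           # "should not be permanently cached"

    def may_index(self, engine: str, syndicated: bool = False) -> bool:
        # Every engine may index the publisher's own articles. Assumption:
        # nothing is granted for syndicated (third-party) items.
        return not syndicated

    def may_display(self, engine: str, published: date, today: date,
                    syndicated: bool = False) -> bool:
        # Display is limited to licensed engines, own articles, and the window.
        if syndicated or engine not in self.display_licensees:
            return False
        return today - published <= self.display_window


policy = NewspaperPolicy(display_licensees=frozenset({"X", "Y", "Z"}))
print(policy.may_display("X", date(2006, 9, 1), date(2006, 9, 20)))
print(policy.may_display("W", date(2006, 9, 1), date(2006, 9, 20)))
```

The attribution requirement for images is left out, since a yes/no check can't express it; it would have to travel with the content as a separate obligation.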
4.2 USE CASE B: BOOKS
Book Publisher B invites search engine operators X, Y and Z to index the full text of his latest college text books. The web site where the full text is stored should not be made visible to search engine clients. He wishes that search engine users can browse only 2 pages of a maths book, but 20 pages of a philosophy text book. Search engine users should be able to buy individual chapters for private use, at $5 and $3 per chapter respectively.
5 Business requirements
Although it will be an integral part of the ACAP project to further develop and confirm the business requirements of publishers for the operation of the framework, significant progress has already been made in identifying the high level business requirements against which any technical solution must be measured. In summary, the solution must be:
• enabling not obstructive: facilitating normal business relationships, not interfering with them, while providing content owners with proper control over their content
• flexible and extensible: the technical approach should not impose limitations on individual business relationships which might be agreed between content owners and search engine operators; and it should be compatible with different search technologies, so that it does not become rapidly obsolete.
• able to manage permissions associated with arbitrary levels of granularity of content: from a single digital object to a complete website, to many websites managed by the same content owner
• universally applicable: the technical approach should initially be suitable for implementation by all text-based content industries, and so far as possible should be extensible to (or at the very least interoperable with) solutions adopted in other media
• able to manage both generic and specific: able to express default terms which a content owner might choose to apply to any search engine operator and equally able to express the terms of a specific licence between an individual search engine operator and an individual content owner
• as fully automated as possible: requiring human intervention only where this is essential to make decisions which cannot be made by machines
• efficient: inexpensive to implement, by enabling seamless integration with electronic production processes and simple maintenance tools
• open standards based: a pro-competitive development open to all, with the lowest possible barriers to entry for both content owners and search engine operators
• based on existing technologies and existing infrastructure: wherever suitable solutions exist, we should adopt and (where necessary) extend them – not reinvent the wheel
The approach taken should also be capable of staged implementation – it should be possible for initial applications to be relatively simple, while providing the basis for seamless extension into more sophisticated permissions management.
Although the scope of the project is initially limited to the relationship between publishers and search engine operators, a framework which meets these requirements should be readily extensible to other business relationships (although details of implementation would not be the same in every case).
6 The Pilot Project
The ACAP pilot project is expected to last for around 12 months. In outline, it is anticipated that the project will:
• confirm and prioritise the business and technical requirements with the widest possible constituency: agreement with all stakeholders is essential if the project is to succeed in the long term
• agree which specific Use Cases should be implemented in the pilot phase of the project, starting with a relatively simple approach
• develop the elements of the technical solution: it is anticipated that this will primarily involve the development of standards for policy expression, although it will also be necessary to develop the tools for the implementation of those standards
• identify a suitable group of organisations willing and able to participate in the pilot project; it is currently anticipated that this could involve four or five publishers and one of the major search engines; participants will need to be in a position to dedicate technical and time resources to the project to enable it to succeed
• pilot the standards and the tools, to prove the underlying concepts
In parallel with the development of the technical solution, a significant stream of project work will involve the development of a sustainable governance structure to manage and extend the standards (and any related technical services) which will be needed after the project phase of ACAP is complete. To avoid duplication of effort, ACAP will also establish liaisons with relevant standards developments elsewhere. In particular, the project is already in contact with EDItEUR with respect to its development of ONIX for Licensing Terms; and, in view of the significance of identification issues, with the International DOI Foundation.
7 Next steps
It is anticipated that the project will be launched publicly in September 2006; there is a great deal to be achieved between now and then, and at launch it will be possible to be much more explicit about plans and expectations. However, it is very important that the publishing community as a whole is ready and willing to respond positively when the project is launched.
The feasibility study commissioned by WAN, EPC and ENPA concluded that this project is technically feasible – and indeed requires little in the way of genuinely new technology. Rather, it requires the integration and implementation of identification and metadata technologies that are already well understood. It is also possible to chart a developmental path which does not demand that every element of the framework must be in place before any of it can be usefully implemented.
However, this is not to suggest that everything will be simple, nor that it can be achieved without cost. A significant part of the project cost will have to be borne by those organisations that agree to participate in the pilot, in the development of their own systems; however, there will also be central costs, to which it is hoped that other publishers will be prepared to contribute.
If you have any questions about this project, or would simply like to express your support, please contact: info@the-acap.org
Posted by Danny Sullivan at 10:41 AM | Permalink
I've had a long talk with the group that so far has successfully sued Google in Belgium over indexing, a talk that leaves me thinking they don't fully understand how search engines work and why their arguments over copyright infringement will ultimately fail. Then again, the case is really about trying to convince Google it should pay to carry their news content. A closer look at all this in the story below, as well as an update on the situation in general, including an appeal for Google that's been granted.
Let's go back to the beginning. In March, Copiepresse tells me it started legal proceedings against Google over its inclusion of Belgian news sources without explicit permission. The organization represents a number of publishers that were concerned over being indexed.
Information about the case, including a summons, was all sent to Google in the United States, according to Copiepresse. A hearing was held in Belgium on September 5th, then the ruling came out last Friday, September 15. Google didn't take part in the hearings, for reasons it says it is still investigating.
The ruling required that Google do two main things within 10 days of receipt:
Over this past weekend, Google says it complied with the first part. It removed links to at least these news sources, Google told me:
dhnet.be grenzecho.be lacapitale.be lalibre.be lameuse.be lanouvellegazette.be laprovince.be lecho.be lequotidiendenamur.be lesoir.be pressbanking.com votrejournal.be
It's been noted that Google did more than remove these sites from Google News Belgium. They were removed from Google Belgium entirely. Here are a couple of searches that demonstrate this:
site:dhnet.be site:grenzecho.be site:lacapitale.be site:lalibre.be site:lameuse.be site:lanouvellegazette.be site:laprovince.be site:lecho.be site:lequotidiendenamur.be site:lesoir.be site:pressbanking.com site:votrejournal.be
Some have thought this is an example of Google getting revenge, robbing these publishers of regular traffic they probably assumed was safe in a fight over Google News indexing. For its part, Google said its reading of the ruling meant that the sites had to be dropped entirely from Google Belgium:
Order the defendant to withdraw the articles, photographs and graphic representations of Belgian publishers of the French - and German-speaking daily press, represented by the plaintiff, from all their sites (Google News and "cache" Google or any other name within 10 days of the notification of the intervening order, under penalty of a daily fine of 1,000,000.- € per day of delay;
I've bolded the key part. Google says it interpreted "all their sites" as being all sites that it views the court having jurisdiction over, anything using the Google.be domain. In addition, Google has removed the sites from Google News worldwide, saying it is treating the ruling as it would any request to be removed from Google News. In those cases, you're dropped entirely, not on a country-by-country basis.
The sites do still appear in searches via Google.com or other Google editions not aimed at Belgium. While these sites can still be reached from Belgium, Google considers them outside Belgian jurisdiction.
That view is sort of laughable, though I understand the reasoning well. It's unlikely that Google Belgium is actually being served up out of Belgium, so artificially pretending that Google.com and other Google sites are somehow "outside" Belgian jurisdiction makes no sense. However, this type of pretending isn't that unusual. It's a nice way for search engines to act like they are following the ruling of a particular country by making changes on "that country's Google." It's also a convenient way for particular courts to feel they've exerted jurisdiction over sites that they might really not be able to control.
Overall, Google has complied with the first part of the ruling. As for the second, it hasn't posted the required notices and says it will wait for a ruling due out Friday specifically about that issue. It argued yesterday in a hearing for appeal that posting the notice on the home pages wasn't necessary given all the publicity the case has now received.
An appeal for the case overall was granted. It will be heard on November 24, and the entire matter is largely in limbo until then. I hesitate to consider the case a victory for Copiepresse given that the first hearing -- for whatever reason -- had no defense from Google at all.
This leads me to Copiepresse's complaint with Google. In the group's view, Google has illegally copied material without permission. It feels that in some way, Google should get permission before indexing.
Indexing, of course, is not copying. Search engines do read pages in to make them searchable, as my Indexing Versus Caching & How Google Print Doesn't Reprint article explains in more detail. But indexing isn't reprinting pages, in the way some arguments try to make it. Google does show cached copies, something raised in the case. But cached copies aren't shown within Google News search, which was the main focus of this case (as an aside, one US court has ruled cached copies aren't an infringement, something I disagree with but something also easily rectified through no caching mechanisms).
I had a very long conversation about the permissions issue with Margaret Boribon, secretary general of Copiepresse, to try and better understand how they wanted Google to operate. Why not use commonly understood and effective mechanisms such as robots.txt files or meta robots tags to prevent indexing?
"If you do so, you admit that Google does what they want, and if you don't agree, you have to contact them. This is not the legal framework of copyright," Boribon said.
This is an age old issue in the search engine world. By default, search engines assume that permission is granted to index a document, in order to make it searchable. Technically, shouldn't they get explicit permission? Legally, that might make things safer. Logistically, it would never work. Many sites don't have clear contact details. Some domains themselves contain multiple sites. Moreover, there are millions of sites across the web. Contacting them all beforehand simply wouldn't work well.
I asked Boribon about this, how her group would propose search engines undertake such a task.
"I'm sure they can find a very easy system to send an email or a document to alert the site and ask for permission or maybe a system of opt-in or opt-out," she said.
Would it be OK for such a system to work automatically, I asked? Yes, that would be fine. A machine-to-machine connection would be OK, she said. So then, I asked, why not use the existing robots.txt or meta robots systems?
Both mechanisms are easy, automatic ways for publishers to declare if they grant indexing permission or not. In fact, I'd argue that both are a way for search engines to ask beforehand for the very permission that Copiepresse wants them to seek. Major search engines -- not just Google -- all request or check these blocking mechanisms.
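For anyone who hasn't seen the mechanism up close, here is a rough sketch of the robots.txt check a crawler runs before fetching a page, using Python's standard urllib.robotparser module. The robots.txt content, the crawler names (ExampleNewsBot, OtherBot) and the news.example URLs are made up for illustration; a real crawler would first download the file from the site itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a news site: one crawler is shut out entirely,
# everyone else may crawl everything except the archive.
ROBOTS_TXT = """\
User-agent: ExampleNewsBot
Disallow: /

User-agent: *
Disallow: /archive/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("ExampleNewsBot", "OtherBot"):
    for url in ("http://news.example/story.html",
                "http://news.example/archive/2006/story.html"):
        print(agent, url, may_crawl(ROBOTS_TXT, agent, url))
```

The publisher writes the file once; every crawler that honors it asks the question automatically on each visit, which is the machine-to-machine exchange Boribon said would be acceptable.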
Boribon rejected the existing solutions. One issue she had was that they weren't legally endorsed. That's true, but that's also something I think will change over time. In the US, we've had one case recently where opt-out solutions like tags have been accepted.
Outside the US, there have been some scattered cases, such as this one from 1997 in the UK involving news indexing. But none of these cases have seemed to stop the search engines.
The Belgium case could be different. What happens in one country isn't applicable to others. It may be that Copiepresse will prove its point that permission should be sought in advance. Alternatively, a court could endorse existing blocking mechanisms as having legal force.
That's what I think should happen. These systems pose an easy way for anyone who doesn't want to be in a search engine to stay out. If the issue with Copiepresse was really about not being indexed, all of the publications it represents could easily stay out through those solutions. Google -- like other major search engines -- doesn't index sites against their wills.
There's more at work here, of course. The publications DO want to be in Google. The action is simply an effort to force Google to the bargaining table and get paid for inclusion, from what I can see.
"Our purpose is not to be excluded. Of course, we want to be in the system, but on a legal basis," said Boribon. "We want to be remunerated."
Her group's view -- as is the view of the World Association Of Newspapers that she also referenced several times -- is that Google is exploiting sites. It is making money off these sites and giving them little or nothing in return.
Most search marketers hearing this have to stifle laughter or disbelief. That's because most search marketers want all the search traffic they can get. It's free, easy and converts well. They understand that search engines give them plenty of value and complain most when something happens to take that traffic away, as was the case with the Google Florida Update of 2003.
I'm not going to spin out the argument that search engines generate far more benefits from the indexing they do than harm. For one thing, I think this is self-evident given the sheer amount of concern about getting into search engines, rather than out of them. If you must have more argument, see my past post, Search Engines As Leeches, The Difference Between Paid & Free Listings & Keyword Price Rises.
The difference between most publishers on the web and those of Boribon -- or book publishers also suing over Google's scanning program -- is that they think they are special, in my opinion. They think they have content that is more important than other content on the web, content that is either entitled to more protection or that warrants payment for being included.
Several times, Boribon stressed that those who spent a lot of time and money on their works deserved to be compensated by Google. My response was that I don't care if content is worth €1 or €1,000,000. It is entitled to the same protections. To be fair, Boribon agreed when I made that point. Yet our talk still continued to be riddled with her references to the high value of some content or the concept that only some content had protected status.
I've been through this before. Why Don't Book Publishers Object To Web Indexing? covers how one book group, while admitting that copyright law should apply the same regardless of whether works are in digital or book form, still suggested that online works were somehow different:
I think the issue is much more acute where the content is not made freely available by its copyright owner - which is, of course, the case for all the in-copyright content Google are planning to digitise from libraries.
Skipping past copyright law, let's focus on payment for inclusion. Boribon said that Google had made special arrangements with Le Monde to include it in Google News, explaining that was one of many examples of Google targeting the most important sources for special treatment.
My response was Google has special arrangements with lots of publishers that have content that can't easily be indexed. If Le Monde required user registrations, Google couldn't spider the site without contacting them and being allowed in. Indeed, it's the same thing Google has done for the New York Times, as we've covered. It's something Google (and other search engines) does for even non-news sites, if they have important content that it thinks should be gathered.
Google is not paying Le Monde or the New York Times for these arrangements, however -- something that Boribon seemed to believe was the case, and no doubt other publications do as well. Google confirmed with me it has no payment system like this with Le Monde. But such a belief highlights the huge education challenge Google faces, trying to help these publications that have mistaken notions of how it -- and all search engines -- operate.
Of course, Google does have one paid relationship with a news source that came to attention recently, the Associated Press. Google still hasn't explained exactly whether this was a relationship it entered into to prevent an AP lawsuit over being in Google News or a separate agreement to pick up some of AP's content for reuse.
Fair to say, AP's content is important enough and helpful enough to Google that it did decide to enter into an agreement to make use of it in some way. Boribon's group feels their content is important enough that it should obtain some type of agreement as well.
This is also an old story, in some ways. Tom Mohr in Editor & Publisher earlier this month was only the latest of those within the newspaper industry to sound a call for newspapers to band together to deny content in hopes of getting paid:
But what if 2/3 or more of the U.S. newspaper industry sits on one platform, managed by Switzerland Inc.? What if Switzerland Inc. decides to deny Yahoo! and perhaps Google access to newspaper industry content for three months, followed by a negotiation for better terms? That's the power of a network.
The World Association Of Newspapers had a similar call earlier this year:
Web search engines, such as Google and Yahoo, collect headlines and photos for their users without compensating the publishers a cent, according to the World Association of Newspapers (WAN), which announced Tuesday that it intends to "challenge the exploitation of content" by the Googles and MSNs of the Web.
The Belgian lawsuit is simply another step forward in pushing for that payment, exactly what Google CEO Eric Schmidt described as "negotiation being done in a courtroom" when I spoke with him last month:
Because of our scale and because of the amounts of money that we have, Google has to be more careful with respect to launching products that may violate other people's notion of their rights. But also, frankly, we find ourselves in litigation and the litigation was expensive, and diverts the management team, etcetera, from our mission. In the cases that you describe, most of the litigation in my judgment was really a business negotiation being done in a courtroom. And I hate to say that, but that is my personal opinion. And in most cases a change in our policy or a financial change would in fact address many of the issues.
In the end, I want honesty. If the Copiepresse or the AFP (also suing Google) feel Google doesn't have permission to index their content, then just use the easily implemented mechanisms to get out and stay out. Don't file unnecessary court cases, nor just single out Google as the whipping boy when Yahoo and Microsoft, to name only two search engines, operate the same way.
Is it about getting paid? Is it that these publishers think they are so important they should get money for being included, since links alone to their web sites make search engines more comprehensive? That's fine, but you don't need a court case for that either. Just opt-out. If you're worth it, Google and the others will come running to the negotiating table. If you're not, well, no one's going to miss you -- but you'll miss the search engine traffic, as the Belgian publications almost certainly are discovering to their horror now.
I don't want lawsuits that seriously threaten web search itself. The ruling Boribon's group won potentially applies to all content, not just news content, in Belgium. Anyone could sue Google and other search engines saying that robots.txt blocking isn't explicit enough. If that happens, Boribon's organization is going to find that searching the web from Belgium is difficult, since there won't be any content in Google, Yahoo or other services at all.
That would be ironic, given that Boribon says she's a regular Google user. She's routinely using a service where virtually none of the content listed is there because of some explicit approval process. That's hypocritical, given her group's lawsuit. If they don't believe opt-out mechanisms are sufficient, then none of these member publications should be using Google or any search engine as part of their daily routines.
Postscript: V7N points at WAN to combat 'search engine spiders', which has the World Association Of Newspapers suggesting incorrectly that search engines have no technological solution to spider only some content. They absolutely do. Content can be flagged on a page-by-page basis, if that's what a content owner wants to do.
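As a rough illustration of page-by-page flagging, the sketch below reads the meta robots tag out of a page's HTML with Python's standard html.parser module. The noindex and noarchive values are the widely recognized "don't index" and "don't show a cached copy" directives; the class and function names and the sample page are mine, made up for the example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # The content attribute is a comma-separated list of directives.
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def page_directives(html: str) -> set:
    """Return the set of robots directives declared in one page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def may_index(html: str) -> bool:
    return "noindex" not in page_directives(html)


def may_cache(html: str) -> bool:
    return "noarchive" not in page_directives(html)


# Hypothetical article page: fine to index, but no cached copy.
PAGE = '<html><head><meta name="robots" content="index, noarchive"></head><body>Story</body></html>'
print(may_index(PAGE), may_cache(PAGE))
```

Because the tag lives in each page, one article can be opted out of caching or indexing while the rest of the site stays in.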
Posted by Danny Sullivan at 3:23 PM | Permalink
Reuters reports the big news of the day that Google has been ordered by a Belgian court to remove all articles, photographs and graphics from French-speaking newspapers. Copiepresse issued the complaint and won the court ruling on September 5th. Not only does this require Google to remove content from Google News, the court order requires removing the content from the Google cache. ChillingEffects.org has a link to the full court order.
Posted by Barry Schwartz at 8:23 AM | Permalink
News.com reported that the suit filed back in July 2005 over the Wayback Machine and the Internet Archive has now been settled. Of course, the terms of the settlement have not been released to the public. In short, the Wayback Machine had archived a page with sensitive information on it. Technically, it should not have indexed the page, but a "temporary bug" had it indexed for a bit. Brewster Kahle of the Internet Archive said, "this is really a lawsuit between two parties and we got sort of dragged into it and I'm glad we're now out of it." Read the long suit we reported back in July '05 here.
Posted by Barry Schwartz at 10:36 AM | Permalink
As it happens, I was at Google yesterday when the story came out about the financial agreement between Google and the Associated Press over the use of AP content. That story raised a number of questions, and here are some answers I can share so far from Google.
First, this is not a pay per click deal. Yesterday's Mercury News article talks about some agreements in general being this way:
It's a common perception, but it's false. Google and Yahoo, along with dozens of other Internet companies, have been quietly agreeing to deals that compensate some of the country's top news organizations for their content and help drive more traffic to their Web sites.
Recently completed deals, which include arrangements in which media organizations such as the Associated Press will be compensated on a pay-per-click basis, could herald a major shift in the relationship between the old media and new Internet gatekeepers.
The article doesn't say that the Google deal specifically is pay per click, but some people might wonder if that's the case. Google now clarifies that it is not.
Is this an agreement to keep Google from being sued by the AP, as it is by the AFP? Google wouldn't answer directly but said:
Google News is fully consistent with fair use and always has been.
Note that paidContent has reported how the AP only a few months ago said:
Let me say more clearly: we're not suing them.
So I tend to think it's safe to say this wasn't being driven out of legal fears.
What's the agreement cover? No more real details than you've already read before:
The license in this agreement provides for new uses of original AP content for features and products we will introduce in the future. We are very excited about the innovative new products we will build with full access to this content.
But note that this specifically talks about new uses -- not current uses. That is, I read this as Google saying again that what it has been doing to index AP content is not something it feels it needed an agreement to do.
Also this tidbit:
This is not the first time we've had a financial arrangement with a news organization.
Coincidentally, I'm at news search site Topix today, literally borrowing a conference room to do some email and blogging catch-up. I had a catch-up meeting with them earlier, and the issue of deals with the AP and newspapers in general came up.
Topix noted they signed an agreement with the AP earlier this year, which is part of an overall trend where they've seen news organizations eager to come up with new ways to work with news search sites.
Was this prompted by a legal fear? No. It was part of figuring out a way of dealing with syndicated news content that helps treat the AP's member publications fairly online.
AP stories can originate from one of thousands of member publications. Any of those thousands of member publications might also republish an AP story. Which story is the originating one? That's useful for a search engine to know, if you don't want your results to get overwhelmed by having duplicates of all the same content.
In terms of fairness, Topix uses the agreement to get a rich data feed of content from the AP (along with many other things). This helps them better understand if an AP story originated from a particular member publication and, if so, to link over to the publication that deserves the credit.
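To make the duplicate problem concrete, here is a minimal sketch in Python of how a news search engine might collapse identical copies of a wire story and credit the originating publication. It is not Topix's actual method. The story fields ("publisher", "url", "text") and the origin_by_fingerprint lookup, which stands in for the kind of origin data a rich feed might supply, are assumptions made up for illustration.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """Hash the story text after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def group_duplicates(stories, origin_by_fingerprint):
    """Group identical stories and pick the copy to credit.

    stories: list of dicts with hypothetical keys "publisher", "url", "text".
    origin_by_fingerprint: hypothetical lookup built from a wire-service feed,
    mapping a story fingerprint to the originating publisher's name.
    """
    groups = defaultdict(list)
    for story in stories:
        groups[fingerprint(story["text"])].append(story)

    results = []
    for fp, copies in groups.items():
        origin = origin_by_fingerprint.get(fp)
        # Prefer the originating publisher's copy when the feed names one
        # and that copy was crawled; otherwise fall back to the first copy seen.
        credited = next((c for c in copies if c["publisher"] == origin), copies[0])
        results.append({"credited": credited, "duplicates": len(copies) - 1})
    return results
```

Without the feed lookup, the fallback simply credits whichever copy was seen first, which is exactly the guessing the agreement is meant to avoid.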
The agreement also allows Topix to put AP-originated national and international stories on its own site, rather than having to guess at which of many different news sites to point at.
For example, if the AP runs some international story that an AP reporter has written, how should Topix decide which newspaper to point at? Just pick some random newspaper that had nothing to do with creating it? And if so, what about registration or payment issues that might be in place at that random paper?
Hosting AP national and international stories helps solve this problem. Of course, hosting AP stories that come from the AP directly also means Topix -- and indirectly the AP -- can earn from ad revenue.
Understanding what Topix does with the AP sheds some light on possible Google motivations in working with the AP. Perhaps we'll see hosted stories as Topix is doing -- and as Yahoo also does -- for some of the reasons explained above. And perhaps the deal also is to give Google better news search capabilities as I've also outlined, something that's hard to do without a deeper relationship.
Postscript: Google, AP Disclose News Payment Deal from, ironically, the Associated Press suggests that a legal dispute was behind the deal. From the lead:
Google Inc. is paying The Associated Press for stories and photographs, settling a dispute with a major provider of the copyright news that the online search engine finds and displays on its popular Web site.
But further into the story, I don't see anything explicitly supporting that statement. There's this:
While AFP sued to protect its rights, the AP chose to negotiate terms with Google, which, after just seven years of existence, is nearly 10 times larger than the 160-year-old news cooperative in terms of revenue. The AP, a not-for-profit organization owned by U.S. news companies, had revenues of $654 million in 2005. Google, a publicly owned company, reported $6.1 billion in revenue last year and is on a pace to exceed $9 billion this year.
By agreeing to pay AP for content, Google falls in line with the owners of other popular news sites like Yahoo Inc., Microsoft Corp. and Time Warner Inc.'s AOL, which have been anteing up for years.
"We are happy to be dealing with Google as we are with all the major superpowers on the Internet," Seagrave [Jane Seagrave, the AP's vice president of new media markets] said. "We are always looking for new ways to innovate."
But there's no one from the AP explicitly attributed in the story as saying that the AP was going to sue unless this agreement was reached. Still, I know the story author Michael Liedtke well, and I can't see him saying there was a dispute unless someone was saying that was what this was about. I assume that would have been Jane Seagrave.
Posted by Danny Sullivan at 8:46 PM | Permalink
A US federal judge has declined to dismiss a copyright infringement case filed by Agence France-Presse against Google News. Instead, she's given both sides more time to assemble evidence before ruling on a dismissal motion.
Judge: Google News lawsuit can proceed from News.com has the rundown. Part of the problem seems to be that neither AFP nor Google can easily reconstruct Google News pages from 2003 and 2004.
You'd think that if AFP was going to file a copyright infringement case, they'd have recorded exactly this type of evidence to present. But having been involved in a few cases as an expert witness, it doesn't surprise me that the plaintiff has made accusations without saving the key evidence. Note to those planning to sue over some search-related case -- save HTML pages and screenshots!
AFP says that Google News unlawfully incorporated headlines, photographs and story summaries. Google argues they haven't and that AFP has yet to identify specific infringements over the past year since the case was filed.
AFP Content Still In Google News, Probably Via AFP's Own Partners from me earlier this week covers how some AFP content still shows in Google via AFP partners, despite Google saying last year it would drop AFP.
Continued Google News indexing 'boosts AFP case' from IDG is a follow-up with the AFP lawyers saying this may hurt Google further. We'll see. It could well hurt AFP, if Google argues that AFP itself has failed to instruct its partners on how to keep AFP content out of search engines.
Over at ResourceShelf, scroll down in this post, and Gary Price has listed a variety of legal documents in the case. This is a key one that indicates the absurdity of the AFP claim. They've filed a copyright infringement claim, but the document shows how they are depending on Google to go back and give them the evidence to back the claim, via archived Google News pages and photographs. Yet this is exactly the type of evidence you'd assume AFP had already assembled prior to filing a claim.
Posted by Danny Sullivan at 7:52 AM | Permalink
"Despite suit, Google News still indexing AFP content" from IDG News Service covers Agence France-Presse content still appearing in Google News after the company said last year that it would no longer carry AFP content, following a copyright infringement lawsuit. The problem seems to be that AFP content is distributed by other publishers, such as the New York Times.
There's no foolproof way for Google to flag these articles as AFP content and thus remove them. Honestly, it's down to AFP itself to teach its distributors how to use the meta robots tag to flag this content as not to be indexed.
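For readers who haven't seen the meta robots tag in action, here is a minimal sketch in Python of how a crawler might honor it. A distributor would put a tag such as <meta name="robots" content="noindex"> in the head of each page carrying AFP content. The class and function names are made up for this illustration, and this is not how Google's crawler is actually implemented.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def may_index(html):
    """Return False when the page asks crawlers not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" not in parser.directives
```

A page with no meta robots tag is treated as indexable, which is why the burden falls on whoever publishes the page to add the tag.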
Then again, I'm sure that over time, the situation will resolve itself. After all, if AFP is stupid enough not to understand the value of search traffic, smarter publications that do understand this like the New York Times itself will overtake it as people turn to them for content online.
Posted by Danny Sullivan at 5:18 AM | Permalink
Out-Law reports that Google was ordered by Justice Rimer to hand over information on an advertiser to Helen Grant over alleged copyright infringement. Helen Grant "complained that a Google advert led to a service which she claimed violated her copyright in a forthcoming book." A search brought up a site named Realityunlocked.com, "which offered a free download of an earlier draft of the book, and that the site violated the Trust's copyright." Google asked Grant to take the issue to court; this way, Google does not have to worry about the privacy issues involved in handing over the information.
Posted by Barry Schwartz at 8:33 AM | Permalink
Spotted via TechCrunch, Bloomberg reports that Yahoo China is to be sued for linking to sites that sell pirated music. The article claims "about 90 percent of all recordings in China are illegal, with sales of pirated music worth about $400 million annually," according to the International Federation for the Phonographic Industry. A new law in China that came into effect on July 1 "fines distributors of illegally copied music, movies and other material over the Internet as much as 100,000 yuan ($12,500)."
Posted by Barry Schwartz at 10:47 AM | Permalink
I wrote earlier this month of a French lawsuit becoming the third one I knew about filed against Google over its book scanning project. Turns out, there was a fourth one -- based out of Germany. But now we're back to three, as Google has just announced that the German one has been withdrawn.
It looks to be Google's first legal victory in the battles over the project. From Google, via its Inside Google Book Search blog (and also on its main blog):
WBG, a German publisher, today decided to drop its petition for preliminary injunction against the Google Books Library Project. WBG (whose legal action was supported by the German Publishers Association as an industry model) made the decision after being told by the Copyright Chamber of the Regional Court of Hamburg that its petition was unlikely to succeed. It's our belief that the display of short snippets from in-copyright books does not infringe German copyright law. Today the Court indicated that it agreed, drawing a comparison with the snippets used in Google web search. And the Court also rejected the WBG's argument that the scanning of its books in the U.S. infringed German copyright law.
Posted by Danny Sullivan at 3:29 PM | Permalink
Chris Boggs over at the Search Engine Roundtable wrote an item named Which Came First: the Content or the Plagiarism? which discusses the challenge search engines face when it comes to determining the original source of a particular piece of content.
For example, the content I am writing right now may be picked up within a matter of seconds by another site that wants to "borrow" or steal the content. So now we have two (probably a lot more than two) sources with identical content. A search engine can say, hey, I found source A before I found source B with this particular content, so source A must be the original source. But if you think about that, since spiders don't work in real time, a search engine may visit the source that "borrowed" the content prior to visiting the original source of that content.
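The failure described above is easy to show with a toy example. The sketch below, in Python, implements the naive "first crawled wins" rule. The URLs and timestamps are hypothetical and only illustrate the case where the spider reaches the copy before the original.

```python
from datetime import datetime


def guess_original(crawl_log):
    """Naive heuristic: the URL crawled earliest is treated as the original.

    crawl_log: list of (url, first_crawled_at) pairs for pages that carry
    identical content. All URLs here are hypothetical.
    """
    return min(crawl_log, key=lambda entry: entry[1])[0]


# The original was published first and the scraper copied it moments later,
# but the spider happened to reach the scraper's page hours earlier.
crawl_log = [
    ("http://original.example/post", datetime(2006, 8, 1, 14, 30)),
    ("http://scraper.example/copy", datetime(2006, 8, 1, 9, 5)),
]

print(guess_original(crawl_log))  # prints http://scraper.example/copy
```

The heuristic only sees crawl times, not publication times, so it credits the wrong source here.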
Chris offers two suggestions. The first is to watch your crawl cycles in Google and wait to post the content until just before a crawl. That is not really feasible, as Chris knows, because there is no way to know exactly when Google will crawl your site, and news must be posted as soon as possible, so waiting is normally not an option. Chris uses this example to make a point, I believe. The other option Chris suggests is to use Google Sitemaps, so Google can see you as a trusted source and be fed the information sooner rather than later.
But what do you think is the algorithmic solution? I personally do not know. There are people discussing the fundamental challenge at Search Engine Roundtable Forums.
Posted by Barry Schwartz at 9:29 AM | Permalink
Postscript: Google argues with U.K. publishers over digital libraries from News.com covers publishers in the UK making new attacks against the program. The Publishers Association trots out the usual argument that scanning to index is the same as copying to reprint and that permission should be required.
The group's web site, it should be noted, has 919 pages listed in Google, all of which are protected by copyright, all of which Google and other search engines index without explicit permission -- and all of which the group apparently doesn't object to, since it doesn't seem to have banned indexing using a robots.txt file (the site is down, so I can't verify this first hand -- but the pages really are unlikely to be listed if this were the case). But do the same thing with a print book -- copy for indexing purposes rather than reprinting -- and suddenly, that's infringement. Well, the courts will sort it out.
Indexing Versus Caching & How Google Print Doesn't Reprint and Once Again -- The Difference Between Google Print & Google Library are two key articles from me that examine the issues above in much more depth.
Posted by Danny Sullivan at 10:45 AM | Permalink
I left newspaper reporting about ten years ago because it was clear the industry had no idea how to transition to an online world, and I didn't want to be stuck behind. Today's Chicago Tribune article, Papers, Web sites in scrape on stories, just tells me things don't appear to have improved much. Search engines, including Google, get a fresh dose of being called leeches for using content. Except, publishers, they don't reprint your content. They reprint summaries and link to your articles. And if you'd get a clue, you'd understand that brings you traffic, which should make you money.
Don't like it? Then slap up a robots.txt file to ban the news search engines and leave the traffic for the rest of us. The story's not all bad news. Some publishers are waking up to search and figuring out how to deal with it. For another example of the search engines as menace to newspapers concept, see World Association Of Newspapers Dislikes Search Engine Exploitation, Clueless About Robots.txt Banning from February.
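For anyone unsure what such a ban looks like, here is a minimal sketch using Python's standard urllib.robotparser module to check a robots.txt file. "ExampleNewsBot" is a made-up user-agent token standing in for a news crawler, and paper.example is a placeholder site; check each search engine's documentation for the token its crawler really uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "ExampleNewsBot" stands in for a news crawler's
# user-agent token; every other crawler is left unrestricted.
ROBOTS_TXT = """\
User-agent: ExampleNewsBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

news_allowed = parser.can_fetch("ExampleNewsBot", "http://paper.example/story.html")
other_allowed = parser.can_fetch("OtherBot", "http://paper.example/story.html")
print(news_allowed, other_allowed)  # prints: False True
```

Two lines in a text file are enough to keep a well-behaved news crawler out while leaving every other crawler alone.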
Posted by Danny Sullivan at 7:56 AM | Permalink
The law is still in its formative stages when it comes to the web, search engines and other online technologies. What do you do when someone uses your trademark in a paid listing, or is scraping your content without permission and making money with contextual ads? Or what if a competitor alters your press release to point to their own site? These and other questions were debated by a panel of legal experts at a recent Search Engine Strategies panel, covered in today's SearchDay article, Trademark Protection, Copyright and Search Engines.
Posted by Chris Sherman at 9:07 AM | Permalink
News.com reports that Google has won the copyright infringement case brought against it by writer Gordon Roy Parker. Parker posted a chapter of his book on a Usenet bulletin board, which was then indexed by Google. Parker sued because Google archived the book and provided "excerpts from his Web site in search results." The judge ruled in Google's favor, stating, "When an ISP automatically and temporarily stores data without human intervention so that the system can operate and transmit data to its users, the necessary element of volition (willful intent to infringe) is missing." The full court documents can be downloaded in PDF format here.
Danny wrote about Parker's case back in October 2005, in an entry named Indexing Versus Caching & How Google Print Doesn't Reprint. Danny linked to Ray's page named Why Am I Suing Google For $10 Billion?
Posted by Barry Schwartz at 9:15 AM | Permalink
News.com reports that Nude photo site wins injunction against Google. This could have a ripple effect on image search as we know it today.
Perfect 10, an adult photo site, has proven to the court that Google's image search thumbnail copies are a violation of U.S. copyright law. This past Friday, U.S. District Judge A. Howard Matz, ruled that Google has violated the law "by creating and displaying thumbnail copies of its photographs."
Google plans to appeal the case, but the main reason documented as to why Google lost is twofold. First, Google monetizes image search with AdSense on those sites that have pirated the images of Perfect 10. Second, Google has mobile image search, which enables users to save a downsized version of the image to a phone. That image is "similar to what Perfect 10 offers as a subscription service through U.K.-based Fonestarz" and could hurt Perfect 10's revenue and earnings.
Gary Price has posted the full text of Google/Perfect 10 decision, 48 pages, at ResourceShelf as a PDF.
Want to comment or discuss? Visit our SEW Forums thread, Nude Photos Get Google in Trouble Over Copyright Law.
Posted by Barry Schwartz at 9:27 AM | Permalink
On a growing number of occasions we've been highlighting the good and very useful work that Phil Schnyder and his team at askSam are doing by providing free searchable and browsable access (online, or download and use offline) to classic books, government and legal documents, speeches, and more, utilizing askSam database software.
Today, we've learned that askSam has just released three new databases (what they call eBooks) that might be of interest to some of you, especially those with an interest in copyright issues.
First, U.S. Copyright Law (title 17 of the US Code) "Search and analyze the full text of the Copyright Law of the United States of America & related laws contained in Title 17 of the United States Code. Copyright is a form of protection provided by the laws of the United States (title 17, U.S. Code) to the authors of 'original works of authorship,' including literary, dramatic, musical, artistic, and certain other intellectual works."
Second, The Digital Millennium Copyright Act (DMCA) "Search and analyze the full text of The Digital Millennium Copyright Act (DMCA). Passed in 1998, the DMCA is a bill designed to bring the Copyright Law up to date with digital media."
Third, State of the Union Addresses of the American Presidents "Search and analyze the full-text of all State of the Union Addresses from 1790-2005."
A complete and rapidly expanding list of askSam books, all free, can be found here.
Btw, to view offline you'll need a free copy of the askSam reader.
Posted by Gary Price at 12:13 PM | Permalink
Google & Search Engine Cached Pages Legal, If They Offer Opt-Out
Via Boing Boing and News.com, interesting news that a case saying the Google cache violates copyright has been ruled in Google's favor. Since Google makes it possible to prevent it from showing a cached page, the court ruled the publisher should have used that. In short, if you don't block caching, Google and other search engines have an "implied license" to reproduce your material. More in the court documents on the EFF site here (PDF format), and an EFF write up is here.
Postscript: Caching Made Legal - Do You Agree? I Don't! at the Search Engine Watch Forums has more analysis of this by me and some of the major concerns it raises. Read more or comment yourself over there. There's also excellent discussion at WebmasterWorld here.
Posted by Danny Sullivan at 7:57 AM | Permalink
Om Malik has an interesting post about something we've mentioned here on the blog several times and that Om and I have chatted about via email, that being the amount of material found in video search engines that is in-copyright but readily accessible to view or download for free.
Some might call it video piracy.
What this means for the future of video search in general is an interesting issue that I'm sure we're going to be reading much more about in the future. Why? That's easy: money, and lots of it. More and more content is also for sale online via one of many services like iTunes or Google Video Store. If a copy of a movie or TV show is available for free, will people still pay to download/rent/purchase the content? New services from TiVo and DirecTV will make sharing content even easier.
Om's post includes statements from Google and YouTube on the topic. What I've learned (and these official comments reinforce) is that the burden to have in-copyright content removed from a video search engine falls on the copyright holder.
I'm thinking that tools and services to monitor and then have the proper requests sent to video search engines could be a big business not only here in the U.S. but worldwide.
Finally, Malik points to this just posted story that talks about the number of Bollywood films available for free via one of many services.
Posted by Gary Price at 12:22 PM | Permalink
On December 28th the Congressional Research Service (CRS) released a six page research report that looks at online indexing, law, fair use, and the Google Book Search project. The full text of the report is available here.
The Congressional Research Service (CRS) is one of the most respected names in research. You name the topic and they prepare research reports on it. CRS is located here in DC at The Library of Congress. Many of their reports are difficult to access (that's another story) but thanks to various organizations like the Open CRS Project, the Federation of American Scientists, the National Library for the Environment, and the IP Mall collection, it's getting a bit easier. One caveat is that CRS reports are frequently updated so make sure you're reading the most current version possible. Thanks to S.B. for the news tip.
Posted by Gary Price at 2:27 PM | Permalink
The Information World Review (IWR is a VNU publication) article: Google digitisation faces Euro legal challenge, reports on Google's book digitisation project (the Google Library Project to be precise) facing some legal obstacles in Europe.
Here it is in a nutshell, direct from the article: Google has acknowledged that it cannot digitise copyright material from European libraries, according to the Association of Learned and Professional Society Publishers (ALPSP).
The article goes on to say that in a meeting last month Google agreed that: ...it was "absolutely the case that it is not allowed to [digitise in-copyright material from libraries] in Europe."
At the moment, The Bodleian Library at Oxford University is the only one of the "Google Five" libraries located in Europe. This post has more about the holdings of all five libraries including in-copyright and public domain holdings.
ALPSP chief executive Sally Morris said that she is planning to create a system that will make it easy for Google, the Open Content Alliance, or any other organization wanting to digitise material.
She told the Bookseller: "The fact Google recognise they can't do this without permission in Europe gives us a threshold to work out a way for them to get permission. In America, they have the law on their side. Here, they accept they don't." Her suggestions, put to Google at the meeting, include a Canadian model whereby, if it proves impossible to locate a copyright owner, a licence is granted so the material can be used legally.
Morris also told IWR that she is waiting to hear back from Google on these issues. She said that Google was interested.
Btw, Danny chatted with the ALPSP's Sally Morris in this blog post.
Posted by Gary Price at 8:22 PM | Permalink
Kazaa Ordered By Australian Court To Block Searches
Via Russell Beattie, Search terms on Kazaa to be blocked from ZDNet Australia has the rather disturbing order of an Australian court telling Kazaa that it must not allow searches on certain terms, such as artist names.
Let's be crystal clear about the order. It is not about removing pirated music (which is difficult because Kazaa doesn't host files). It is preventing people from searching for these words at all. Record companies will provide the list.
The reason this is disturbing is the thought of such an action being applied to other search services. What if some company convinced a court that web search engines like Google and Yahoo were providing links to copyrighted material? Could that company then prevent people from searching on things like "madonna?"
I suspect a key issue in all of this is the fact that it could be argued that those searching for artists by name on Kazaa could be shown as being very likely looking for pirated music. But the idea of censoring what people are allowed to search for at all just feels like a step too far.
Techdirt has some additional coverage here and points out the same thing happened to Napster back in 2001. PC World also has coverage here.
Posted by Danny Sullivan at 1:00 PM | Permalink
CustomizeGoogle (CG), the popular and award winning Firefox extension that offers numerous options to customize the search engine (including one option to remove ads from most Google results pages) now has its own blog that's located here.
The most recent blog post mentions that an update to CustomizeGoogle now, "makes it easy to removes[sic] image copying restrictions in Google Book Search (aka Google Print)." The CG home page puts it this way:
Removes image copying restrictions in Google Book Search
This is accomplished by first heading to the "Book" tab in CG and selecting, "Restore Right-Click Context Menu" and then placing your cursor on a page from a book.
I don't believe this feature allows you to print Google Book Search content by just clicking and selecting print. When I tested, I didn't see pages from a book but only the material surrounding the actual page.
However, using the right-click menu (now easily enabled by CustomizeGoogle) and placing a cursor on a page from a book, I was able to quickly isolate the page (as a JPEG file) and then print, save, convert, etc. I was also able to isolate direct urls to book pages and send them via email. You can even save book pages as wallpaper on your PC.
As we've pointed out on this blog before, and as others have mentioned in their writing, content in Google Book Search, particularly new in-copyright content, is not supposed to be printable (short of screen caps).
Of course, limits about how many in-copyright "Sample Pages" you can view are still in place and the "copyrighted material" text is still visible on each page. Google Book Search does offer the full view of public domain materials.
It will be interesting to see how (if at all) Google and participating publishers respond to this new option since it's coming from such a highly lauded software program.
Postscript from the CustomizeGoogle Developer (via Email): Before, Google disabled right click. With CustomizeGoogle, right click is enabled again, and now you can right click choose save image/view image/etc.
Google also have some restrictions on how many book pages you can view in a single session or per day. If you anonymize your Google ID, you should be able to view more book pages. However, I haven't tested it so I don't really know if it does work.
Posted by Gary Price at 5:54 PM | Permalink
The NY Times article: Googling Literature: The Debate Goes Public, reports on the Google Print/Book Search debate that Chris blogged about last week.
My favorite passage from the article is the same one that JB used on Searchblog. He calls it the "choice quote." I agree.
Mr. [Allan] Adler [a vice president for legal and governmental affairs at the Association of American Publishers] said Google's contention that its search program might somehow increase sales of books was speculation at best. "When people make inquiries using Google's search engine and they come up with references to books, they are just as likely to come to this fine institution to look up those references as they are to buy them," he said, referring to the Public Library.
To which Google's Mr. Drummond [Google's general counsel] replied, "Horrors."
Btw, an archived version of the debate (audio only) is available here. Video is coming soon. You'll need QuickTime to listen.
Also, kudos to the NY Times reporter for pointing out something that we've blogged about many times when it comes to material found in the Google Book Search database that HAS been scanned from a library and is still considered in-copyright:
Successful searches will return only three to five lines of text, which the company says constitutes a "fair use," allowed under copyright law.
If online books are of interest, this post from last week about a service named ebrary might also be of interest.
Posted by Gary Price at 5:23 PM | Permalink
This should be interesting: The NYPL and WIRED Magazine present a discussion about the competing interests and issues raised by the Google Print Library Project, and whether a universal digital repository of our collective knowledge is in our future.
Speakers include:
Allan Adler, Association of American Publishers
Chris Anderson, Wired Magazine
David Drummond, Google
Paul LeClerc & David Ferriero, The New York Public Library
Lawrence Lessig, Stanford Law School
Nick Taylor, The Authors Guild
The event is sold out, but you can watch a live webcast starting at 7PM eastern tonight via this link (Quicktime required).
Postscript from Gary: At the recent Internet Librarian Conference, Adam Smith, the senior business product manager for Google Print (or is that Google Book Search?) appeared on several panels. Summaries follow:
+ Dualing Keynotes...And a Third
Posted by Chris Sherman at 3:48 PM | Permalink
The Salon.com article: Throwing Google at the book, takes another look at the Google Print program and asks if it's time to change copyright laws.
Posted by Gary Price at 12:42 PM | Permalink
Upcoming Google Print Debates
Can't wait for the court fights over Google Print's library scanning program? Relax, because debates are underway to tide you over!
John Battelle points to a Wired/New York Public Library debate happening next week in New York City, involving Google, the Association of American Publishers that's suing Google, the Authors Guild, the New York Public Library and Lawrence Lessig. Only $15 and tickets are on sale now!
Not in New York? Then come on out to Chicago for SES Chicago 2005, where we have our own Google Print debate happening.
It's on Dec. 6 from 4:30pm - 6:00pm. Google's confirmed, and our diligent legal moderator Jeffrey Rohrs is lining up an anti-Google publishing group (the AAP can't make it due to a scheduling conflict, but they gave us plenty of other recommendations), plus a publisher that is pro and another that is con the scanning project.
OK, SES Chicago costs much more than the NY debate. But you get a lot of other sessions in addition to the debate, plus an absolutely fabulous box lunch and even free donuts if you catch me in the hotel lobby around 1am. Mmm, donuts -- it's a jelly. More about SES Chicago can be found here.
Just don't like to get out? OK, I wrote earlier about an online debate that was going on. Go check out the contributions, all for free.
Posted by Danny Sullivan at 7:01 AM | Permalink
After reading What's A Week On The Web Without Controversy? over at MediaPost, I'm literally shaking my head in disbelief at the confusion in the article and what it may breed among those who read it. So once again, I'm going to dive into what Google Print is, what it does and the difference between that and what I'm going to call Google Library. Perhaps some history will be helpful given all the debate in recent weeks.
Google Print was launched in December 2003 with the full cooperation of participating publishers, as our Google Introduces Book Searches article from that time explains. You couldn't actually search on a Google Print site at that time, however. Instead, matches from Google Print would show up in regular search results, and you could click through to very limited excerpts. Interestingly, Random House was one of the participating publishers back then, whereas today, it's critical of Google Print because of the Google Library project I'll discuss below.
In October 2004, Google greatly expanded the way for publishers to participate in Google Print, as well as making it possible to see the full-text of books in varying amounts according to what PUBLISHERS chose to display, not what Google decided would be best. Our Google Print Opens Widely To Publishers article from that time explains more about this.
The MediaPost article I mentioned above talks about Google Print having a "library project" and a "publisher project," with the latter being most controversial:
Google is said to be working in two capacities: The "library" project and the "publisher" project. The publisher project is the most controversial, as Google aims to work with publishers to make copyrighted books searchable. The Authors Guild and five major publishers are suing to prevent Google from scanning books without explicit permission.
The opposite is true. It is the Google Library project that is controversial. The Google Print Program for Publishers project isn't part of Google Library. It's the preexisting program that allows publishers who wish (and plenty do) to make their content available through Google Print and viewable to the degree they want to show. There's nothing controversial about that program in terms of copyright issues, unless you find some authors who may have concerns that their publishers might benefit more than they do. Publishers who want to participate can and do. Publishers who don't want to participate stay out of the program.
Google Library is what I'll use as a shorthand description of Google Print Library Program, Google's library digitization project. It probably would help matters greatly if Google gave that program a name that is distinct from Google Print, as I'll explain further below.
Google Library launched in December 2004, with the goal of taking books (both in and out of copyright) in public libraries and scanning them to make them searchable. Our Google Partners with Oxford, Harvard & Others to Digitize Libraries article from that time explains more about the program.
One of the chief goals of Google Library was to feed new content into Google Print. But unlike with Google Print's publisher program, Google Library gathered content up without publisher permission.
It didn't take long for publishers to object to the activity. Copyright Questions On Google Digitization Project is a post from us in March 2005 about objections. Some Publishers Not Happy With Google's Library Digitization Program followed in May. Publishers' Group Asks Google To Halt Scanning For 6 Months from June covers more pressure. Eventually, we got to a lawsuit in September (Google's Library Scanning Project Heads to Court) and a further one last month (Association of American Publishers Sues Google over Library Digitization Plan).
What's lost in all these objections is that Google Library is NOT reprinting books online. Back to that confusing MediaPost article:
Cynics speculate all books will be made available via search. The company has not said how it will address copyright laws.
--and--
So, dear readers, how do you feel about this? As a writer and a consumer, I am torn. When I've got my writing hat on, I'd say this is wrong. There must be protection in regard to copyrighted materials.
Google has said how it will respect copyright law: it will not and does not reprint books that are in copyright without explicit publisher permission via the Google Print publisher's program.
Google Library simply makes the content of a book searchable. You can go to the Google Print site, maybe find a matching book scanned through Google Library, but you won't see anything from that book unless the book publisher has given explicit permission for this. The only exception to this is if the book is out of copyright.
My recent Indexing Versus Caching & How Google Print Doesn't Reprint post explains this in more depth. Google Library is the scanning process for SOME of the content in Google Print, but that scanning is NOT the same as printing material. Google Library is effectively making a card catalog of books.
Gary hates me using the card catalog analogy as too archaic, but too bad -- I think that still resonates with many people. Card catalog, "online public access catalog," whatever you want to call them -- it's whatever you use to find a book in a library.
Now think about the last time you went into a library and sat down at a search terminal to find a book. When you got a match, did you then click and read the book on the computer screen? No, in all likelihood you did not. Instead, you were given the location of the book in the stacks, and you walked over to pull it off the shelf.
Google Library is helping Google create that type of searchable index of books, that feeds into Google Print -- but Google Print does not let you then pull the book off the virtual shelf and read it online unless a publisher has explicitly given permission.
Whether the scanning itself to build a search index is still a copyright infringement remains to be seen. If so, my Why Don't Book Publishers Object To Web Indexing?, Forget Google Print Copyright Infringement; Search Engines Already Infringe and Legal Experts Say Google Library Digitization Project Likely OK; Will It Revolve Around Snippets? posts explain why scanning of web pages has gone on for over a decade without legal repercussions, and how publisher groups involved in the Google Print lawsuit themselves sing a different tune when it comes to web indexing, though the principle at stake is the same.
Back to Google Print, the most recent news has been that it is making public domain works gained through Google Library available online via Google Print. Unlike what the MediaPost article suggests, however, these are not the only works you can get. As I've explained, works that are still in copyright may also be read online, but this is with publisher permission.
And finally, back to that Google Print versus Google Library confusion. It is difficult for anyone to understand the differences between the publisher program, the library scanning program and what both allow and do. It would help if Google gave the library program -- which at the moment seems to be called the Google Print Library Project -- a better name.
For example, take the Why we believe in Google Print post over at Google from last month, where Google writes:
We've been asked recently why we're so determined to pursue Google Print, even though it has drawn industry opposition in the form of two lawsuits, the most recent coming today from several members of the American Association of Publishers
Google's not being sued over Google Print. It's being sued over Google Library. But the failure to distinguish the two things is making ALL of Google Print seem like it's under fire. Google Print has much content that publishers are voluntarily providing. It's the Google Library that's the problem right now for Google, so give that a name separate from Google Print and perhaps some of the confusion between the two will go away and benefit discussion about the real issues, rather than what often seem to be mistaken assumptions.
Want to know more? If you're a Search Engine Watch member, our Google: Print & Library section of Search Engine Watch has a rundown on many more past posts with history. Plus, you help support Search Engine Watch and the tired fingers of me, Gary and Chris here at the site.
Want to comment or discuss? Visit our Google Sued Over Google Print Library Scanning in the Search Engine Watch Forums or create a new thread over there.
Posted by Danny Sullivan at 11:31 PM | Permalink
The Google Print story (specifically, the Google Print for Libraries aspect of it) continues to make headlines. No, not another lawsuit but this time a letter from the National Consumers League (NCL) calling for congressional hearings on the matter.
Highlights from the announcement: In a letter to the chairmen of the House and Senate Judiciary subcommittees overseeing intellectual property issues, the nation's oldest consumer advocacy group raised concerns about a forthcoming ambitious effort to catalogue the entire collections of four major American libraries. The letter, signed by National Consumers League President Linda Golodner, acknowledges the tremendous potential value in Google Inc.'s bold vision for the new initiative, in which the complete collection of works at the university libraries of Stanford, Michigan, and Harvard, and of the New York Public Library, would be scanned and made available electronically to the public. The Washington-based advocacy group warned, however, that the project, which will resume scanning on November 1, 2005, poses dramatic threats to the principle of copyrights; fairness to authors; and cultural selectivity, exclusion, and censorship... "We do not doubt Google's good intentions," wrote Golodner. "But any database which represents itself as being a 'full' or 'complete' record of American culture as reflected in the collections of four major research libraries must, in fact, be complete."
The full text of the letters from the NCL to: Honorable Lamar S. Smith, Chairman, Subcommittee on Courts, the Internet, and Intellectual Property (PDF) and Senator Orrin Hatch, Chairman, Subcommittee on Intellectual Property (PDF) are also available. Smart move for Google to have just opened a lobbying office in DC. (-:
Posted by Gary Price at 2:37 PM | Permalink
I've written before that legal concerns about book indexing and Google Print may have repercussions for web indexing. Kevin Werbach and Dave Winer look at this again, afresh. A look at this, plus the crucial difference between indexing (making something searchable) and caching (reprinting content). Google's library scanning program makes things searchable in Google Print but not reprinted.
Breaking Apart at the Seams from Kevin stresses, as I've done, that indexing the words on a web page isn't that much different than indexing the words on a printed page. He wonders if a lawsuit preventing book indexing might cause a type of unraveling of sharing content online in general.
A turning point for the web? from Dave goes much longer to counter the notion that an opt-out approach is acceptable. Unfortunately, he gets some of the points involved wrong. Specifically:
If you publish a site, Google reads the whole site into its cache and then lets you find things in it. Generally people who publish sites know this, and want Google to do this.
Google's index and its cache are two different things, and it's critical -- absolutely critical -- they not be confused like this.
When any search engine visits a web page, it effectively makes a copy of that page which is stored in the index. But the index literally breaks apart the page. It stores where words were located, were they in bold, what other words were they near, were the words in a hyperlink and so on.
Nothing in the index is anything you as a human being could read. I've described the index in searching classes as being like a "big book of the web." But it's not, really. It's more like a giant spreadsheet, where all the words of a page are in one row of the spreadsheet, each word to a different column, then the next page in the row below that, and so on. It's not something a human being would read.
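To make that concrete, here's a minimal Python sketch of the kind of structure I'm describing. It's a toy built on assumptions, not how Google or any other search engine actually stores its index: the `Posting` record and the `build_index` function are hypothetical names invented for illustration, and the input is already broken into words rather than parsed from real HTML.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    """One occurrence of a word on a page (hypothetical record layout)."""
    page: str        # which page the word came from
    position: int    # where on the page the word sat (0-based)
    bold: bool       # was the word in bold?
    in_link: bool    # was the word inside a hyperlink?


def build_index(pages):
    """Break pages apart into a word -> list of Posting mapping.

    `pages` maps a page name to a list of (word, bold, in_link) tuples,
    a stand-in for what a real spider would extract from HTML.
    """
    index = defaultdict(list)
    for page, tokens in pages.items():
        for position, (word, bold, in_link) in enumerate(tokens):
            index[word.lower()].append(Posting(page, position, bold, in_link))
    return index


pages = {
    "a.html": [("Google", True, False), ("Print", True, False),
               ("indexes", False, False), ("books", False, True)],
    "b.html": [("Libraries", False, False), ("lend", False, False),
               ("books", False, False)],
}
index = build_index(pages)

# The index answers "which pages mention this word, and where?"
print([(p.page, p.position) for p in index["books"]])
```

What gets stored is organized by word, not by page. It's built for lookup, not for reading.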
Aside from the index, Google, Yahoo, MSN and Ask Jeeves also make "cached" copies of pages available. You can see a copy of the exact page the search engine spidered. These cached pages are kept separate from the index. They are useful when a page is down or when a copyright holder wants to see if someone has stolen and cloaked their content to feed to a spider. But the legality of showing such cached pages is also in question. No one to date has challenged them in court. The reason seems to be that Google, which mainstreamed cached copies, lets site owners opt out of caching if they want.
All major search engines also let you opt out of being in their indexes, as well -- a completely different thing -- and another reason why the index shouldn't be confused with the cache. To take Google as an example, you can:
The ability to opt-out of the index is another reason why we really haven't had a major search engine sued over web search indexing. In addition, site owners as Dave notes generally want to be indexed, so they can get traffic. In fact, the reason so many are upset over the current indexing update at Google is that they feel changes are causing them to lose traffic. But whether it is LEGAL to do this type of indexing (as opposed to caching) still really hasn't been tested.
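For those who haven't seen an opt-out in practice, here's a minimal Python sketch of a spider checking one. I'm assuming the opt-out is expressed in a robots.txt file, which is the usual convention but isn't named above. The robots.txt contents, the "ExampleBot" spider name and the example.com URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical site's robots.txt: everything is crawlable by default,
# but the owner has opted one directory out for all spiders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" and example.com are made-up names for illustration.
print(parser.can_fetch("ExampleBot", "http://example.com/index.html"))
print(parser.can_fetch("ExampleBot", "http://example.com/private/notes.html"))
```

The point of the sketch is the default. A page the owner never mentions is fair game for the spider, and only an explicit rule keeps it out. Opting out of cached copies is a separate switch (the `noarchive` value in a robots meta tag is the common one), which is one more sign that the index and the cache are different things.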
So indexing and caching are NOT the same. Back to Dave's piece. He writes:
Google clearly does not have the right to make a copy of the book and republish it without the permission of or compensation to the copyright owner. The publishers appear to be on the right side of this one, and while I'm not a lawyer, I can't imagine that they won't prevail in court.
I'm not a lawyer either, but I can completely imagine that Google might win. Maybe not, but it's hardly far-fetched or doubtful, and even some lawyers feel Google may win.
Here's the thing. Google is NOT, repeat NOT, republishing copies of books that it scans out of libraries. This is a fundamental mistake that many people seem to be making.
Google is scanning books into an index, just as it spiders web pages and adds them to its index. It is making the books searchable by doing this, but that process does not republish the books in a way you can read.
Think about it in web search terms. You can find a matching book, but there's NO hyperlink to click on that will take you to an online version of the book itself. There's just a snippet -- maybe -- of the text surrounding the words matching what you looked for.
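As a rough illustration of how little a snippet is, the following Python sketch returns only a few words on either side of the first match and nothing else. The `make_snippet` function and its `radius` parameter are hypothetical. Google hasn't published how it actually builds snippets for Google Print, so treat this as a sketch of the idea only.

```python
def make_snippet(text, query, radius=3):
    """Return a few words around the first match of `query`, or None.

    `radius` is the number of words kept on each side of the match
    (an illustrative parameter, not anything a real engine exposes).
    """
    words = text.split()
    target = query.lower()
    for i, word in enumerate(words):
        # Compare case-insensitively, ignoring surrounding punctuation.
        if word.lower().strip(".,;:!?\"'") == target:
            start = max(0, i - radius)
            end = min(len(words), i + radius + 1)
            prefix = "... " if start > 0 else ""
            suffix = " ..." if end < len(words) else ""
            return prefix + " ".join(words[start:end]) + suffix
    return None  # no match: nothing from the text is shown


book_text = ("To reach the hidden level, jump on the third pipe "
             "and hold the down arrow for two seconds.")
print(make_snippet(book_text, "pipe"))
```

Everything outside that small window stays hidden from the searcher, even though the whole text had to be scanned to find the match.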
Want the actual book? Google Print won't give it to you. Instead, you have to go someplace and buy it or find it in a library. Google Print merely tells you the book may be what you're looking for.
The only exception to this is if a publisher OPTS-IN. Not opt-out. If a publisher chooses, then -- and only then for books that are in copyright -- will Google display some of the actual book. The exact amount is left up to the publisher.
So, I've covered that indexing means making a book (or web page) searchable while caching means making a page (or a book) viewable online, without having to go to the source material (the book or the page). Let's recap then how both systems work:
Search Type    Indexing    Caching    Snippets/Descriptions
Web            Opt-Out     Opt-Out    Opt-Out
Books          Opt-Out     Opt-In     Opt-Out

As you can see, book search is actually more opt-in than web search is. Books themselves aren't cached or shown. But they are made searchable without permission.
That system has worked on the web, because of the aforementioned feeling that site owners want traffic. As for book publishers, Why Don't Book Publishers Object To Web Indexing? from me earlier covers how many seem not to mind getting traffic through an opt-out system on the web, as well.
It remains for a court to decide whether it should be workable when it comes to book indexing. If not, then absolutely, you might see search engines ponder if web indexing itself -- which really hasn't been legally tested -- is something they'll need to require an opt-in for. And if that's the case, web indexing will get pretty bad, since many publishers will simply fail to make the opt-in effort.
What's that third column, the snippets/description one? That's the place where I think book publishers might prevail, and certainly a change that Google should consider. Legal Experts Say Google Library Digitization Project Likely OK; Will It Revolve Around Snippets? covers how it's possible that in some cases, even the limited description that Google puts on pages might give away some of the value of a book and thus real harm might be proven to a publisher. Solution? Make showing descriptions an opt-IN thing.
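The whole opt-in versus opt-out question in the recap table comes down to what happens when a content owner says nothing. Here's a minimal Python sketch of that logic. The `POLICY` dictionary and the `is_allowed` function are hypothetical names, and this is only my reading of the table, not anything Google has published.

```python
# Default policy per the recap table above: "opt-out" means the feature is
# on unless the owner objects; "opt-in" means it is off unless the owner
# explicitly agrees. The dictionary layout is illustrative only.
POLICY = {
    "web":   {"indexing": "opt-out", "caching": "opt-out", "snippets": "opt-out"},
    "books": {"indexing": "opt-out", "caching": "opt-in",  "snippets": "opt-out"},
}


def is_allowed(search_type, feature, owner_choice=None, policy=POLICY):
    """Decide whether a feature applies to a piece of content.

    `owner_choice` is True (owner said yes), False (owner said no),
    or None (owner has said nothing at all).
    """
    if owner_choice is not None:
        return owner_choice          # an explicit choice always wins
    # Silence means "yes" under opt-out and "no" under opt-in.
    return policy[search_type][feature] == "opt-out"


# A publisher who has never contacted Google:
for feature in ("indexing", "caching", "snippets"):
    print(feature, is_allowed("books", feature))
```

Changing the books "snippets" entry to "opt-in" would be the change I'm suggesting. It would affect only publishers who have said nothing, since an explicit yes or no wins either way.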
Lastly, Dave makes a couple of other comments:
It's time to realize that Google is no longer the little company we used to love. They're now a huge company that pushes individuals around like a lot of other huge companies. They need some balance to their power. And it's ridiculous to blindly take their side on every issue. Sometimes they're wrong, and I believe this is one of those times. It's certainly worth considering the possibility that they're wrong.
Absolutely, Google is a big giant company, not some tiny lovable start-up. If anyone still has that idea, definitely get it out of your mind now. But whether you think they push others around or not may depend on what area we're talking about. And whether a company of any type should be hated because they're big is another issue, as well. Nor should it be assumed that Google is always right. They most definitely are not.
As for this:
This situation is much like the disagreement we had with Google a few months back, when they wanted to put ads on our sites without permission and without paying....and right now they're putting ads on your content without your permission, without compensating you. Now how do you feel about that?
Dave is talking about Google's AutoLink. I'd disagree that the links Google may insert if someone clicks on the right button in the Google Toolbar are ads, so don't freak out if you aren't familiar with AutoLink and are suddenly scanning your pages to find how Google got real AdSense ads on it. They didn't.
I would agree that Google should go the opt-out route with AutoLink, as I wrote before. But it's also a harder argument to have, when there's been the incredible popularity of GreaseMonkey for Firefox, which can insert links into pages. Plenty of people use CustomizeGoogle, which inserts links into Google's own pages. Fair turnabout, some who hate AutoLink would say. Yes, it is -- but then it also weakens the argument that Google itself can't let people put links into pages with its own tools.
Postscript: Ray Gordon writes to say he has filed a complaint arguing that web search on an opt-out basis is in violation of copyright. You can read the filings here. I've skimmed them, and he seems more concerned about usenet material (rather than web material) that can't be removed, apparently because others may have reprinted his own posts.
Postscript 2: Dan Thies writes that a search index is even less readable than a spreadsheet, and he's correct. I was trying to keep things simple yet familiar to illustrate the difference between words arranged on a page for reading and words indexed to make a search engine. As Dan says, he understands I was keeping things simple -- but he also takes you deeper into how inaccessible to a "reader" a real index actually is.
Posted by Danny Sullivan at 11:33 AM | Permalink
It's still very early in the game when it comes to search engines and legal issues. Although a number of lawsuits have helped clarify things like appropriate content in meta tags and whether using trademarks is fair game, lots of other issues are still unclear and up in the air.
It's important to understand these issues if you're a search marketer, both to stay out of trouble and to know what recourse you have if someone poaches your intellectual content. A panel of legal experts discussed these issues on a recent Search Engine Strategies panel, and guest writer Grant Crowell caught the session, reporting on it in today's SearchDay article, Copyrights, Trademarks and Search Engines.
A longer version of this story for Search Engine Watch members goes into more detail about various methods to protect your intellectual content, including how to safeguard images from being copied, and a checklist for taking action if you've found that your content has been stolen and illicitly used elsewhere on the web.
Posted by Chris Sherman at 6:07 AM | Permalink
Well, it appears that Google will be heading to court, again, over their Google Library book digitization program.
According to this news release, one of the largest trade groups of publishers in the U.S., the Association of American Publishers (AAP), which represents more than 300 members (here's a list) including many commercial book publishers, is suing Google over the program.
Wow, that Google department is one busy organization.
From the News Release The Association of American Publishers (AAP) today announced the filing of a lawsuit against Google over its plans to digitally copy and distribute copyrighted works without permission of the copyright owners. The lawsuit was filed only after lengthy discussions broke down between AAP and Google's top management regarding the copyright infringement implications of the Google Print Library Project.
The suit, which seeks a declaration by the court that Google commits infringement when it scans entire books covered by copyright and a court order preventing it from doing so without permission of the copyright owner, was filed on behalf of five major publisher members of AAP: The McGraw-Hill Companies, Pearson Education, Penguin Group (USA), Simon & Schuster and John Wiley & Sons.
The release also mentions that the AAP proposed a method using ISBNs (International Standard Book Numbers) to help identify works under copyright (at least since 1967) and then get permission from publishers and authors to scan these works. According to the statement, "Google flatly rejected this reasonable proposal."
As I tried to make clear yesterday, there is a clear difference between the Google Print program (where material comes directly from publishers) and the Google Library scanning program, where Google plans to retrospectively scan every book in several major libraries BUT only shows snippets of content from a given book on a results page. My post from yesterday has several related links that might be of interest.
"The publishing industry is united behind this lawsuit against Google and united in the fight to defend their rights," said AAP President and former Colorado Congresswoman Patricia Schroeder. "While authors and publishers know how useful Google's search engine can be and think the Print Library could be an excellent resource, the bottom line is that under its current plan Google is seeking to make millions of dollars by freeloading on the talent and property of authors and publishers."

This is the second time (that we know of) that Google has been taken to court over the library digitization program. A month ago we blogged that The Authors Guild had also filed suit against Google over the program.
Finally, it's interesting to note that the AAP statement makes a mention of the new Yahoo/Internet Archive Open Content Alliance that was announced a few weeks ago.
Noting the existence of new online search initiatives that respect the rights of creators, such as the "Open Content Alliance" involving Yahoo, Hewlett-Packard, Adobe and the Internet Archive, Mrs. Schroeder said: "If Google can scan every book in the English language, surely they can utilize ISBNs. By rejecting the reasonable ISBN solution, Google left our members no choice but to file this suit."

David Drummond, Google's Vice President, Corporate Development and General Counsel, has just released this statement about today's AAP announcement: "Google Print is an historic effort to make millions of books easier for people to find and buy. Creating an easy to use index of books is fair use under copyright law and supports the purpose of copyright: to increase the awareness and sales of books directly benefiting copyright holders. This short-sighted attempt to block Google Print works counter to the interests of not just the world's readers, but also the world's authors and publishers."
Postscript from Danny: It's worth noting that the publishers aren't upset with the Google Print search service but instead with what they call the "Google Print Library Project," the program where Google is scanning books in cooperation with libraries. What's the difference, and why is the library program itself the issue? Google Print shows content from books that many publishers themselves give Google permission to show. Publishers can choose exactly how much of a book they want to show, from only a few snippets of text, to a few pages, to even the entire book, if they explicitly tell Google they want that to happen. The library project is meant to help feed additional content into Google Print without publishers explicitly cooperating. Google has said only tiny snippets of material from any of this content would be shown. Yes, the full text is scanned and made searchable, but no pages from the books are shown. Still, the act of scanning the text in the first place is alarming enough to some publishers, who consider that a copyright violation. Whether it is one is now something that the courts will decide.
Postscript from Gary: Interesting article about the lawsuit in The Book Standard that includes this snippet. She [AAP President Pat Schroeder] added that Google had indicated its willingness to delay the project for a year, an offer chief negotiators John Sargent, CEO of Holtzbrinck, and Richard Sarnoff, president of New Media and Corporate Development at Random House, rejected. "The terms that everybody worked so hard on were dismissed because Google could not swallow the basic issue about permissions," she said.
Postscript 2: David Drummond, Vice President, Corporate Development has posted more thoughts about the lawsuit and Google Print on the Google Blog.
Posted by Gary Price at 11:54 AM | Permalink
Looking for video content? I mean, not just looking but also wanting to download it? BitTorrent is a popular way for many seeking to get the latest television program or film. BitTorrent's Grab at Respectability from BusinessWeek looks at how the service wants to move on by raising capital and turning into a distribution network for publishers, rather than for those sharing published works.
Posted by Danny Sullivan at 9:04 AM | Permalink
The University of Michigan Library is one of the five libraries Google plans to digitize materials from. The Michigan Daily article, U backs Google in lawsuit, offers comments from a U of M official who says that the university is enthusiastic about the program. On Tuesday, The Authors Guild (and three authors) filed a class action lawsuit. The Authors Guild says that the Google Library scanning program is a "brazen violation of copyright law."
From the Michigan Daily article: Defending the legality of Google's actions, the University said it continued to be enthusiastic about the project. "We are confident that this project complies with copyright law," James Hilton, an associate provost and the University's interim librarian, said in a written statement. "This project represents an enormous leap forward in the public's ability to search and find knowledge," he said.
The article also includes comments from Stanford Law School Prof. Lawrence Lessig: "Technically, copyright law states that if you make a copy of a work, that you need to obtain permission from the author," Lessig said. However, he said it is important to recognize what Google is attempting to accomplish by digitizing works and enhancing public access.
Postscript: The full text of the University of Michigan statement about Google Library is available here.
Posted by Gary Price at 11:17 AM | Permalink
Many predicted that the copyright issues that surround Google's library book scanning project would end up in court. Today those predictions came true.
This afternoon, the 8,000-member Authors Guild and three individual authors, including a former Poet Laureate of the United States, filed a class action lawsuit in federal court against Google over the Google Library book scanning project, which is part of Google Print.
From the news release:
The suit alleges that the $90 billion search engine and advertising juggernaut is engaging in massive copyright infringement at the expense of the rights of individual writers.... "This is a plain and brazen violation of copyright law," said Authors Guild president Nick Taylor. "It's not up to Google or anyone other than the authors, the rightful owners of these copyrights, to decide whether and how their works will be copied."
More in the news release and this News.com story by Elinor Mills.
What does Google have to say about the lawsuit? Google's Nate Tyler told SEW:
The Google Print program respects copyright. We regret that this group has chosen litigation to try to stop a program that will make books and the information within them more discoverable to the world. Google Print directly benefits authors and publishers by increasing awareness of and sales of the books in the program. And, if they choose, authors and publishers can exclude books from the program if they don't want their material included. Copyrighted books are indexed to create an electronic card catalog and only snippets of the books are shown unless the content owner gives permission to show more.
Over the past few months we've blogged about major concerns over the Google Library scanning project from several publishing trade groups. However, as far as I know, no legal action has been filed by any of these groups.
For more on Google Library see:
Postscript From Danny: Google Print and the Authors Guild on the official Google blog has a reaction to the suit, illustrating that only tiny, tiny portions of copyrighted works would be shown and defending the project as being consistent with copyright law.
Via BoingBoing, a copy of the legal complaint is here (PDF). BoingBoing also points to this EFF article saying (as some of the articles above have already covered) that Google may have a good fair use argument. The EFF article points to this analysis (PDF) by Jonathan Band looking at how the complaints compare when measured up against the limited case law we have about search engine indexing.
I agree with the paper, especially the point, as I've written before, that search engines already supposedly infringe copyright in the way publishers describe for web indexing, and have done so for years and to some of these publishers' own web sites, without them complaining. More on that from me in these articles:
However, on the fair use front, I recently discussed how in some circumstances, even the tiny snippets shown could potentially be found to do real harm to an author. It would be rare, but possible, and might be a reason for Google to adopt a policy of not showing any content at all beyond book title and maybe table of contents, for books for which it doesn't have explicit copyright permission. The article below has more on that:
Want to discuss or comment? Visit our forum thread, Google Sued Over Google Print Library Scanning.
Posted by Gary Price at 7:23 PM | Permalink
Baidu Ordered To Stop Music Downloads
Baidu ordered to halt music downloading service from Reuters covers Chinese search engine Baidu being ordered by a Chinese court to stop providing music downloads. Baidu plans to appeal, saying it doesn't provide downloading services but rather search services. Music search is a chief driver of Baidu's popularity, as I've written recently. If it's not even allowed to offer music searching, that's likely to put a major crimp in its growth. Five Music Companies Sue Baidu covers other companies that have suits filed against Baidu.
Posted by Danny Sullivan at 7:34 AM | Permalink
The Friday Project's Google Debate program has begun, which will have various solicited parties offering up opinions on the Google Print program's library scanning project. Worth checking out, if you're interested in exploring the legalities and opinions in the process. I'll likely be contributing an opinion in the near future.
Posted by Danny Sullivan at 10:04 AM | Permalink
Baidu sued over music downloads, from the Hong Kong Standard, reports that several large music companies are suing Chinese search engine Baidu for allegedly making hundreds of songs easily accessible via their MP3 search tool. The companies filing the lawsuit are Universal, EMI, Warner, Sony BMG and their local subsidiaries, Cinepoly, Go East and Gold Label.
What has drawn the industry's ire is the ease with which Internet users can use Baidu's search engine to locate copies of music stored on the Web, even to the point of organizing songs into Top 10 lists by category. When a user clicks on a particular song, the engine provides a direct link to the URL where the file is stored.

Since the search process is automatic, Baidu argues that it is simply providing the basic service offered by all search engines, and is not itself involved in any copyright infringement. In addition, it promises to remove the link if a company can prove it owns the right to a song. "This practice is consistent with legal requirements of PRC law," Baidu said last night.
The industry, however, argues that a Chinese court, in an earlier case, ruled MP3 searches were illegal.
Prior to Baidu's IPO, some speculated that copyright issues might be a concern for the company. We also blogged a report about Baidu removing links to thousands of pirated files from their database.
Posted by Gary Price at 10:36 AM | Permalink
Courts Unlikely To Stop Google Book Copying from InternetWeek has legal experts saying that copyright law over indexing books appears to be on Google's side. Of less concern is whether Google actually gets permission to do copying. More weight is applied to the economic impact on the copyright holder and the amount of material used in proportion to the whole. But don't they use all the book? Yes, they scan all the book but they show little without explicit permission (for more, see our past Another Google Book Scanning Debate & Another Publisher Group Objects post). The InternetWeek article is a great look at some of the issues involved. My favorite part:
"If copyright law worked the way Google would like to see it working, then everyone in the world would be able to use the material unless the copyright holder explicitly told them not to, and even then it would be OK," says Allan Adler, the vice president for legal and government affairs for the Association of American Publishers. "That would be a very strange copyright system."
As I've said before, that's exactly how things currently work with web indexing. The Association Of American Publishers doesn't appear to have minded Google indexing nearly 800 pages from the site over the years without permission, all of which have copyright protection. But books apparently are different.
To be fair, books are different in the sense that most web sites don't earn money by selling their content. They typically earn by carrying ads. Book sellers do have legitimate fears that online book searching might lead to fewer sales -- and that appears to be a factor that will be key in any dispute. It would have to be proven.
But say I'm looking for a particular fact. I search for a book using Google Print. I find that there's a book that appears to match, but since the publisher hasn't given Google what I'd call "display" permission as opposed to "indexing" permission, I can't see the answer. Harm? Hard to show. Benefit? Easier to show. I didn't know this book might have an answer I needed. Now I do, and I might go get it.
One lawyer in the article makes exactly this argument in the latter part of the story, dealing with past case law that will likely be applied.
Here's the exception. What if I can see the answer? Look here. That illustrates how without explicit display permission, Google will show only a few lines or "snippets" of information. But if the answer I want is in the snippet, then it is easier to show harm. I no longer may need to buy the book. Imagine a book about computer game tricks and tips. If I can see the tip in the snippet, I may solve my problem and save my money.
One solution might be to completely eliminate snippet display for books without copyrighted permission. Some web sites can argue the same, that having snippets might mean people don't come to them, of course. But Google already provides a way for site owners to turn off snippets. That's an opt-out thing. Perhaps with Google Print, showing even snippets will need to be an opt-in situation.
Posted by Danny Sullivan at 9:26 AM | Permalink
Gary posted earlier about the latest publisher's group to object to Google's digital library program. As I've posted earlier, I've found some of the arguments odd given that search engines have long indexed copyrighted material from across the web without permission and without complaint by publishing groups. This time, I followed up with Sally Morris, chief executive of the Association Of Learned And Professional Society Publishers, about why her group seems to me to view print copyright as something deserving greater indexing protection than web copyright.
The ALPSP put out a statement (PDF format) last week with this key highlight that caught my eye:
Google Print for Libraries is a very different matter. We firmly believe that, in cases where the works digitised are still in copyright, the law does not permit making a complete digital copy for such purposes.
I asked Morris:
Is the view of the ALPSP that this applies only to books or any copyrighted work? I ask because all works, to my knowledge, enjoy automatic copyright protection in many countries. That's certainly the case in the US. This includes the billions of pages that Google and other search engines index on the web.
Google, for example, has indexed nearly 1,000 pages from the ALPSP web site. My assumption is that the ALPSP never overtly asked for these pages, all of which are copyrighted, to be digitized and included in Google. Despite this, I've never heard your organization complain about such indexing.
In fact, when I look here, you seem not only happy to have Google index your pages but also happy to have other people search the entire web's copyrighted works via Google.
In short, why is opt-out OK when it comes to web content but not OK when it comes to published works?
Morris replied:
In my view the laws of copyright are not different for books and for other copyright works. So you're right, in principle Google should seek opt-in permission before indexing freely available web pages, too (as, indeed, the British Library's web archiving project has very properly done - and very hard work it is too). However, I think the issue is much more acute where the content is not made freely available by its copyright owner - which is, of course, the case for all the in-copyright content Google are planning to digitise from libraries.
I wasn't convinced on the "freely available" front and sent this follow-up:
Why is publishing a book not making content freely available?
If I go into a library, I've got plenty of content for free. That's exactly why Google has gone into the libraries. The information is made accessible to any patron.
I don't know of any library being sued for allowing people to borrow books, which arguably goes directly to the potential earnings a publisher could make. You'd know far better than I if this has actually happened, of course. But the books are there, and they are free to anyone able to gain a library card or lending rights. In contrast, Google is not making the full text of books available as a library does. If anything, libraries are far greater infringers than Google and have been so longer. Why aren't libraries being targeted?
Now if you mean freely available in terms of ease of copying -- i.e., web pages published to the web are easier for anyone to access -- I can understand your point of view more. I'd still disagree with it, however. Just because I distribute on the web doesn't mean I consider my copyright to be any less important than if I publish in print.
As for the British Library project, my understanding was that they wanted to have the law changed to essentially give them opt-out abilities:
One of the problems faced by the consortium is that, due to UK copyright law, permission is needed before a site can be archived. The British Library is working with the government to extend the law to allow them blanket access to all Web sites because "there are 4 million sites that we would like to capture -- we cannot ask everyone for permission," said Boulderstone.
They're also not quite the same as with search engines. While search engines generally do offer cached copies of pages, archiving is more substantial, making actual lasting full-text copies of pages.
Morris replied:
A published book is sold - to the individual or to the library. Lending it out does not contravene copyright. To my mind, making a digital copy of the whole thing does.
We are not saying that increasing visibility via Google Print is a bad thing - I think those of our members who participate in the Google Print for Publishers program (or who otherwise allow Google to index their closed content) are generally pleased with the increased hits, though I'm less clear whether they are in fact seeing increased sales. All we're saying is that the method of achieving it seems to us clearly to break copyright laws - and we'd like to work with Google to find an acceptable way of getting publishers' opt-in.
And I guess all I'm saying is that those publishers, if they try to push this angle with Google via a lawsuit, had better be prepared for explaining why they've never complained about having their web sites indexed by Google for years without permission.
Moreover, woe to the publisher or member of a publishing group that is ever found during legal disclosure to have complained about not being indexed better on Google. You can't enjoy years of free traffic from a source, then suddenly decide that copyright law is now different just because the words appear in print, rather than on the web.
One interesting solution will be to see if Google simply goes out and buys a copy of every book it wants to offer in its virtual library. If libraries are OK lending books, Google might argue that it's creating a card catalog of books in its collection. Heck, you could even make it so that only one person at a time could "check out" viewing some of the pages that Google Print offers for reading online.
I do have sympathy for publisher rights. I publish material myself. I just find it bizarre to see the print industry suddenly acting like it can ignore 10 years of web indexing. For more from me on this topic, see:
See also our forum thread, SEW should support the AAUP's position on Google.
Postscript from Gary: I wanted to add two points to Danny's post.
First, Danny writes, "Heck, you could even make it so that only one person at a time could "check out" viewing some of the pages that Google Print offers for reading online." Actually, this concept is already being used by many libraries. Libraries purchase access to digital copies of both new and old full text books via services like ebrary and NetLibrary. Patrons can then "virtually" check out these books for a certain period of time. Services like Books24x7 and Safari Tech Books also provide searchable full text books online and, unlike Google Print/Library, there is NO limit on how much you read online. As I've pointed out before, many of you have free access (from home) to one or more of these services via a local, corporate, or university library. More about that here. Btw, many libraries have started to allow card holders the ability to virtually check out and download audio books for free. (-:
Second, from a searcher's perspective: material that Google scans from a library that's still in copyright will be full text searchable but not full text viewable online. Google puts it this way, "you will only be able to view the bibliographic information and a few short sentences of text around your search term." You will also be unable to print this material (yes, you could do screen caps). Here's a screen cap of what will be visible.
Posted by Danny Sullivan at 3:51 PM | Permalink
Shortly after Google announced some changes to their library scanning project, the Association of American Publishers said they weren't pleased (see: Google Gives Publishers Opt-Out From Library Scanning Project; One Group Still Not Happy).
Now, another publishing industry trade group, The Association of Learned and Professional Society Publishers (ALPSP), has shared its views. You can read them in this new position paper (PDF).
Key Passages from the Paper: A number of our member publishers also participate in the Google Print for Publishers program -- which allows them to opt-in, and to specify what content may or may not be freely displayed and what links should be supplied to enable users to purchase the publication. These publishers have been pleased with the increased hits although, as far as we are aware, actual sales have not increased dramatically.
Google Print for Libraries is a very different matter. We firmly believe that, in cases where the works digitised are still in copyright, the law does not permit making a complete digital copy for such purposes. We are willing to work with Google to find a mutually acceptable way forward; however, we do not consider Google's proposal to stop the digitisation program until 1 November, up to which date publishers may exclude their works by supplying full bibliographic details including ISBN/ISSN (a major undertaking), to offer an acceptable solution.
We call on Google to hold an urgent meeting with representatives of all major publishing organisations, in order to work out an acceptable pragmatic way forward and to avoid legal action.
This is not the first time we've heard from the ALPSP. This blog post from early July (before the changes) has details.
Another Group Comments
On August 19th, the Association of American University Presses (AAUP) issued a statement.
From the statement: By temporarily suspending the digitization of copyrighted work Google's revised policy makes an important concession to the rights of copyright holders. In its essentials, though, the revised policy is virtually the same as the previous one. Google still asserts that it may make digital copies of all books in copyright, and that they will respect the copyrights only of those who supply Google with a list of books for which rights must be recognized. In other words, Google, an enormously successful company, claims a sweeping right to appropriate the property of others for its own commercial use unless it is told, case by case and instance by instance, not to. In our view this contradicts both law and common sense.
Posted by Gary Price at 1:38 PM | Permalink
When Google Video first launched viewable content in June, we posted (as did many others) about viewable copyrighted material that was quickly found in Google Video. The other day I went to look to see if this type of content was still as easy to find and if new copyrighted material was being uploaded and made viewable. The answer is yes. In a matter of minutes I was able to find lots of material that still is most likely copyrighted.
Here are a few examples:
+ Saturday Night Live "Cow Bell" Sketch with Will Ferrell and Christopher Walken Uploaded July 24, 2005 Note: Very Funny Stuff!
+ A Mad TV Skit Uploaded June 17, 2005
+ Triumph the Dog Interviews Bon Jovi (Conan O'Brien) Uploaded July 1, 2005
+ Beatles Video Uploaded August 3, 2005
+ A Scene from The Muppet Show Uploaded June 30, 2005
+ Hogan Vs Undertaker (WWE) Uploaded April 30
+ NCAA Basketball Uploaded April 30
+ Napoleon Dynamite (Jon Heder) on David Letterman (Top 10 List) Uploaded June 30
I'm surprised that certain words in the titles of these files and other metadata supplied by those uploading the files wouldn't trigger someone at Google to take a closer look before the material was made available.
Posted by Gary Price at 2:38 PM | Permalink
Details on the Google Blog. The biggest change is that Google will allow publishers who are members of the Google Print program to explicitly tell Google what library books they want digitized.
If you're in the Publisher Program (or you decide to join it), you can now give us a list of the books that, if we scan them at a library, you'd like to have added immediately to your account...So now, any and all copyright holders -- both Google Print partners and non-partners -- can tell us which books they'd prefer that we not scan if we find them in a library. To allow plenty of time to review these new options, we won't scan any in-copyright books from now until this November.
To be clear, Google will continue scanning public domain materials.
The Google Blog post also mentions that these changes came about after talks with a "variety of constituencies" including various publishing groups (US and international) that have expressed concerns over the project. However, it looks like today's announcement has done little to stop these concerns from at least one group of publishers.
From the Association of American Publishers:
"Google's announcement does nothing to relieve the publishing industry's concerns," said Patricia Schroeder, AAP's President and CEO. Google's plan calls for digitally copying every work in the collections of three major libraries unless specifically denied permission for a particular work by the copyright owner. "Google's procedure shifts the responsibility for preventing infringement to the copyright owner rather than the user, turning every principle of copyright law on its ear," said Mrs. Schroeder..."We were confident that by working together, Google and publishers could have produced a system that would work for everyone, and regret that Google has decided not to work with us on our alternative proposal," Mrs. Schroeder said.
Publishers and individual authors can learn more here.
Remember, Google Library scans the entire book but only shows a few snippets around each search term if the book is still in copyright. Screenshots here illustrate what you'll see from a Google Library result vs a Google Print result. More about the differences between Google Print for Publishers and Google Library, here.
Next week, I'm hoping to do an email Q&A with a Google Print official, more later.
Finally, don't forget many other companies and projects are already providing unlimited full text access (not just snippets) to thousands of books. I name just a few of them here and here. This post looks at a few of my favorite specialty databases that offer access to books (thousands available).
Posted by Gary Price at 9:27 AM | Permalink
Last week we blogged about Baidu's IPO announcement and included a link to Matt Marshall's post that included a discussion of possible copyright issues that Baidu might face as a company traded in the US. Apparently copyright concerns were also a major issue for Baidu execs especially after hearing from a music licensing organization. Today, word from Shanghai that Baidu is removing links to as many as 50,000 pirated music files.
R2G Chief Operating Officer Scarlett Li told the newspaper that Baidu had taken out Web links to more than 3,000 music files of a single popular Chinese song alone. She added that the search engine was also looking into links to more than 50,000 files.
Posted by Gary Price at 2:01 PM | Permalink
Spotted via Threadwatch, Keeper of Expired Web Pages Is Sued Because Archive Was Used in Another Suit from the New York Times discusses how the Internet Archive is being sued for crawling the web and making copies of web pages. A copyright infringement case against a search engine, then? Not exactly, as we'll see.
At issue, a court case on trademarks where evidence of past usage was found through the Internet Archive. Healthcare Advocates said copies of its pages were made without permission. In particular, Healthcare Advocates says despite making use of a robots.txt file, there were 92 occasions when its pages still managed to be accessed.
In a further twist, the company claims the law firm getting those pages violated the Digital Millennium Copyright Act provisions of "circumventing" the robots.txt file exclusion.
Time for a good laugh at that, honestly. As the article explains, robots.txt is a voluntary opt-out measure designed for crawlers. It has no legal bearing. In addition, nothing in a browser prevents someone from viewing pages that have been blocked by robots.txt. In short, no one has to circumvent robots.txt to view a page. It doesn't try to block that at all.
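To make the "voluntary" point concrete, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt rules, the crawler name and the example.com URL are illustrative assumptions, not details taken from the lawsuit. The check only produces an answer when a crawler chooses to ask; nothing on the server side enforces it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only; not the file
# actually used by any site in the case.
RULES = """\
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical page address used only for this sketch.
PAGE = "http://example.com/page.html"


def polite_crawler_may_fetch(user_agent: str, url: str = PAGE) -> bool:
    """Return what a well-behaved crawler would conclude from robots.txt.

    This is purely advisory: a client that never calls this function
    (a web browser, for instance) is not blocked by anything.
    """
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A crawler named in the rules is asked to stay away.
    print(polite_crawler_may_fetch("ia_archiver"))
    # An agent the file says nothing about is not restricted at all.
    print(polite_crawler_may_fetch("Mozilla/5.0"))
```

The file is a request that cooperative robots look up and honor. It is not an access control, which is why viewing a page in a browser involves no circumvention of anything.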
As for the copyright infringement, from what I can see, the Internet Archive itself is not being sued for copyright infringement. Instead, it's being sued for allowing those copies to be seen despite a robots.txt block. The article says this failure has the Internet Archive under fire for "breach of contract and fiduciary duty, negligence and other charges."
Interesting. I'd say absurd, but you never know, maybe the case will convince a court that a search engine has some type of binding contract with a company that runs a web site solely on the basis of crawling it. As said, robots.txt is a voluntary mechanism to keep pages out of a crawler. It's not a legal requirement.
Moreover, while I haven't seen the case yet (Gary will probably dig it up and post here, if so), red flags already go up about the robots.txt file preventing "public viewing" of the pages.
Robots.txt traditionally removes pages entirely from an index. They don't hang around. That's certainly what the Internet Archive says. If robots.txt was up, then at some point, the pages should have been entirely removed from the Internet Archive, period.
For some further reading, my Google & Other Search Engines: The WMDs Of Copyright Infringement and Forget Google Print Copyright Infringement; Search Engines Already Infringe articles cover how search engines make copies of billions of documents each month without permission, relying on the opt-out non-legal provisions of robots.txt to hopefully keep them safe.
Postscript (from Gary): If you would like to read the actual complaint filed in the lawsuit, I've posted a copy (48 pages; PDF) here.
Postscript 2: Internet Archive DMCA Circumvention Lawsuit from Seth Finkelstein looks at how the robots.txt file with Internet Archive doesn't actually remove content but rather simply suppresses display. And our forum thread, Implications of the Internet Archive lawsuit also looks at this and the important impact this can have if a domain name changes ownership. What you thought was removed might very well show up again.
Posted by Danny Sullivan at 8:52 AM | Permalink
An international group of publishers is requesting that Google "cease" scanning copyrighted materials for the Google Print for Libraries project until "appropriate licensing" can be worked out. The request was posted today on The Association of Learned and Professional Society Publishers (ALPSP) site. The ALPSP is a trade association of not-for-profit publishers, learned societies, university presses and others in more than 30 countries.
You can read the full text of the ALPSP position paper about Google Print for Libraries here (PDF).
Here's the final paragraph from the ALPSP paper: The Association of Learned and Professional Society Publishers calls on Google to cease unlicensed digitisation of copyright materials with immediate effect, and to enter into urgent discussions with representatives of the publishing industry in order to arrive at an appropriate licensing solution for "Google Print for Libraries". We cannot believe that a business which prides itself on its cooperation with publishers could seriously wish to build part of its business on a basis of copyright infringement.
Recently, I've blogged similar copyright concerns about Google Print for Libraries coming from The Association of American University Presses (AAUP) and Association of American Publishers (AAP). The AAP has asked Google for a six-month moratorium on scanning copyrighted library books.
In this blog post I do my best to differentiate Google Print for Libraries and Google Print for Publishers. For info about non-Google book digitization projects, this post has info about two of many initiatives. Here, you'll find links to a few favorite online book databases where most of the content is available full text and is free.
Posted by Gary Price at 3:53 PM | Permalink
A twofer here: the story of someone who found they got removed from Yahoo via the Digital Millennium Copyright Act but can't get a copy of the complaint, and how you might use the DMCA against those you feel deserve it.
Spotted via Threadwatch, DMCA: The New Blackhat for Yahoo! search from Brian Turner looks at how a DMCA complaint got him dumped out of Yahoo. The company confirmed he was removed but apparently won't provide any details of what was in the complaint. In contrast, Google makes all such complaints public.
That evil DMCA! Then again, it's pretty handy for those times someone has stolen your content and refuses to remove it. The DMCA won't get the content off the web, but Jenstar in our forums explains how it can very quickly at least starve that content of search oxygen. The What to do when someone steals your original content thread she started runs down the options in a comprehensive fashion.
Posted by Danny Sullivan at 4:17 PM | Permalink
Google has removed the copyrighted material (The Matrix Revolutions and some television programs) from their video database after it was discovered last week. However, it took me just a couple of minutes this afternoon to find more examples of copyrighted content that's accessible and viewable via the Google Video database.
Here are a few examples of what I found:
+ The Beatles on Ed Sullivan Performing "She Loves You" Uploaded to Google Video on May 6, 2005. Screen cap
+ An Episode of South Park Uploaded to Google Video on April 17, 2005. Screen cap
+ USC Football Highlights Uploaded to Google Video on April 24, 2005 Screen cap This clip looks like it was recorded off of Fox Sports Net.
+ Two Skits from NBC's Saturday Night Live 1 ||| 2 Uploaded to Google Video on May 26, 2005 and June 8, 2005 Screen caps here and here.
Peter Chane, senior product manager for Google Video, told Danny last week that Google conducts a "very superficial review" of video that people upload to the service. He said they're primarily looking for porn and copyright violations and would remove violations they don't catch if reported.
Posted by Gary Price at 5:25 PM | Permalink
Google & Other Search Engines: The WMDs Of Copyright Infringement
The world seems to be waking up to the fact that search engines are potentially widespread copyright infringers, though it's Google, as usual, that takes the brunt of concerns. But for good reason, Google more than the other search engines is generating worry in new areas. A rundown on some good, recent articles on the subject.
For Soaring Google, Next Act Won't Be as Easy as the First from the Wall Street Journal (open access to everyone) is an excellent article that covers how the "opt out" approach to indexing that's been the norm in the search engine world is causing Google problems as it branches out into new areas.
Google Video's taping of television content without prior permission is said to have had executives at CBS and Warner Bros. extremely upset. "We're not just going to give this away for free," said a CBS exec, upset also not to have gained the "proper respect" as a potential partner. There are lots of other details on objections from others in the story and how Google went ahead even though it hadn't gained explicit permission that it was seeking.
The story also revisits what we've reported before, about some print publishers concerned over the Google digital library and Google Print programs. AFP concerns over Google News indexing are also raised.
Google's response to various concerns is that it is doing what fair use allows, that it allows publishers upset to opt-out even in some fair use cases and that as it expands, it will need to negotiate rights to certain types of content.
Boing Boing summarized a key part of Supreme Court's unsound decision at Salon that looks at how the Grokster case might impact Google. Of course, it's not just Google that would get impacted. It's any web search engine. The article highlights issues I've covered already, about how search engines are mass copyright infringers potentially, but that no one has really challenged them because web site owners seem to like the traffic they get.
The story missteps in suggesting that Google is a peer-to-peer copying tool. It is not. Rather than being like Grokster, which connected people but hosted nothing, Google and gang are much more like Napster -- which actually hosted material (see You Say Napster, I Say Grokster from Slate for more on the difference between the two). Napster, of course, lost its own lawsuit. Despite that, web search engines went on.
So taking a "sky is falling" line on Google in the wake of Grokster makes no sense. If Google and web search engines were going to take a fall, Napster would have been a key chop to fell them. Instead, forget Grokster and watch the moves the traditional publishers -- print and video -- make against Google directly. That's going to be key, as I suspect will be what the search engines have already been allowed to do on an opt-out basis for about a decade now. More on that in my past post, Forget Google Print Copyright Infringement; Search Engines Already Infringe.
Finally, Click Here For Inducement Disclaimers from InternetNews.com looks at whether the mere act of running ads for programs that might be used for copyright infringement might be considered inducement that lands Google in trouble.
Posted by Danny Sullivan at 3:53 PM | Permalink
Via our colleagues at Inside Google and Google Blogoscoped, we learn of several examples of copyright material apparently being distributed without permission via Google Video.
Peter Chane, senior product manager for Google Video, told Danny earlier this week that Google conducts a "very superficial review" of video that people upload to the service. He said they're primarily looking for porn and copyright violations and would remove violations they don't catch if reported.
Well, it seems like Google's review process has some kinks in it.
As Nathan documents, he was able to find and view for free all 130 minutes of Matrix Revolutions, episodes of The Family Guy, and a clip from The Daily Show.
Meanwhile, Now playing on Google: 'Matrix,' 'Family Guy' from News.com today points out that some of this content has been in the Google database for several weeks, as submissions have been allowed before the new live video feature opened to the public.
For example, Matrix Revolutions has an upload date of June 9th.
Posted by Gary Price at 2:06 PM | Permalink
A New Page in Google's Books Fight from BusinessWeek, spotted via Search Engine Guide, covers the scanning halt request of Google's library digitization project we blogged earlier. However, it also touches on other letters of concern Google's apparently been sent, including from John Wiley & Sons and Random House. It also revisits the entire issue of whether it's a copyright violation to do the digitization. "There's nothing that gives Google the right to make this copy," says Laura Gasaway, an intellectual property expert and law professor at the University of North Carolina, the story writes. Indeed, there's nothing that gives Google or other search engines the right to make copies of copyrighted works already on the web. Nevertheless, they have done so for the past 10 years -- and I'm convinced that's ultimately going to be a factor in all this. My earlier post Forget Google Print Copyright Infringement; Search Engines Already Infringe covers that in more depth.
Posted by Danny Sullivan at 8:38 AM | Permalink
Publishers' Group Asks Google to Stop Scanning Copyrighted Works for 6 Months in The Chronicle of Higher Education reports that the Association of American Publishers has sent a letter to Google requesting at least a six-month moratorium on scanning copyrighted library materials for the Google Library project, which is a part of Google Print.
Mr. [Allan] Adler [vice president for legal and governmental affairs at the AAP] said the letter was sent because members of the publishers' association feel they have not "gotten satisfactory answers to their questions about copyright infringement." Many publishers say that Google does not have the right even to scan a copyrighted book -- they argue that making a digital copy of a volume for any commercial purpose requires the permission of the copyright holder.
According to the article, a letter was sent to Google's CEO, Eric Schmidt, on June 10th requesting a meeting with Google officials. Susan Wojcicki, director of product management for Google Print, told The Chronicle of Higher Education that Google has not yet responded to the AAP's letter.
At this point, the June 10th letter from the AAP to Google has not been publicly released. According to the article, the Association of American Publishers is giving Google time to respond before making it publicly available.
This is not the first letter Google has received from a publishing industry trade group. Late last month, we blogged about another trade group, The Association of American University Presses, sending a letter to Google expressing their concerns about the Google Library program. Our post along with a link to the full text of that letter is here.
Posted by Gary Price at 3:52 PM | Permalink
Pursuing Copyright Infringers from Scottie Claiborne over at Search Engine Guide has a nice rundown on what to do if you find pages infringing your copyright in search results. Contacting the search engines alone won't get the infringing material off the web, but it will at least starve it of its search engine traffic when listings are pulled. And did you get pulled? Via Search Engine Roundtable, this WebmasterWorld thread looks at what you might do to get reincluded, at least if pulled off of Google.
Posted by Danny Sullivan at 3:03 PM | Permalink
Gary passed along three new complaints at Chilling Effects asking Google to pull material because of defamation concerns. The first involved Relevance S.A., a Spanish company, asking for material to be removed that came up in response to a search on relevance at Google Spain. The second is from a censored German company also seeking material to be removed. We can't tell if either was acted upon. A third UK request did have some material removed, but more is demanded. The latest takedown notices Google has passed to Chilling Effects can always be found on this page for non-US companies and here for US ones.
Posted by Danny Sullivan at 1:08 PM | Permalink
Forget Google Print Copyright Infringement; Search Engines Already Infringe
Gary blogged earlier about the Association of American University Presses having concerns that Google Print's digital library program may amount to wide-scale copyright infringement. But that complaint, if ultimately upheld in a court case, would go far beyond print digitization. It might impact the fact that search engines already do widespread copying of content to provide the core search services we take for granted.
Let me zero in on a key part of the complaint:
Google's claim that it is fair use to make copies of every copyrighted work in even one major library, let alone three of them, is completely unprecedented in scale; it is tantamount to saying that Google can make copies of every copyrighted work ever published, period.
It is not unprecedented at all. It is exactly what search engines have been doing over the past ten years, since they started crawling the web. They are making copies of copyrighted works all the time, billions and billions of them.
When a search engine indexes a web page, it makes a copy of that page. Furthermore, all publications (at least in the US) are protected by copyright, regardless of whether that copyright is formally registered. Registration just provides further legal protection and redress in case of infringement. The fact that a work isn't formally registered doesn't mean it's a free-for-all for anyone to use.
When search engines index content, they do not formally request permission to do such copying. They just do it. Don't want to be copied? Then you have to stick up a robots.txt file or use the meta robots tag to opt-out.
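For anyone who hasn't seen how that opt-out gets honored in practice, here's a minimal sketch in Python of the two checks a well-behaved crawler might run before copying a page. It leans on the standard library's urllib.robotparser and html.parser modules. The crawler name ExampleBot, the example.com addresses and the sample rules are all made up for illustration; this is not how any particular search engine implements it.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might post to keep crawlers out of /private/.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_by_robots_txt(robots_txt, user_agent, url):
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class MetaRobotsParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def allowed_by_meta_robots(html):
    """Return True unless the page carries a meta robots 'noindex' directive."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return "noindex" not in parser.directives


if __name__ == "__main__":
    print(allowed_by_robots_txt(SAMPLE_ROBOTS_TXT, "ExampleBot",
                                "http://example.com/private/page.html"))
    print(allowed_by_meta_robots(
        '<html><head><meta name="robots" content="noindex"></head></html>'))
```

Note what the sketch makes plain: the default answer is "yes, copy it." A page with no robots.txt rule and no meta tag passes both checks, which is exactly the opt-out model at issue here.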
If you don't opt-out, is that tantamount to granting permission? We don't know. The Bidder's Edge case didn't really answer it. Rather than copyright being the issue, it was found to be one of trespass.
The case involving image indexing between Les Kelly and Arriba Soft cuts closer to this. When I spoke with Kelly about his case years ago, he didn't feel he should be required to opt-out, though he did try to. A court later found that there were fair use elements involved with showing thumbnails of these images.
The association's letter highlights this case in its argument against what Google is doing:
The single case you have cited to support Google's fair use claim, Kelly v Arriba Soft, has a pattern of facts substantially different from those in Google Print for Libraries. Among many other important differences, Arriba Soft was making copies of images that had already been digitized and posted on the web by their copyright owners. Google is presuming the authority to digitize many works whose copyright owners have not taken that step, and given the ease with which digital files can be duplicated and further transmitted, may have good reason for deciding not to do so.
Additionally, the full resolution copies Arriba Soft made in order to create the low-resolution thumbnails were deleted from Arriba Soft's server after the thumbnails were made. Google claims the right to retain the digital copies it makes -- the full resolution copies, if you will -- even in those cases when a publisher asks them not to display any text from particular works.
It's a bad argument. They are suggesting that the act of publishing on the web, which by its nature requires digitization, may somehow imply that copyright issues are less valid.
They aren't. If it's a copyright violation to copy a print book in order to index it and show summaries of what's contained, then it is going to be a copyright violation to copy a web page, index it and show summaries of what's shown.
In fact, Google, Yahoo and MSN go even further than this by providing cached copies of pages, another possible copyright violation explored in this News.com article from 2003. All do provide an opt-out of caching, of course. But again, it requires the author to explicitly take away permission, rather than the search engine first asking for it.
When I've written on such issues in the past, my own view has been that ultimately, a court will likely rule the value of web search combined with opting-out does fall on the fair use side. In other words, they aren't going to require that permission be sought before indexing happens. You don't want to be in? It's easy to opt-out.
The Google Print project could change that, however. Should publishers win a ruling that opt-out is not allowed, online publishers might insist that they are entitled to the same rights.
Want to comment? Visit our forum thread, SEW should support the AAUP's position on Google.
Postscript: Scholarly journals' premier status diluted by Web from the Wall St. Journal looks at how scholarly journals are under threat by demands they should be open to everyone.
Posted by Danny Sullivan at 9:37 AM | Permalink
When I first learned about Google's plan to digitize the full text holdings of several large libraries one of the first things that came to mind was the many copyright issues that would come from the publishing community. Well, here they come.
In March, the Harvard Crimson ran a story about copyright issues and the Google Library program.
Today, two more articles about concerns coming from the publishing community over the program have been published.
The Chronicle of Higher Education and Business Week have articles about a recent letter sent by The Association of American University Presses (about 125 scholarly publishers) saying that the Google Library program, "appears to be built on a fundamental violation of the copyright act." You can find the full text of the letter here. AAUP is requesting that Google respond to the letter by June 20th.
What seems to be of most concern to the publishers is the millions of books that Google plans to scan from library collections which are still in copyright.
Let's review what Google Print/Google Library consists of.
+ Google is working with publishers to digitize new material that is shared directly from the publishers. Material will be full text searchable but each publisher determines how much of the book you can view during a visit.
Material will not be printable (yes, you could do screen caps for each page). All of this is very similar to what Amazon.com is doing with their Search Inside the Book program. All book entries will have direct links to online book merchants allowing users to purchase the full text.
+ Google Library Program: Launched in December, Google's library program plans to digitize every book, both copyrighted and public domain material, held in several major library collections, and make it searchable via Google Print.
Several of the libraries that have joined the program are testing the program before committing to a full digitization. Regardless, if all goes as planned, scanning this massive amount of material will take many years.
What I think is most noteworthy is that library materials that are not out of copyright will be full text searchable online but not full text viewable. The searcher will only see a few sentences of text around the search term along with bibliographic info and links to purchase the actual book. In some cases, you'll also see a link to access the book from a library. In this day and age, will people be willing to wait to get the book(s) via interlibrary loan (if a local library doesn't have it) and more importantly, can libraries afford a large increase in the number of interlibrary loan requests? Of course, it could also be in Google's future plans to offer the downloadable full text of any book online.
If the library content is in the public domain then the full text will be viewable online. The dates that Google is using to determine public domain material vary. From the Google Library FAQ:
If you're in the US, we've taken a very conservative stance and only books pre-1923 will be considered public domain. If you're not in the US, only books pre-1900s can be considered public domain because of differing copyright laws internationally.
So, that's the gist of it.
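To make the FAQ's rule concrete, here's a tiny sketch in Python of the cutoff logic as I read it. The function name and parameters are my own invention, and I'm assuming "pre-1923" and "pre-1900s" mean published strictly before 1923 and 1900 respectively. Google hasn't published any code for this, so treat it as an illustration of the stated policy and nothing more.

```python
# Cutoff years taken from the Google Library FAQ quoted above.
US_CUTOFF_YEAR = 1923
NON_US_CUTOFF_YEAR = 1900


def is_treated_as_public_domain(publication_year, reader_in_us):
    """Return True if a book would be shown in full under the FAQ's rule.

    Assumption: "pre-1923" / "pre-1900s" means strictly earlier than that year.
    """
    cutoff = US_CUTOFF_YEAR if reader_in_us else NON_US_CUTOFF_YEAR
    return publication_year < cutoff


if __name__ == "__main__":
    # A 1910 book: full text for a US reader, snippets only elsewhere.
    print(is_treated_as_public_domain(1910, reader_in_us=True))
    print(is_treated_as_public_domain(1910, reader_in_us=False))
```

The same book can land on either side of the line depending on where the reader sits, which is why the dates "vary."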
The Association of American University Presses says that simply scanning the copyrighted material might be a copyright violation.
"Copyright means the right to make copies, period," said Peter Givler, the university-press group's executive director, in an interview. "Copyright law can seem pretty byzantine and technical and elaborate and complicated," said Mr. Givler, who wrote the letter, "but at its simplest, that's what it is. It's the right to make copies..."
"It's just a gigantic claim on its surface," Mr. Givler said in the interview. "There are just a lot of questions that need to be answered."
Google has also heard from the UK's Publishers Association. It sent a letter to Google in February that touches on many of the same points that the AAUP letter discusses. The Chronicle of Higher Education reports that, "Google's answers thus far have not been reassuring." Business Week notes that John Wiley & Sons and Random House have also contacted Google about the library program.
Finally, comments from Lawrence Lessig at Stanford,
"For registered works it can be up to $150,000 per infringement," says Lessig. "I don't think any judge would do that because Google seems to be operating in good faith...but there's a huge exposure."
As I've pointed out many times, Google Print is hardly the only service making full text available. This post links to several of these other services. I also included a few in my original article about Google Library.
Posted by Gary Price at 12:55 PM | Permalink
Yahoo has partnered with Creative Commons and released a new resource that restricts a web search (using the Yahoo web index) to content available with a Creative Commons license. You're also able to limit a query to find content available for commercial purposes and/or content that can be modified, adapted, or built upon. Most of Yahoo's advanced search syntax appears to work. More in this post.
According to a Yahoo spokesperson, this is the largest one-stop search tool on the web for finding material with a Creative Commons license.
Btw, the Creative Commons site also offers a search engine that's powered by Nutch.org open source technology.
Any content creator (photographer, musician, blogger, etc.) can share some or all of their work with others using a Creative Commons license. Think of it as a more flexible type of copyright license. 11 types of CC licenses are available. If you're not familiar with how it all works, this page provides a great overview.
From the Yahoo news release: "The launch of Yahoo! Search for Creative Commons is an important step in a broader movement to enable people to find, share and expand content within a new, more flexible set of copyright laws that ultimately enable the creation of a 'remix' culture and new generation of creative works."
Postscript: Chris wrote about the Creative Commons a couple of weeks ago, describing the mission of the organization and how to find content using its open-source search engine in Finding Free Content in the Creative Commons.
Posted by Gary Price at 1:01 AM | Permalink
Harvard-Google Project Faces Copyright Woes from the Harvard Crimson looks at copyright issues being raised about Google's plans to digitize thousands of books in the Harvard University Library. It offers points of view from both sides, though I have to say this quote from a past library director made my eyebrows raise: "Copyright laws are written for companies like Time Warner and Disney instead of research libraries like Harvard. [These laws are] not aimed at us." Actually, I thought copyright laws were written to protect the rights of publishers from anyone, be it Time Warner, Harvard or a public corporation like Google.
Posted by Danny Sullivan at 3:34 PM | Permalink
Google News Says Au Revoir to Agence France-Presse Content
Less than a week after Agence France-Presse (AFP) filed a lawsuit against Google alleging copyright infringement of its content, the folks in Mountain View will no longer index AFP material and will remove old AFP content from the Google News index. The eWeek article, Google to Drop AFP from News Index, provides more details. If you would like to read the complaint filed with the U.S. District Court in DC, we've posted it here.
Posted by Gary Price at 12:17 PM | Permalink
Last Friday, news broke of Agence France-Presse filing suit against Google in U.S. District Court alleging copyright infringement. Here's the SEW Blog post with links to Reuters' and AFP's own stories about the lawsuit.
If you're interested in reading the actual court filings (to this point), here's the full text of AFP's complaint (filed 3/17/2005) along with the 5 exhibits referenced in the document. All documents are PDF files.
+ Main Document (Complaint), 19 pages
+ Exhibit A1, 12 pages
+ Exhibit A2, 8 pages
+ Exhibit A3, 9 pages
+ Exhibit 4, 10 pages
+ Exhibit B, 6 pages
Posted by Gary Price at 5:54 PM | Permalink
Yes, it's another lawsuit that Google's lawyers will need to handle. This one was filed by Agence France-Presse (AFP), a global news agency that supplies material to many news sites, in U.S. District Court on Thursday.
AFP is suing Google for "at least $17.5 million" and "an order barring Google News from displaying AFP photographs, news headlines or story leads..." A Reuters article also says that AFP has asked Google to "cease and desist" from using its content but "Google has ignored such requests and as of the filing date of the lawsuit 'continues in an unabated manner to violate AFP's copyrights.'"
More in the articles:
+ Agence France Presse sues Google over news site from Reuters
+ Here's how AFP is covering the story via their approved feed from Yahoo News.
Posted by Gary Price at 11:26 AM | Permalink
Upset about Google AutoLink, the new Google Toolbar feature that adds links to web pages that it feels are appropriate? You might try a new tool created by Mark Pilgrim that inserts links on Google's own pages (NOTE: Updated below with comments from Mark Pilgrim). Via Boing Boing, news of his new Butler Firefox extension that among other things:
For example, in a search for cars, Butler inserts this at the top of the Google search results:
★ Try your search on Yahoo, Ask Jeeves, AlltheWeb, Teoma, MSN, Lycos, Technorati, Feedster, Daypop, Bloglines
And below news results listed, it says:
★ Find more news at Yahoo News, Ask Jeeves, AllTheWeb, MSN, Lycos, Technorati, Feedster, Daypop, Bloglines
To use it, you need to have the Greasemonkey Firefox extension. Once that's installed, you can then go back and install the Butler extension. Once activated, it can be disabled without actually having to uninstall it, should you want to play with the tool from time to time.
The usefulness of the tool is clear. It's very handy for the searcher to have. Given this, it would be hard for Google to object to the tool especially after Google's statement in my Google Toolbar's AutoLink & The Need For Opt-Out article on how they'd react to tools that added links or perhaps stripped ads from their search results:
"I think we'd need to look overall at the utility offered to the users. Can a good argument be made that those users understand what's going on?" said Marissa Mayer, Google's director of consumer web products. "It would be hard for us to argue against user utility because those are the same metrics we're going to use in evaluating our feature set."
In that article, I wrote my view that when trying to balance the desires of users and the rights of publishers, tools that added links to pages went too far if they didn't provide a publisher opt-out. And that's my main issue with Butler. While it's giving Google a taste of its own medicine, by rights, it should be letting publishers also opt-out of having links added. And that means Google as a publisher should get that right to opt-out of Butler.
Will an opt-out be added? Would that be added if Google did the same for AutoLink? Pilgrim actually responds that his creation wasn't made as a way of pushing back at AutoLink. He emailed me:
I couldn't care less about the AutoLink hoopla, except that it gave me the idea for Butler. I think anything running on my computer should be under my complete control. I say this as someone who publishes content for money (although it's not my primary income).
Look, I run ZoneAlarm Pro with highest sensitivity and all advanced options enabled (including popup blocking). I run Proxomitron on top of that, and AdBlock and FlashBlock on top of that. These tools don't block ads by accident; they come pre-configured with specific knowledge of specific ad servers. Butler is just another ad blocker.
As for the "try your search on" feature, I am old enough to remember that Google used to offer this feature themselves. Back then it was "try your search on Altavista, Hotbot, Lycos, Excite, etc." All the popular search engines of the day. The point is, linking to competitors makes Google more valuable, not less. They seem to have changed their attitude about that as they've added more and more services of their own.
Google as a whole is becoming more and more of a walled garden, which is ironic, given that they started out in the business of sending people away. Now they take every opportunity to keep you within their walls. This might sound like a good idea in a Powerpoint slide deck, but it will kill them in the long run.
None of this answers your question about why I wrote it. Honestly, I wanted to teach myself Javascript and DOM scripting. I'm a geek, not an activist. I spend a lot of time using Google's services, and with the AutoLink faux-crisis still brewing, it seemed like an obvious choice of project.
As for a Google comment on the new tool, I've got a question in to them. In the meantime, some related reading:
My own view is that trying to come up with some type of universal guidelines for content modification tools isn't going to be successful. I think there's going to be a variety of lines that we draw over time, and those lines might even change over time. But for me, right now, adding links is a clear and simple line we can start with. If you make a tool that adds links to a page, you should give the publisher an ability to override that feature.
How could opt-out be done? SearchGuild -- which published the first widely-cited AutoLink killer -- is pushing a meta tag. No tool uses this tag right now, but they could. I'll expect to add the tags to Search Engine Watch soon just to show my support. More about the tag here: JavaScript to Kill Google Autolink.
All-in-all, Butler is just the latest example of the "mess" AutoLink created when it was released, as I wrote earlier. It came out, then we got an AutoLink killing script, a supposed way to kill that script, now a tool some will use to fight back at Google plus heaps of bad PR for Google continuing.
Two years ago, the company took just 48 hours to pull the related searches feature that its own AdSense publishers hated. We don't need months more of testing AutoLink for Google to realize it needs to make some significant changes to please publishers and not just the usual noises of always considering feedback. Let's get on with an actual solution, starting with an immediate opt-out.
Posted by Danny Sullivan at 9:37 AM | Permalink
SiliconValley.com points to a transcript of a keynote speech by Vanity Fair columnist Michael Wolff, where he tells a story of Google founders Larry Page and Sergey Brin discussing the coolness of owning your own 767 while traveling on the private jet of an unnamed billionaire. The duo also discuss the idea of Google branching off into making underwear and bras. OK then.
If that's not strange enough, you'll note that before you get to the actual transcript, the site hosting it is printing correspondence telling it to remove the transcript because of copyright issues. The site obviously disagrees.
Meanwhile, a search for wolff google shows how a similar DMCA complaint has probably gotten that same listing taken out of Google. Note the bottom of the page:
In response to a complaint we received under the Digital Millennium Copyright Act, we have removed 1 result(s) from this page. If you wish, you may read the DMCA complaint for these removed results.
The complaint, by the way, has yet to actually be posted at Chilling Effects. For more on how DMCA takedown disclosure works, see my Spam Rules Require Effective Spam Police article.
Postscript: Some background on the transcript removal is here: Michael Wolff's speech.
Posted by Danny Sullivan at 12:01 PM | Permalink
I've got an update coming on developments with Google AutoLink since I last wrote about it (see Google Toolbar's AutoLink & The Need For Opt-Out). There's a petition, meta tags against it and so on. But in the meantime, via Steve Rubel, news that Wall St. Journal tech columnist Walt Mossberg has come out against the system: Google Toolbar Inserts Links in Others' Sites, And That's a Bad Idea. Mossberg was instrumental in getting Microsoft's Smart Tags killed, as my earlier article explains.
Posted by Danny Sullivan at 10:03 AM | Permalink
AutoLink is a new feature in the new third version of Google's popular Google Toolbar that's raised controversy since it was released last week. Why are publishers upset? Can they block the feature that adds links to their web pages? Who rules over content, users or publishers? Why do I think Google should give publishers an opt-out for the feature? That, and other issues, we'll explore in this article. It's a long one, so the links below will let you jump to particular sections, if you prefer.
Let's start by revisiting how the feature works. It's only available to those using the Google Toolbar 3 beta. Existing Google Toolbar users have not automatically had this feature added, so the number of people currently AutoLink-enabled is small. It will grow, of course, when the toolbar comes out of beta and takes over as the main one offered to the public, something likely to happen in the next few weeks.
Currently, AutoLink only reacts if it spots four types of information on a page:
Below, I've inserted two examples in the article so that anyone with the AutoLink-enabled toolbar can see autolinking for themselves easily. The first is the book Web Search Garage by Tara Calishain with its ISBN number shown. The second is Google's address:
Web Search Garage Prentice-Hall, August 2004 ISBN 0131471481, $19.99
Google Headquarters 1600 Amphitheatre Parkway Mountain View, CA 94043
If you have the AutoLink-version of the Google Toolbar installed and come to a page like this one with such "trigger" content on it, you'd hear a little "popping" sound familiar to anyone who uses the Google Toolbar currently, when it blocks a pop-up window from opening.
The AutoLink button in the toolbar also lights up or goes active, changing from "Not Active" to "Active" as shown in the illustration below:
When active, you can push directly on the button or use the little drop-down arrow next to it to get a menu, as shown with the "Drop Down Box" example.
Whether you push directly on the button or use the drop-down option, in both cases, links are also added to the page, making them look like this:
Web Search Garage Prentice-Hall, August 2004 ISBN 0131471481, $19.99
Google Headquarters 1600 Amphitheatre Parkway Mountain View, CA 94043
Click on the ISBN link, and you'll be routed via Google over to a page about the book at Amazon. Click on the address, and you'll be routed to that address shown in Google Maps.
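For the curious, here's a rough sketch in Python of how a tool might spot an ISBN like the one above and wrap it in a link. To be clear, this is my own guess at the general technique, not Google's code: the pattern, the function names and the example.com link target are all hypothetical. The check-digit test, though, is the standard ISBN-10 rule (weight the ten characters from 10 down to 1, and the total must divide evenly by 11).

```python
import re

# Hypothetical pattern: the word "ISBN" followed by a 10-character ISBN-10.
ISBN10_PATTERN = re.compile(r"\bISBN[\s:]*([0-9]{9}[0-9Xx])\b")


def is_valid_isbn10(isbn):
    """Apply the standard ISBN-10 check: weighted sum must be divisible by 11."""
    if len(isbn) != 10:
        return False
    total = 0
    for position, char in enumerate(isbn):
        if char in "Xx":
            # "X" stands for the value 10 and is only legal as the final character.
            if position != 9:
                return False
            value = 10
        elif char.isdigit():
            value = int(char)
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


def add_isbn_links(text, base_url="https://example.com/isbn/"):
    """Wrap each valid ISBN mention in an HTML link (base_url is a placeholder)."""
    def replace(match):
        isbn = match.group(1)
        if not is_valid_isbn10(isbn):
            return match.group(0)  # leave look-alike numbers untouched
        return '<a href="%s%s">%s</a>' % (base_url, isbn, match.group(0))

    return ISBN10_PATTERN.sub(replace, text)


if __name__ == "__main__":
    print(add_isbn_links("Web Search Garage Prentice-Hall, August 2004 ISBN 0131471481, $19.99"))
```

The validation step matters for the publisher debate: a tool that checks the number before linking will add fewer stray links to a page than one that links anything that looks like ten digits.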
Alternatively, use the drop-down box, select an option shown, and an entirely new window will open to display the AutoLink content. In contrast, with the links on the page, new windows aren't opened. Instead, the original window is replaced with the new content.
Don't like the links? Via the drop down box, you can use the Remove option to get rid of them or put them back using the Add option, if they have been removed.
By the way, earlier this week I found that using the drop-down box did NOT add links to the page. In fact, because I was using the drop-down box rather than pushing on the button, I at first didn't think links were actually added to the page at all. I talked with one other person who had the same thing happen to her. But in writing this article, that behavior changed for me.
Google says it's made no alteration to the toolbar behavior since it launched. Nothing has been changed on their end, the company says, and I should have always been seeing links added to a page whether I pushed directly on the button or chose the drop-down option. Given this -- and how corroded my IE installation has become over the past year or so (one reason I now use Firefox) -- I'll chalk it up to an oddity on my end.
Google says feedback from users so far is that they like the feature. It's easy to see why. If you come across a page about a book without a link, as I showed above, it's very nice that you can get to another page with more information about it or the ability to buy it. Amazon fills that role nicely. I've often come across books mentioned on pages, then had to do the copy-and-paste routine over at Amazon that AutoLink makes unnecessary.
Similarly, if you see an address such as on a corporate web site and would like to get a map, this is a handy way not to have to cut-and-paste into a mapping program.
Fair to say, feedback so far from publishers isn't so rosy. Yes, some think the feature is nice, as prominent blogger Anil Dash has said. But from my review, he's in the minority. We've had other prominent bloggers such as Steve Rubel, Dan Gillmor and Dave Winer crying foul.
Closer to home for me, many search marketers who are also publishers clearly dislike the tool. At our Search Engine Watch Forums, the AutoLink & Google As Anti-Webmaster thread isn't finding many people in favor of it. The same is true for the New Google Toolbar Feature Rekindles the Old SmartTag Debate thread at WebmasterWorld.
Publishers do get a benefit from the tool. If they've failed to add useful links, those visiting their sites perhaps may come away happier that they were still able to leverage the information on the pages to get further information.
The publisher fear is far larger. Many publishers consciously decide what links they want to add. Having some tool come along and modify their content is simply unacceptable to them. That's especially so given how easy it would be for any tool to grow capabilities, such as making words into ad links that generate no revenue for them -- something that's happened in the past.
There is a ton of hue and cry about how Google is trying to repeat a plan Microsoft abandoned after large outcry in 2001 called Smart Tags, which would have allowed words on pages to be turned into links. Which links and to where? That would have been determined by Microsoft.
By the way, a key developer of Smart Tags from Microsoft does now work for Google. However, rumors that he was involved with Google AutoLink aren't true. Google says he's involved in a completely different product.
Microsoft backed off from Smart Tags, but TopText from eZula went ahead later that year. It inserted yellow hyperlinks into pages -- paid links that earned eZula money but not the publisher. My Forget Smart Tags; TopText Is Doing What You Feared article from back then looks in depth at the system and the concern that arose over it. I'd strongly encourage reading it, because there are plenty of direct comparisons between what happened then and what's happening now.
eZula's still out there and apparently offering the same type of placement, but my impression is that the system didn't gain greater popularity due to search marketers who especially rallied around the late Jim Wilson's Scumware site to fight the program.
Why did search marketers care so much? They were footing the bill. Ads they placed with people like LookSmart got inserted into pages that they never actively chose. Many disliked this and made threats to their ad providers like LookSmart to stop partnering or lose them as customers.
Predating both the Smart Tags idea and TopText was Amazon's zBubbles and Flyswat, both from 1999. They came and went without any major outcry. Flyswat in particular inserted links on pages just as TopText did, Smart Tags would have and AutoLink now does.
I see now that some places like Symantec now class Flyswat as spyware, which sort of amazes me given that I thought the product long ago had died. I can't even reach the Flyswat site, but I suspect old installation copies are still floating around via download sites such as PC World (which offers it here, then offers an anti-spyware tool to get rid of it here). But at the time it was out there, Flyswat drew praise in many quarters as a great browser "helper."
Why was Flyswat largely acceptable, when only two years later, Smart Tags and TopText drew ire and today, Google AutoLink faces criticism?
With TopText, the answer is easy. Publishers didn't like the fact the system let competitors manage to insert themselves into their own content. Others who had purchased precisely targeted search ads weren't happy to discover that these ads were then in turn distributed to TopText for less precise contextual targeting.
With Smart Tags, it was the monopoly factor. Microsoft had such a dominant share of the browser market that letting it control how words would be linked was simply too frightening to many -- and this despite opt-outs the company decided just before the end that it would offer.
Enter Google. It, too, occupies a dominant role. We don't know exactly how many toolbar installations it has, but the company acknowledges millions of users. To be fair, Marissa Mayer, Google's director of consumer web products, told me that queries generated through the Google Toolbar are "by no means a majority of all Internet Explorer users" who access Google.
"With AutoLink versus Smart Tags, the toolbar is different in that it's only installed by users [as opposed to automatically being part of the browser] and is by no means a majority," she explained further.
Even Microsoft blogvangelist Robert Scoble agrees here, arguing that Google can do things Microsoft can't because Microsoft still has a browser on 9 out of 10 desktops out there. Nevertheless, he was against Smart Tags and doesn't seem to favor the current Google implementation of AutoLink.
Monopoly or not, the toolbar clearly has many users. In addition, people like Winer fear that if Google is able to offer this type of feature, nothing prevents Microsoft and others from doing the same.
So with Google, there's a bit of the monopoly factor. I think there's also the TopText-like fear that AutoLinks could cost publishers money. If you have a page about a book, you might not want Google sending someone to Amazon to purchase it, especially without your own affiliate code.
As an aside, it's worth mentioning that there are other reasons why you might find advertising links inserted into editorial copy. Vibrant Media's been doing this for some time through its IntelliTXT service. However, the issue of publisher rights as with Google AutoLinks is not in question with this type of service. That's because the publisher themselves has chosen to add the links.
Instead, the issues are more about the practice from an editorial integrity standpoint, and yesterday's Ads Embedded in Online News Raise Questions article from the New York Times is just one of many articles to look at this.
Back to Google AutoLink, a remaining major concern for publishers is simply that they might not want Google sending anyone anywhere out of their sites via links that they didn't provide in the first place. There's a potential traffic loss people worry about, though Google doesn't see this as a serious problem.
"Are we really taking traffic away from them? Think about what they [users] have done. They've been looking at the page. They've decided there's a piece of information on the page. They had to get the idea that they wanted to get more information some way. They clicked a toolbar button, and then they clicked a link. That's a pretty determined series of user actions. It seems to me that that user is going elsewhere anyway," Mayer said.
What about the idea that Google might put ad links on pages? That's not something it does now, nor does the company have any plans to in the immediate future, it said.
As for those Amazon links, Google said it gains nothing from them. Amazon was selected because it was seen as the best choice for book information.
"Obviously Amazon is a partner of ours, but there was no monetary exchanges as part of this development. We picked out what we thought was the best user experience for things we linked to," Mayer said.
Don't like that choice? When the tool emerges from beta in the near future, Google definitely plans to let people choose some of the content providers they want to tap into. If you want links to Barnes & Noble for ISBNs rather than Amazon, you'll almost certainly be able to do that or pick from others.
How about the tool expanding the range of what's auto-linked? That could happen. Google's not saying what may or may not change, because the tool is still in beta -- a traditional style beta that should only last a few months at most.
It's possible, Google said, that if users push the button, it might decide that the toolbar should always automatically show links rather than make this a page-by-page choice users initiate. Or not, depending on feedback.
New features could also be added or removed. The company is interested in link enabling anything that someone might have to cut-and-paste to get existing information from Google. For instance, enter a stock symbol into Google right now, and it links you to stock data. Potentially, stock symbols could be turned into AutoLinks.
Couldn't any word be made into a link? Sure, but that would be too much, Google says.
"That goes a little too far. We aren't interested in turning an entire page into hyperlinks. That's not particularly helpful to the user," Mayer said.
AutoLink also raises anew the philosophical debate of who ultimately controls content. "It's my content, hands off!" is a common theme that resonates with many publishers. What gives Google the right to start tampering with your page?
Google's response is that the users give them the right. The users want this tool. The users want to control how they view that content.
"It's important to recognize that the toolbar is installed by people who want Google-enhanced functionality," Mayer said. "I would argue that the user is adding the link to the page. Google just provides the tool."
That's a pretty forceful argument. We don't hear many objections to the fact that users can control font sizes as they like, for example. Google's open source program manager Chris DiBona goes through a litany of more things like this in his personal blog post on the issue, Oh, please.
It's easy to add more. I've heard plenty of praise for various Firefox browser plug-ins that can do special things to pages when they spot certain types of links or the ability to restyle entire pages with Firefox. Why is Firefox so praised for enabling users but Google suddenly seen as evil for doing the same?
Indeed, this isn't the first time Google has interacted with publisher content via its toolbar. The ability to highlight or jump to words on a page is widely praised. But more dramatic was the addition of a pop-up blocker in June 2003. That not only prevented some web sites from doing what they wanted to do, but it also arguably cost some publishers money through the blocking.
Widespread criticism? Hardly. I've seen a few grumblings from time to time that Google might be blocking commerce and publisher intent this way, but the praise over the pop-up blocking feature has been enormous -- and mimicked by other search toolbars. My guess is that publishers didn't fight back more against this because it was clear how hated pop-ups were by consumers.
So where is that line when a tool gives a user too much control -- or better, when a user is given control that a publisher ought to be able to counter? I agree with many others that adding links crosses it. I don't care if the user thinks adding links to my pages will make things better for them. As a publisher, I want to be able to override a tool that tries this.
Legally, we don't know where publishers really stand on this, as the recent News.com article "Google toolbar move raises online ire" examines. But forget legal.
Instead, adding links is a line that I think any respectable software publisher shouldn't cross. Last year, Google introduced a set of software principles that are all about protecting the user experience. An addition to those principles should be made to protect the publisher experience, as well.
In this case, I think Google should provide an easy opt-out that publishers can implement to block AutoLink. Some others want AutoLink to be opt-in -- that Google shouldn't be able to do anything like this unless publishers explicitly say they should.
I think that's too far. Users do have rights. They have installed this software. Opt-out gives any publisher seriously concerned with the tool the ability to control it on their site. Many won't be concerned, so requiring an opt-in is overkill that does hurt the user experience.
It's also somewhat hypocritical to demand Google do an opt-in for this tool when virtually no one demands an opt-in about being crawled. Why that isn't demanded is pretty clear. People want to be in Google because of the traffic it will bring them. But being crawled is another form of messing with content.
For its part, Google doesn't want to do an opt-out. The fear is that it will hurt the user experience.
"If you had opt-in or opt-out, that's overall a lot less useful," Mayer said. "If the links sometimes won't show because there's a publisher opting-out, that's bad for the user experience."
Explaining further, she said:
"It's an interesting balance to strike, but we're going to weigh more heavily on the user side," Mayer said. "We think we struck the initial balance in a reasonable way. The publisher's page is seen as intended in the browser. It's a user-elected action that changes things. Beyond that, we aren't driving all traffic to Google."
Google also feels there's a form of an opt-out in that it won't overwrite any existing links. Worried that an ISBN code might get turned into a link by Google? Make it a link yourself, and it will be untouched.
Indeed, when Gary Price first wrote about the AutoLink feature in Search Engine Watch last week, he used an example of going to Barnes & Noble to show how unlinked ISBN codes there got auto-linked through the Google Toolbar to connect people to Amazon.
That made Barnes & Noble into a poster child for many publishers about why AutoLink was bad. Look at how it put links to a competitor on the Barnes & Noble site!
It took the company about a week, but an opt-out is effectively in place with Barnes & Noble. As I wrote yesterday, all ISBN numbers on the site now have links to Barnes & Noble's own content.
It was probably an easy move for them to make, having a database-driven site. But for others, it could involve a lot of hard-coding. In addition, if Google adds new content types for AutoLink, then publishers have to go back and make more changes. Adding your own links to block Google AutoLinks is simply not an effective form of opting-out for many to use.
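To give a rough sense of what this "link it yourself" workaround involves, here is a minimal sketch in Python of a publisher pre-linking ISBNs in plain text before it goes into a page. Everything named here is an assumption for illustration: the example.com URL pattern, the `link_isbns` helper and the simple ISBN-10 pattern are hypothetical, not anything Google or Barnes & Noble has published. It also only handles plain text, not existing HTML, and it relies on Google's statement that AutoLink won't overwrite existing links.

```python
import re
from html import escape

# Hypothetical URL pattern for the publisher's own book pages (an assumption).
BOOK_URL = "https://example.com/books/{isbn}"

# Matches the label "ISBN" followed by a 10-character ISBN. Hyphens or spaces
# between digits are optional, and the final check character may be X.
ISBN_RE = re.compile(r"(\bISBN[:\s]*)((?:\d[-\s]?){9}[\dXx])\b")


def link_isbns(text: str) -> str:
    """Wrap each plain-text ISBN in a link to the publisher's own page."""

    def repl(match: re.Match) -> str:
        label, raw = match.group(1), match.group(2)
        # Strip separators so the URL carries a normalized ISBN.
        digits = re.sub(r"[-\s]", "", raw).upper()
        url = BOOK_URL.format(isbn=digits)
        return f'{label}<a href="{escape(url)}">{escape(raw)}</a>'

    return ISBN_RE.sub(repl, text)


if __name__ == "__main__":
    print(link_isbns("Our pick this week: ISBN 0-596-00527-3, in paperback."))
```

Even a sketch this small hints at the maintenance burden: it covers one content type, and each new type AutoLink learns to recognize would need its own pattern and its own destination pages.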
My response to the "protect the user experience" argument is pretty blunt. Too bad if it is harmed in this case, from Google's perspective.
They may be Google's users, but they are also my users as a publisher. If my visitors are upset that my site prevents them from using Google AutoLink, they can tell and lobby me directly. I don't need Google deciding for me what my users want on my web site.
Google would gain on the public relations front from offering an opt-out. Even better, I'd encourage them to lobby for a single standard type of opt-out that other publishers could support such as through a robots.txt file extension that works for everyone. That would be real leadership in the industry and in line with the software principles statement it started last year.
How about turning the tables? How would Google feel about programs that modified its search results? It's not even theoretical. We have tools that will strip out ads from Google because the user may not want ads. We have software that will add links to Google's own results (for more, see our forum thread).
"I think we'd need to look overall at the utility offered to the users. Can a good argument be made that those users understand what's going on?" Mayer said. "It would be hard for us to argue against user utility because those are the same metrics we're going to use in evaluating our feature set."
It's a change from when Google was asked in 2001 what it thought of TopText adding links to its results. At that time, it wasn't an issue of it being OK if it helped the user. Instead, Google wasn't concerned because there didn't appear to be much take up of TopText.
Still, things change -- and it's helpful to have a current view on where Google stands, especially if a competitor like Yahoo or Microsoft decides to add a feature to its toolbar that allows users to hit links inserted on Google pages to generate results from their search engines.
I'd sweeten the pot a bit to encourage Google to give an opt-out. Personally, I only want it to prevent adding links to my pages. Want to display links via the toolbar? That's fine -- it's your toolbar, do what you want with it.
Wouldn't that mean Google might down the line start showing ads or content related to my pages in the toolbar? Yes, it might. But we've had tools do this sort of thing already (a new toolbar program from Searchfeed and EffectiveBrand just came out this week), plus free useful tools do need to be supported somehow.
I wouldn't necessarily like it, but if it's not interfering with my actual page by popping things over my content or adding links, and is instead staying within the toolbar area, I'd live with it.
That's especially so as long as the user clearly knew what was happening in the toolbar. All the same arguments Google makes about the user having the right to do what they want, I heard from TopText way back when. But Google says its history of user disclosure on what the toolbar does is better, and I largely agree.
"You can just look at Google's track record as with the PageRank feature. We tell people it's not the 'usual yada yada' and we are very up front," Mayer said. "We make sure our users are really informed that something is going to happen, because we want to have the trust of our users."
In other words, no one gets tricked into downloading the Google Toolbar. And the links aren't automatically enabled. You do have to make the choice to turn them on.
Nevertheless, I still don't want links added to my pages. But if someone wants to consciously choose to click on a button that makes new windows pop open, it's hard to object.
Similarly, we have a long history of other tools being tolerated for showing related content, such as Alexa. Heck, for ages both Internet Explorer and Netscape had built-in "related links" functionality powered by Alexa that few ever objected to.
Another option for Google is to provide Alt-Click functionality in the way that the GuruNet helper application (now Answers.com, also once called Atomica) has long allowed. In this case, people can select a word, hold the ALT key and click with their mouse, which in turn brings up a page with more information about what's described.
This doesn't add anything to a web page, easing concerns about content manipulation. Indeed, Wall St. Journal writer Walt Mossberg, who rallied against Smart Tags in 2001, nevertheless praised GuruNet in that same complaint for letting him Alt-Click on words, and has continued to praise GuruNet's Alt-Click feature in 2003 and 2005.
In short, Alt-Click is an easy way to provide the user who wants to make a conscious choice to act upon ISBN numbers, addresses or other content that lacks links with AutoLink-like functionality -- just without having to use the actual links that are objectionable to some publishers.
Google did consider this option, but links were seen as more intuitive:
"We talked about whether we should make this work like that or something else. But we think that if you're going to create a link, the ability to get to another page, the web already has a paradigm for that. Right now, the link really does make sense," Mayer said.
Adding further, she said:
"The links that we add do look different. We work hard to help the user understand that this was a link added by the Google Toolbar, that it wasn't a native link. We do this through a mouse rollover that is visible when you mouse over the link."
From my end, the mouse rollover isn't enough. All it provides is little Google color "bubbles" or "balls" added to the hand icon, along with link pop-up text that says "Google Toolbar AutoLink." Before you hover, these links look identical to native links -- and some people are just going to click rather than hover for very long.
A different color or a double-underline or something would help. But while I certainly agree that links are far more intuitive, whether they look radically different from native links or not, they simply clash too much with publisher rights, in my view, and at this moment.
You don't have to wait for Google to provide an opt-out, especially given that it might never do so. Threadwatch describes a JavaScript blocking solution cooked up by Search Guild. Download the solution (instructions are provided) and insert it into your web pages. The same Threadwatch thread is also tracking any new solutions that come up -- some new server-side ones have just been posted.
Meanwhile, an anti-anti-AutoLink option appears to also be out there for users who want to override publishers trying to prevent AutoLink. I say appears because it seems like a clunky workaround that I can't really understand -- and looking at the comments posted, some others don't get it either.
I mention it mainly because it highlights how quickly things have become absurd. You have third-parties working to prevent AutoLink and potentially others working to prevent preventing AutoLink. It's a mess.
The user experience is hardly being protected by Google refusing to provide an opt-out. It would be much better for Google to provide an opt-out in a way that makes publishers happy but also lets Google report clearly to its own users if the publisher has blocked AutoLink from the site they are visiting.
After all, it's arguably bad for the user experience if they can't get cached copies of pages. Nevertheless, Google has long allowed web site owners the ability to opt-out of having pages cached, primarily it seems to avoid conflicts over copyright. Despite this opt-out, the cached pages feature has survived for years. AutoLink can survive opt-out black spots, as well.
Finally, just weeks ago, Google acknowledged that publishers should have MORE ability to control their links through the introduction of the nofollow link attribute. It's disconcerting to say the least to then have the same company assume a right to add links to publisher pages without permission.
Posted by Danny Sullivan at 10:43 AM | Permalink