Several mostly print publisher groups say they are to test a new "Automated Content Access Protocol" that they feel will head off conflicts with search engines. A release with more information is below.
Exactly how the system will work, why it is different or better than existing systems like robots.txt or meta robots tags, isn't explained. More details are promised to be unveiled at the Frankfurt Book Fair on October 6.
I'm planning to talk with the World Association Of Newspapers to learn more about their plans next week, so I may have more before the formal unveiling. I've had a very informal talk already, and the view seems to be to find a way to make the existing systems work better. That's appreciated, and it's something the search marketing community has long wanted. But it's something I hope will involve more than just a group of publishers with mostly print interests.
My article from earlier this week, "Google's Belgium Fight: Show Me The Money, Not The Opt-Out, Say Publishers," explains how, in my view, the entire issue that has erupted in Belgium is less about keeping content out of search engines and more about trying to force them to pay publishers for inclusion. Right now, any publisher that feels copyright is somehow infringed by being in a search engine has a very easy, very selective way to keep whatever they want out: robots.txt files or meta robots tags. These work on a web-wide basis, have the support of all the major search engines, and have been used by publishers of all types. They could definitely be improved -- but in the Belgium case in particular, using them would have solved the exact problem that was raised.
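To make the robots.txt point concrete, here is a minimal sketch using Python's standard urllib.robotparser module. The crawler name "ExampleNewsBot" and the paths are made up for illustration and do not refer to any real search engine or publisher.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publisher site. "ExampleNewsBot" and the
# paths are invented for this sketch.
ROBOTS_TXT = """\
User-agent: ExampleNewsBot
Disallow: /

User-agent: *
Disallow: /archive/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(agent: str, path: str) -> bool:
    """Return True if the robots.txt rules let this crawler fetch the path."""
    return parser.can_fetch(agent, path)


# One named crawler is shut out of the whole site.
news_bot_story = may_fetch("ExampleNewsBot", "/news/story.html")
# Every other crawler may fetch current stories but not the archive.
other_bot_story = may_fetch("OtherBot", "/news/story.html")
other_bot_archive = may_fetch("OtherBot", "/archive/2005/story.html")
```

The same opt-out can be made page by page with a meta robots tag. Either way, the choice is only between allowing and forbidding, which is exactly the limitation the publishers say they want to move beyond.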
Here's the release:
GLOBAL PUBLISHERS HEAD OFF LEGAL CLASH WITH SEARCH ENGINES: NEW RIGHTS MANAGEMENT PILOT IMMINENT
In the week that the publishers of Le Soir and La Libre Belgique won their case in the Belgian Courts against Google for illegally publishing content on its news service without prior consent, the World Association of Newspapers (W.A.N.), the European Publishers Council (E.P.C.), the International Publishers Association (I.P.A.) and the European Newspapers Association (E.N.P.A.), are preparing to launch a global industry pilot project that aims to avoid any future clash between search engines and newspaper, periodical, magazine and book publishers.
The new project, ACAP (Automated Content Access Protocol), is an automated enabling system by which the providers of content published on the World Wide Web can systematically grant permissions information (relating to access and use of their content) in a form that can be readily recognised and interpreted by a search engine “crawler”, so that the search engine operator (and ultimately, any other user) is enabled systematically to comply with such a policy or licence. Effectively, ACAP will be a technical solutions framework that will allow publishers worldwide to express use policies in a language that the search engine's robot “spiders” can be taught to understand.
Gavin O'Reilly, Chairman of the W.A.N., said: “This system is intended to remove completely any rights conflicts between publishers and search engines. Via ACAP, we look forward to fostering mutually beneficial relationships between publishers of original content and the search engine operators, in which the interests of both parties can be properly balanced. Importantly, ACAP is an enabling solution that will ensure that published content will be accessible to all and will encourage publication of increasing amounts of high-value content online. This industry-wide initiative positively answers the growing frustration of publishers, who continue to invest heavily in generating content for online dissemination and use.”
Francisco Pinto Balsemão, Chairman of the E.P.C., said: “ACAP will unambiguously express our preferred rights and terms and conditions. In doing so, it will facilitate greater access to our published content, making it more, not less available, to anyone wishing to use it, whilst avoiding copyright infringement and protecting search engines from future litigation.”
ACAP will be presented in more detail at the forthcoming Frankfurt Book Fair on 6th October and will be launched officially by the end of the year. W.A.N., the E.P.C. and I.P.A. will run the pilot for a period of up to 12 months and it will be managed by Rightscom Ltd.
The European Publishers Council is a high level group of Chairmen and CEOs of European media corporations actively involved in multimedia markets spanning newspaper, magazine and online database publishers. Many EPC members also have significant interests in commercial television and radio.
The World Association of Newspapers groups 72 national newspaper associations, individual newspaper executives in 100 nations, 13 news agencies, and nine regional press organizations, representing more than 18,000 publications in all international discussions on media issues, to defend both press freedom and the professional and business interests of the press. The International Publishers Association is a Non Governmental Organisation with consultative relations with the United Nations. Its constituency is of book and journal publishers world-wide, assembled into 78 publishers associations at national, regional and specialised level. The European Newspaper Publishers' Association is a non-profit association currently representing 5,100 national, regional and local newspapers. These daily, weekly and Sunday titles are published in 24 European countries where ENPA's members are operating in their national markets.
Postscript: I've just received this briefing paper that explains more. I've skimmed it and attached one note marked in bold. Basically, the existing robots.txt or meta robots systems can do a lot of what's already described here. What they cannot do is help search engines access content because the publisher allows this only through a licensing agreement, something the Belgian publishers seem to want. In addition, the pilot can do all it wants. Unless some major search engines agree to cooperate, the pilot will go nowhere. Again, I'll follow up more on this next week after talking with the groups involved.
Automated Content Access Protocol
A briefing paper for publishers on a project in planning
1 Executive summary
All sectors of publishing face a “search engine dilemma”. The value of search engines to users – and to those who publish on the network – is incontrovertible. However, search engine activities can be very damaging to specific online publishing models. The undifferentiated model of permissions management (essentially either allowing or forbidding search of content) is inadequate to support the diverse present and future internet strategies and business models of online publishers.
At the beginning of 2006, the major publishing trade associations established a Working Party, chaired by Gavin O'Reilly, Chairman of the World Association of Newspapers, to consider the issues that this has raised. As a result, the World Association of Newspapers and the European Publishers Council are planning a project which will develop and pilot a technical framework which will allow publishers to express access and use policies in a language which the search engine's robot “spiders” can be taught to understand. This will make it possible to establish mutually beneficial business relationships between publishers and search engine operators, in which the interests of both parties can be properly balanced.
The project is provisionally called ACAP (for Automated Content Access Protocol). ACAP will develop and pilot a system by which the owners of content published on the World Wide Web can provide permissions information (relating to access and use of their content) in a form in which it can be recognised and where necessary interpreted by a search engine “crawler”, so that the search engine operator (and perhaps, ultimately, any other user) is enabled systematically to comply with such a policy or licence.
This paper is intended to brief publishers on the outline of this project and to encourage their active support and participation when the project is launched in September 2006.
2 Background – the “search engine” problem
At the beginning of 2006, the major Europe-based publishing trade associations – including the World Association of Newspapers (WAN); the European Publishers Council (EPC); the European Newspaper Publishers Association (ENPA); the International Publishers Association (IPA); the European Federation of Magazine Publishers (FAEP); the Federation of European Publishers (FEP); the World Editors Forum (WEF); the International Federation of the Periodical Press (FIPP) and Agence France Presse – established a Working Party to consider the issues that are posed by search engines for publishers, and to look at ways in which mutually beneficial relationships can be established between publishers and search engine operators, in which the interests of both parties can be properly balanced.
All sectors of publishing have a “search engine dilemma” (even if we disregard the particular problems that book publishers have with mass digitisation programmes). Search engines are an unavoidable and valued port of call for anyone seeking an audience on the internet. Search engines sit between internet users and the content they are seeking out and have found brilliantly simple and effective ways to make money from that audience. They have become so dominant that no individual website owner is large enough to have any serious impact on their commercial fortunes.
The benefits of powerful search technology to both users and providers of content are well recognised by publishers – although even “mere” search functionality can have a negative impact on some publishing business models. At the same time, publishers are aware that search engines are, in following their business logic, inevitably and gradually moving into a publisher-like role, initially merely pointing, then caching and, finally, aggregating and “publishing” and perhaps even creating content themselves, while using publishers' content at will.
In the current state of technology, there can be none of the differentiation of terms of access and use which characterises copyright-based relationships in publishing environments, whether electronic or physical. The search engines can and do reasonably argue that, since their systems are completely automated, and they cannot possibly enter into and manage individual and different agreements with every website they encounter, there is no practical alternative to their current modus operandi.
Whether this (technological and political) gap is there by design or by accident, the search engines are able to make their own rules and decide for themselves whose interests are worth considering.
If publishers are to take the initiative in establishing orderly business relationships with the search engine operators, the response must be to help them to address the problem, both to fill the technical gap and ensure its political implementation. To paraphrase the former copyright adviser to the UK Publishers Association Charles Clark's famous claim that “the answer to the machine is in the machine”, the challenges that are created by technology are best resolved by technology. Since search engine operators rely on robotic “spiders” to manage their automated processes, publishers' web sites need to start speaking a language which the operators can teach their robots to understand. What is required is a standardised way of describing the permissions which apply to a website or webpage so that it can be decoded by a dumb machine without the help of an expensive lawyer.
In this way, one of the search engines' most reliable rationalisations of their “our way or no way” approach will have been removed, and a structure which embraces and supports the diverse present and future internet strategies and business models of online publishers will have been created.
As a result of the work of the Working Party, a proposal was made to develop a permissions based framework for online content. This would be a technical specification which would allow the publisher of a website or any piece of content to attach extra data which would specify what use by search engines was allowable for that piece of content or website. The aim will be for this to become a widely implemented standard, ultimately embedded into website and content creation software.
Following the commissioning of a brief feasibility study, WAN and EPC have taken the initiative to establish a project to develop and pilot this framework to express publishers' access and use policies. A detailed plan for this project – provisionally called ACAP (for Automated Content Access Protocol) – is currently in development.
3 ACAP – the vision
ACAP will develop and pilot a system by which the owners of content published on the World Wide Web can provide permissions information (relating to access and use of their content) in a form in which it can be recognised and where necessary interpreted by a search engine “crawler”, so that the search engine operator (and perhaps, ultimately, any other user) is enabled systematically to comply with such a policy or licence. Permissions may be in the form of
• policy statements which require no formal agreement on the part of a user
• formal licences agreed between the content owner and the search engine operator.
There are two distinct levels of permissions which need to be managed within this framework:
• The permission given to the search engine operators for their own operations (access, copy and download, cache, index, make available for display)
• The delegation of rights given to the search engine operators to grant permissions of access and use to search engine users (search, access, view, copy, download, etc)
Although these can be managed within the same framework, it is important that the differences between them are recognised.
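The two levels described above could be kept apart in a data model along these lines. This is only a sketch: the briefing paper defines no data format, so the class and field names are assumptions, and only the action names come from the two bullet points.

```python
from dataclasses import dataclass
from enum import Enum


class OperatorAction(Enum):
    """Level 1: what the search engine operator may do for its own operations."""
    ACCESS = "access"
    COPY_AND_DOWNLOAD = "copy and download"
    CACHE = "cache"
    INDEX = "index"
    DISPLAY = "make available for display"


class UserAction(Enum):
    """Level 2: what the operator may in turn let its own users do."""
    SEARCH = "search"
    ACCESS = "access"
    VIEW = "view"
    COPY = "copy"
    DOWNLOAD = "download"


@dataclass(frozen=True)
class Permissions:
    # Hypothetical field names; the two sets are deliberately separate types.
    operator_may: frozenset = frozenset()
    operator_may_grant_users: frozenset = frozenset()

    def operator_allowed(self, action: OperatorAction) -> bool:
        return action in self.operator_may

    def user_allowed(self, action: UserAction) -> bool:
        return action in self.operator_may_grant_users


# Example: the operator may access and index, and may let users search and view.
policy = Permissions(
    operator_may=frozenset({OperatorAction.ACCESS, OperatorAction.INDEX}),
    operator_may_grant_users=frozenset({UserAction.SEARCH, UserAction.VIEW}),
)
```

Using two distinct enumerations means that "access" by the operator and "access" by an end user cannot be confused, which is the distinction the paper asks to be recognised.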
4 Use Cases
We include two informal Use Cases which are illustrative of the type of challenge that we seek to solve through ACAP.
4.1 USE CASE A: NEWSPAPERS
Newspaper publisher A would like all search engines to index his site, but only search engines X, Y and Z may display articles (because they have paid a royalty) on their news pages, and then only for 30 days. All images must be fully attributed as they are in the newspaper. The newspaper publisher uses articles syndicated by other newspapers and news agencies and cannot grant permission for those items, to the extent of the third party rights. Articles should not be permanently cached.
NOTE FROM DANNY: Using existing systems, publishers privileged enough to be included in news search engines don't have their articles displayed. They have links to those articles displayed, along with a description, something that people do all over the web and is generally accepted as fair use. Specific search engines can be blocked, if that's the desire. Specific images can also be blocked. Publishers can require those reprinting their content to install blocks as well.
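For illustration, the rules in Use Case A could be written as a check like the one below. It is a sketch under stated assumptions, not anything ACAP has specified. The paper does not say what the 30 days are counted from, so the publication date is assumed here. Treating syndicated articles as never displayable is also one reading of "to the extent of the third party rights". All names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Engines X, Y and Z have paid a royalty (labels taken from the use case).
LICENSED_DISPLAYERS = frozenset({"X", "Y", "Z"})
# Assumption: the 30-day window runs from the publication date.
DISPLAY_WINDOW = timedelta(days=30)


@dataclass(frozen=True)
class Article:
    published: date
    third_party: bool = False  # syndicated from another newspaper or agency


def may_index(engine: str, article: Article) -> bool:
    """Publisher A would like all search engines to index the site."""
    return True


def may_display(engine: str, article: Article, today: date) -> bool:
    """Display on news pages: licensed engines only, and only for 30 days."""
    if article.third_party:
        # Assumption: the publisher cannot grant display rights it does not hold.
        return False
    if engine not in LICENSED_DISPLAYERS:
        return False
    return today - article.published <= DISPLAY_WINDOW


story = Article(published=date(2006, 9, 1))
wire_story = Article(published=date(2006, 9, 1), third_party=True)
```

The attribution and caching requirements are left out of this sketch. They would need further fields, and nothing here enforces them.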
4.2 USE CASE B: BOOKS
Book Publisher B invites search engine operators X, Y and Z to index the full text of his latest college text books. The web site where the full text is stored should not be made visible to search engine clients. He wishes that search engine users can browse only 2 pages of a maths book, but 20 pages of a philosophy text book. Search engine users should be able to buy individual chapters for private use, at $5 and $3 per chapter respectively.
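Use Case B can be sketched the same way. The page limits and chapter prices come from the text above. The structure, the names and the idea of counting pages already viewed are assumptions made for this example.

```python
from dataclasses import dataclass
from decimal import Decimal

# Operators X, Y and Z are invited to index the full text.
FULL_TEXT_INDEXERS = frozenset({"X", "Y", "Z"})
# The site holding the full text should not be visible to search engine users.
EXPOSE_SOURCE_SITE = False


@dataclass(frozen=True)
class BookTerms:
    preview_pages: int          # pages a search engine user may browse
    chapter_price_usd: Decimal  # price of one chapter for private use


TERMS = {
    "maths": BookTerms(preview_pages=2, chapter_price_usd=Decimal("5")),
    "philosophy": BookTerms(preview_pages=20, chapter_price_usd=Decimal("3")),
}


def may_index_full_text(engine: str) -> bool:
    return engine in FULL_TEXT_INDEXERS


def may_browse_page(subject: str, pages_viewed_so_far: int) -> bool:
    """Allow another page only while the user is under the preview limit."""
    return pages_viewed_so_far < TERMS[subject].preview_pages


def chapters_cost(subject: str, chapters: int) -> Decimal:
    """Total price in US dollars for a number of individually bought chapters."""
    return TERMS[subject].chapter_price_usd * chapters
```

Terms like these are beyond what robots.txt can express: they include prices and per-user limits, not just whether a crawler may fetch a page.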
5 Business requirements
Although it will be an integral part of the ACAP project to further develop and confirm the business requirements of publishers for the operation of the framework, significant progress has already been made in identifying the high level business requirements against which any technical solution must be measured. In summary, the solution must be:
• enabling not obstructive: facilitating normal business relationships, not interfering with them, while providing content owners with proper control over their content
• flexible and extensible: the technical approach should not impose limitations on individual business relationships which might be agreed between content owners and search engine operators; and it should be compatible with different search technologies, so that it does not become rapidly obsolete.
• able to manage permissions associated with arbitrary levels of granularity of content: from a single digital object to a complete website, to many websites managed by the same content owner
• universally applicable: the technical approach should initially be suitable for implementation by all text-based content industries, and so far as possible should be extensible to (or at the very least interoperable with) solutions adopted in other media
• able to manage both generic and specific: able to express default terms which a content owner might choose to apply to any search engine operator and equally able to express the terms of a specific licence between an individual search engine operator and an individual content owner
• as fully automated as possible: requiring human intervention only where this is essential to make decisions which cannot be made by machines
• efficient: inexpensive to implement, by enabling seamless integration with electronic production processes and simple maintenance tools
• open standards based: A pro-competitive development open to all, with the lowest possible barriers to entry for both content owners and search engine operators
• based on existing technologies and existing infrastructure: wherever suitable solutions exist, we should adopt and (where necessary) extend them – not reinvent the wheel
The approach taken should also be capable of staged implementation – it should be possible for initial applications to be relatively simple, while providing the basis for seamless extension into more sophisticated permissions management.
Although the scope of the project is initially limited to the relationship between publishers and search engine operators, a framework which meets these requirements should be readily extensible to other business relationships (although details of implementation would not be the same in every case).
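Two of the requirements above, "generic and specific" terms and "arbitrary levels of granularity", suggest a lookup of the following kind. It is a rough sketch only. The rule table, the engine name "EngineX" and the precedence order (a specific licence beats default terms, and within each the longest matching path wins) are assumptions. The paper does not define how conflicts would be resolved.

```python
# Hypothetical rule table: (engine, path prefix) -> allowed uses.
# An engine of None marks the default terms offered to any operator.
RULES = {
    (None, "/"): {"index"},
    (None, "/premium/"): set(),
    ("EngineX", "/"): {"index", "display"},
    ("EngineX", "/premium/report.html"): {"index"},
}


def allowed_uses(engine: str, path: str) -> set:
    """Resolve the uses permitted to an engine for one path.

    Assumed precedence: rules naming the engine are consulted first, then
    the defaults; within either group the longest matching prefix applies.
    """
    for who in (engine, None):
        matches = [
            prefix
            for (rule_engine, prefix) in RULES
            if rule_engine == who and path.startswith(prefix)
        ]
        if matches:
            return set(RULES[(who, max(matches, key=len))])
    return set()
```

With this table, an unlicensed crawler may index ordinary pages but nothing under /premium/. EngineX may index and display the site in general, but may only index the single report named in its licence.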
6 The Pilot Project
The ACAP pilot project is expected to last for around 12 months. In outline, it is anticipated that the project will:
• confirm and prioritise the business and technical requirements with the widest possible constituency: agreement with all stakeholders is essential if the project is to succeed in the long term
• agree which specific Use Cases should be implemented in the pilot phase of the project, starting with a relatively simple approach
• develop the elements of the technical solution: it is anticipated that this will primarily involve the development of standards for policy expression, although it will also be necessary to develop the tools for the implementation of those standards
• identify a suitable group of organisations willing and able to participate in the pilot project; it is currently anticipated that this could involve four or five publishers and one of the major search engines; participants will need to be in a position to dedicate technical and time resources to the project to enable it to succeed
• pilot the standards and the tools, to prove the underlying concepts
In parallel with the development of the technical solution, a significant stream of project work will involve the development of a sustainable governance structure to manage and extend the standards (and any related technical services) which will be needed after the project phase of ACAP is complete.
To avoid duplication of effort, ACAP will also establish liaisons with relevant standards developments elsewhere. In particular, the project is already in contact with EDItEUR with respect to its development of ONIX for Licensing Terms; and, in view of the significance of identification issues, with the International DOI Foundation.
7 Next steps
It is anticipated that the project will be launched publicly in September 2006; there is a great deal to be achieved between now and then, and at launch it will be possible to be much more explicit about plans and expectations. However, it is very important that the publishing community as a whole is ready and willing to respond positively when the project is launched.
The feasibility study commissioned by WAN, EPC and ENPA concluded that this project is technically feasible – and indeed requires little in the way of genuinely new technology. Rather, it requires the integration and implementation of identification and metadata technologies that are already well understood. It is also possible to chart a developmental path which does not demand that every element of the framework must be in place before any of it can be usefully implemented.
However, this is not to suggest that everything will be simple, nor that it can be achieved without cost. A significant part of the project cost will have to be borne by those organisations that agree to participate in the pilot, in the development of their own systems; however, there will also be central costs, to which it is hoped that other publishers will be prepared to contribute.
If you have any questions about this project, or would simply like to express your support, please contact: [email protected]