Several mostly print publisher groups say they are to test a new "Automated
Content Access Protocol" that they feel will head off conflicts with search
engines. A release with more information is below.
Exactly how the system will work, or why it is different from or better than existing
systems like robots.txt or meta robots tags, isn’t explained. More details are
promised to be unveiled at the Frankfurt Book Fair on October 6.
I’m planning to talk with the World Association Of Newspapers to learn more
about their plans next week, so I may have more before the formal unveiling.
I’ve had a very informal talk already, and the view seems to be to find a way to
make the existing systems work better. That’s appreciated, and it’s something
the search marketing community has long wanted. But it’s something I hope will
involve more than just a group of publishers with mostly print interests.
My article from earlier this week, Belgium Fight: Show Me The Money, Not The Opt-Out, Say Publishers,
explains how, in my view, the entire issue that has
erupted in Belgium is less about keeping content out of search engines and more
about trying to force them to pay publishers for inclusion. Right now, any
publisher that feels copyright is somehow infringed by being in a search engine
has a very easy, very selective way to keep whatever they want out: robots.txt
or meta robots tags.
These work on a web-wide basis, have the support of all the major search engines,
and have been used by publishers of all types. They could definitely
be improved — but in the Belgium case in particular, using them would have
solved the exact problem that was raised.
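To make that concrete, here is a minimal sketch of how a crawler reads a robots.txt opt-out, using Python's standard urllib.robotparser module. The robots.txt contents, the crawler names (ExampleNewsBot, ExampleWebBot) and the example.com URLs are all invented for illustration; they are not taken from any real publisher or search engine.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a newspaper site. The crawler names and
# paths are invented for this sketch.
ROBOTS_TXT = """\
User-agent: ExampleNewsBot
Disallow: /

User-agent: *
Disallow: /archive/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, crawler: str, url: str) -> bool:
    """Return True if the named crawler may fetch the URL."""
    return parser.can_fetch(crawler, url)


parser = build_parser(ROBOTS_TXT)
story = "https://www.example.com/news/story.html"
old_story = "https://www.example.com/archive/2005/story.html"

# Show the decision for each crawler and each URL.
for crawler in ("ExampleNewsBot", "ExampleWebBot"):
    for url in (story, old_story):
        print(crawler, url, allowed(parser, crawler, url))
```

In this sketch the hypothetical news crawler is shut out of the whole site, while every other crawler is kept out of the archive only. The page-level equivalent is a meta robots tag such as `<meta name="robots" content="noindex">` in a page's head.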
Here’s the release:
GLOBAL PUBLISHERS HEAD OFF LEGAL CLASH WITH SEARCH ENGINES: NEW RIGHTS
MANAGEMENT PILOT IMMINENT
In the week that the publishers of Le Soir and La Libre Belgique won their
case in the Belgian Courts against Google for illegally publishing content on
its news service without prior consent, the
World Association of Newspapers (W.A.N.),
the European Publishers Council (E.P.C.),
the International Publishers Association
(I.P.A.) and the European Newspapers Association
(E.N.P.A.) are preparing to launch a global industry pilot project that aims
to avoid any future clash between search engines and newspaper, periodical,
magazine and book publishers.
The new project, ACAP (Automated Content Access Protocol), is an automated
enabling system by which the providers of content published on the World Wide
Web can systematically grant permissions information (relating to access and
use of their content) in a form that can be readily recognised and interpreted
by a search engine “crawler”, so that the search engine operator (and
ultimately, any other user) is enabled systematically to comply with such a
policy or licence. Effectively, ACAP will be a technical solutions framework
that will allow publishers worldwide to express use policies in a language
that the search engine’s robot “spiders” can be taught to understand.
Gavin O’Reilly, Chairman of the W.A.N., said: “This system is intended to
remove completely any rights conflicts between publishers and search engines.
Via ACAP, we look forward to fostering mutually beneficial relationships
between publishers of original content and the search engine operators, in
which the interests of both parties can be properly balanced. Importantly,
ACAP is an enabling solution that will ensure that published content will be
accessible to all and will encourage publication of increasing amounts of
high-value content online. This industry-wide initiative positively answers
the growing frustration of publishers, who continue to invest heavily in
generating content for online dissemination and use.”
Francisco Pinto Balsemão, Chairman of the E.P.C., said: “ACAP will
unambiguously express our preferred rights and terms and conditions. In doing
so, it will facilitate greater access to our published content, making it
more, not less available, to anyone wishing to use it, whilst avoiding
copyright infringement and protecting search engines from future litigation.”
ACAP will be presented in more detail at the forthcoming Frankfurt Book
Fair on 6th October and will be launched officially by the end of the year.
W.A.N., the E.P.C. and I.P.A. will run the pilot for a period of up to 12
months and it will be managed by Rightscom.
The European Publishers Council is a high level group of Chairmen and CEOs
of European media corporations actively involved in multimedia markets
spanning newspaper, magazine and online database publishers. Many EPC members
also have significant interests in commercial television and radio.
The World Association of Newspapers groups 72 national newspaper
associations, individual newspaper executives in 100 nations, 13 news
agencies, and nine regional press organizations, representing more than
18,000 publications in all international discussions on media issues, to
defend both press freedom and the professional and business interests of the
press. The International Publishers Association is a Non Governmental
Organisation with consultative relations with the United Nations. Its
constituency is of book and journal publishers world-wide, assembled into 78
publishers associations at national, regional and specialised level. The
European Newspaper Publishers’ Association is a non-profit association
currently representing 5,100 national, regional and local newspapers. These
daily, weekly and Sunday titles are published in 24 European countries where
ENPA’s members are operating in their national markets.
Postscript: I’ve just received this briefing paper that explains more.
I’ve skimmed it and attached one note marked in bold. Basically, the existing
robots.txt or meta robots systems can do a lot of what’s already described here.
What they cannot do is grant search engines access to content that a publisher
makes available only through a licensing agreement, something the Belgian publishers
seem to want. In addition, the pilot can do all it wants. Unless some major
search engines agree to cooperate, the pilot will go nowhere. Again, I’ll follow
up more on this next week after talking with the groups involved.
Automated Content Access Protocol
paper for publishers on a project in
1 Executive summary
All sectors of publishing face a “search engine dilemma”. The value of search
engines to users – and to those who publish on the network – is
incontrovertible. However, search engine activities can be very damaging to
specific online publishing models. The undifferentiated model of permissions
management (essentially either allowing or forbidding search of content) is
inadequate to support the diverse present and future internet strategies and
business models of online publishers.
At the beginning of 2006, the major publishing trade associations established
a Working Party, chaired by Gavin O’Reilly, Chairman of the World Association of
Newspapers, to consider the issues that this has raised. As a result, the World
Association of Newspapers and the European Publishers Council are planning a
project which will develop and pilot a technical framework which will allow
publishers to express access and use policies in a language which the search
engine’s robot “spiders” can be taught to understand. This will make it possible
to establish mutually beneficial business relationships between publishers and
search engine operators, in which the interests of both parties can be properly
balanced.
The project is provisionally called ACAP (for Automated Content Access
Protocol). ACAP will develop and pilot a system by which the owners of content
published on the World Wide Web can provide permissions information (relating to
access and use of their content) in a form in which it can be recognised and
where necessary interpreted by a search engine “crawler”, so that the search
engine operator (and perhaps, ultimately, any other user) is enabled
systematically to comply with such a policy or licence.
This paper is intended to brief
publishers on the outline of this project and to encourage their active support
and participation when the project is launched in September 2006.
2 Background – the “search engine” problem
At the beginning of 2006, the major Europe-based publishing trade
associations – including the World Association of Newspapers (WAN); the European
Publishers Council (EPC); the European Newspaper Publishers Association (ENPA);
the International Publishers Association (IPA); the European Federation of
Magazine Publishers (FAEP); the Federation of European Publishers (FEP); the
World Editors Forum (WEF); the International Federation of the Periodical Press
(FIPP) and Agence France Presse – established a Working Party to consider the
issues that are posed by search engines for publishers, and to look at ways in
which mutually beneficial relationships can be established between publishers
and search engine operators, in which the interests of both parties can be
properly balanced.
All sectors of publishing have a “search engine dilemma” (even if we
disregard the particular problems that book publishers have with mass
digitisation programmes). Search engines are an unavoidable and valued port of
call for anyone seeking an audience on the internet. Search engines sit between
internet users and the content they are seeking out and have found brilliantly
simple and effective ways to make money from that audience. They have become so
dominant that no individual website owner is large enough to have any serious
impact on their commercial fortunes.
The benefits of powerful search technology to both users and providers of
content are well recognised by publishers – although even “mere” search
functionality can have a negative impact on some publishing business models. At
the same time, publishers are aware that search engines are, in following their
business logic, inevitably and gradually moving into a publisher-like role,
initially merely pointing, then caching and, finally, aggregating and
“publishing” and perhaps even creating content themselves, while using
publishers’ content at will.
In the current state of technology, there can be none of the differentiation
of terms of access and use which characterises copyright-based relationships in
publishing environments, whether electronic or physical. The search engines can
and do reasonably argue that, since their systems are completely automated, and
they cannot possibly enter into and manage individual and different agreements
with every website they encounter, there is no practical alternative to their
current modus operandi.
Whether this (technological and political) gap is there by design or by
accident, the search engines are able to make their own rules and decide for
themselves whose interests are worth considering.
If publishers are to take the initiative in establishing orderly business
relationships with the search engine operators, the response must be to help
them to address the problem, both to fill the technical gap and ensure its
political implementation. To paraphrase the former copyright adviser to the UK
Publishers Association Charles Clark’s famous claim that “the answer to the
machine is in the machine”, the challenges that are created by technology are
best resolved by technology. Since search engine operators rely on robotic
“spiders” to manage their automated processes, publishers’ web sites need to
start speaking a language which the operators can teach their robots to
understand. What is required is a standardised way of describing the permissions
which apply to a website or webpage so that it can be decoded by a dumb machine
without the help of an expensive lawyer.
In this way, one of the search engines’ most reliable rationalisations of
their “our way or no way” approach will have been removed, and a structure which
embraces and supports the diverse present and future internet strategies and
business models of online publishers will have been created.
As a result of the work of the Working Party, a proposal was made to develop
a permissions based framework for online content. This would be a technical
specification which would allow the publisher of a website or any piece of
content to attach extra data which would specify what use by search engines was
allowable for that piece of content or website. The aim will be for this to
become a widely implemented standard, ultimately embedded into website and
content creation software.
Following the commissioning of a brief feasibility study, WAN and EPC have
taken the initiative to establish a project to develop and pilot this framework
to express publishers’ access and use policies. A detailed plan for this project
– provisionally called ACAP (for Automated Content Access Protocol) – is
currently in development.
This paper is intended to brief
publishers on the outline of this project and to encourage their active support
and participation when the project is launched in September 2006.
3 ACAP – the vision
ACAP will develop and pilot a system by which the owners of content published
on the World Wide Web can provide permissions information (relating to access
and use of their content) in a form in which it can be recognised and where
necessary interpreted by a search engine “crawler”, so that the search engine
operator (and perhaps, ultimately, any other user) is enabled systematically to
comply with such a policy or licence. Permissions may be in the form of
• policy statements which require no formal agreement on the part of a user
• formal licences agreed between the content owner and the search engine
There are two distinct levels of permissions which need to be managed within
• The permission given to the search engine operators for their own operations
(access, copy and download, cache, index, make available for display)
• The delegation of rights given to the search engine operators to grant
permissions of access and use to search engine users (search, access, view,
copy, download, etc)
Although these can be managed within the same framework, it is important that
the differences between them are recognised.
4 Use Cases
We include two informal Use Cases which are illustrative of the type of
challenge that we seek to solve through ACAP.
4.1 USE CASE A: NEWSPAPERS
Newspaper publisher A would like all search engines to index his site, but
only search engines X, Y and Z may display articles (because they have paid a
royalty) on their news pages, and then only for 30 days. All images must be
fully attributed as they are in the newspaper. The newspaper publisher uses
articles syndicated by other newspapers and news agencies and cannot grant
permission for those items, to the extent of the third party rights. Articles
should not be permanently cached.
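The briefing paper does not define any syntax for such a policy, so the following is only a hypothetical sketch of how Use Case A might be written down in a form a machine could check. The class name, field names and functions are assumptions made for illustration, not an ACAP format. It also assumes the 30 days are counted from the publication date, and it simplifies the third-party clause by treating syndicated items as never displayable.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NewsPolicy:
    """Hypothetical encoding of Use Case A; not an actual ACAP format."""
    index_open_to_all: bool = True
    display_licensees: frozenset = frozenset({"X", "Y", "Z"})
    display_window_days: int = 30
    permanent_cache_allowed: bool = False
    image_attribution_required: bool = True


def may_index(policy: NewsPolicy, engine: str) -> bool:
    """All search engines may index the site."""
    return policy.index_open_to_all


def may_display(policy: NewsPolicy, engine: str, published: date,
                today: date, syndicated: bool = False) -> bool:
    """Only paying engines may display, and only inside the time window."""
    if syndicated:
        # Simplification: the publisher cannot grant rights it does not hold.
        return False
    if engine not in policy.display_licensees:
        return False
    age_days = (today - published).days
    return 0 <= age_days <= policy.display_window_days


policy = NewsPolicy()
published = date(2006, 9, 1)

print(may_index(policy, "Q"))
print(may_display(policy, "X", published, date(2006, 9, 20)))
print(may_display(policy, "X", published, date(2006, 10, 15)))
print(may_display(policy, "Q", published, date(2006, 9, 20)))
```

The point of the sketch is that each clause of the use case turns into a field or a check that needs no human judgment, which is what the paper means by a policy a crawler can be taught to understand.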
NOTE FROM DANNY: Using existing systems, publishers privileged enough to
be included in news search engines don’t have their articles displayed. They
have links to those articles displayed, along with a description, something that
people do all over the web and is generally accepted as fair use. Specific
search engines can be blocked, if that’s the desire. Specific images can also be
blocked. Publishers can require those reprinting their content to install blocks
4.2 USE CASE B: BOOKS
Book Publisher B invites search engine operators X, Y and Z to index the full
text of his latest college text books. The web site where the full text is
stored should not be made visible to search engine clients. He wishes that
search engine users can browse only 2 pages of a maths book, but 20 pages of a
philosophy text book. Search engine users should be able to buy individual
chapters for private use, at $5 and $3 per chapter respectively.
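Use Case B can be sketched in the same hedged way. The page limits and chapter prices below come from the use case itself; the dictionary keys, class name and function names are hypothetical, and the paper gives no actual format for expressing these terms.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class BookTerms:
    """Hypothetical per-title terms for search engine users."""
    browse_page_limit: int
    chapter_price_usd: Decimal


# Figures taken from Use Case B; the structure is an assumption.
TERMS = {
    "maths": BookTerms(browse_page_limit=2, chapter_price_usd=Decimal("5")),
    "philosophy": BookTerms(browse_page_limit=20, chapter_price_usd=Decimal("3")),
}

# Operators the publisher has invited to index the full text.
INVITED_INDEXERS = frozenset({"X", "Y", "Z"})

# The site holding the full text is not to be shown to search engine users.
FULL_TEXT_SITE_VISIBLE = False


def may_index_full_text(engine: str) -> bool:
    """Only invited operators are offered the full text for indexing."""
    return engine in INVITED_INDEXERS


def may_browse_another_page(subject: str, pages_already_viewed: int) -> bool:
    """A user may keep browsing until the title's page limit is reached."""
    return pages_already_viewed < TERMS[subject].browse_page_limit


def chapter_price(subject: str) -> Decimal:
    """Price in US dollars for one chapter, for private use."""
    return TERMS[subject].chapter_price_usd


print(may_browse_another_page("maths", 1))
print(may_browse_another_page("maths", 2))
print(chapter_price("philosophy"))
```

Unlike Use Case A, this one covers what the search engine's own users may do, which is the second level of permissions described in section 3.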
5 Business requirements
Although it will be an integral part of the ACAP project to further develop
and confirm the business requirements of publishers for the operation of the
framework, significant progress has already been made in identifying the high
level business requirements against which any technical solution must be
measured. In summary, the solution must be:
• enabling not obstructive: facilitating normal business relationships, not
interfering with them, while providing content owners with proper control over
• flexible and extensible: the technical approach should not impose limitations
on individual business relationships which might be agreed between content
owners and search engine operators; and it should be compatible with different
search technologies, so that it does not become rapidly obsolete.
• able to manage permissions associated with arbitrary levels of granularity of
content: from a single digital object to a complete website, to many websites
managed by the same content owner
• universally applicable: the technical approach should initially be suitable
for implementation by all text-based content industries, and so far as possible
should be extensible to (or at the very least interoperable with) solutions
adopted in other media
• able to manage both generic and specific: able to express default terms which
a content owner might choose to apply to any search engine operator and equally
able to express the terms of a specific licence between an individual search
engine operator and an individual content owner
• as fully automated as possible: requiring human intervention only where this
is essential to make decisions which cannot be made by machines
• efficient: inexpensive to implement, by enabling seamless integration with
electronic production processes and simple maintenance tools
• open standards based: A pro-competitive development open to all, with the
lowest possible barriers to entry for both content owners and search engine
operators
• based on existing technologies and existing infrastructure: wherever suitable
solutions exist, we should adopt and (where necessary) extend them – not
reinvent the wheel
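Two of these requirements, arbitrary granularity and the mix of generic and specific terms, can be illustrated with a small lookup. This is a sketch under stated assumptions, not anything the paper specifies: the rule table, the paths, the operator name "X" and the precedence order (the longest matching path wins, and at the same path a licence naming the operator beats the default terms) are all invented for the example.

```python
from typing import FrozenSet, List, Optional, Tuple

# (path prefix, operator or None for default terms, permitted uses).
# Every entry here is hypothetical.
Rule = Tuple[str, Optional[str], FrozenSet[str]]

RULES: List[Rule] = [
    ("/", None, frozenset({"index"})),                    # default for the whole site
    ("/", "X", frozenset({"index", "display"})),          # licence with operator X
    ("/premium/", None, frozenset()),                     # nothing allowed by default
    ("/premium/report.pdf", "X", frozenset({"index"})),   # one object, one operator
]


def permitted_uses(path: str, operator: str) -> FrozenSet[str]:
    """Pick the most specific rule that applies to this path and operator."""
    best_key = (-1, False)
    best_uses: FrozenSet[str] = frozenset()
    for prefix, rule_operator, uses in RULES:
        if not path.startswith(prefix):
            continue
        if rule_operator is not None and rule_operator != operator:
            continue
        # Longer prefix is more specific; a named operator breaks ties.
        key = (len(prefix), rule_operator is not None)
        if key > best_key:
            best_key, best_uses = key, uses
    return best_uses


print(sorted(permitted_uses("/news/a.html", "X")))
print(sorted(permitted_uses("/news/a.html", "Q")))
print(sorted(permitted_uses("/premium/report.pdf", "X")))
print(sorted(permitted_uses("/premium/other.pdf", "X")))
```

A different precedence order would be just as defensible. Any real framework would have to state its order explicitly so that publishers and search engine operators resolve conflicting terms the same way.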
The approach taken should also be capable of staged implementation – it
should be possible for initial applications to be relatively simple, while
providing the basis for seamless extension into more sophisticated permissions
Although the scope of the project is initially limited to the relationship
between publishers and search engine operators, a framework which meets these
requirements should be readily extensible to other business relationships
(although details of implementation would not be the same in every case).
6 The Pilot Project
The ACAP pilot project is expected to last for around 12 months. In outline,
it is anticipated that the project will:
• confirm and prioritise the business and technical requirements with the widest
possible constituency: agreement with all stakeholders is essential if the
project is to succeed in the long term
• agree which specific Use Cases should be implemented in the pilot phase of the
project, starting with a relatively simple approach
• develop the elements of the technical solution: it is anticipated that this
will primarily involve the development of standards for policy expression,
although it will also be necessary to develop the tools for the implementation
of those standards
• identify a suitable group of organisations willing and able to participate in
the pilot project; it is currently anticipated that this could involve four or
five publishers and one of the major search engines; participants will need to
be in a position to dedicate technical and time resources to the project to
enable it to succeed
• pilot the standards and the tools, to prove the underlying concepts
In parallel with the development of the technical solution, a significant stream
of project work will involve the development of a sustainable governance
structure to manage and extend the standards (and any related technical
services) which will be needed after the project phase of ACAP is complete.
To avoid duplication of effort, ACAP will also establish liaisons with relevant
standards developments elsewhere. In particular, the project is already in
contact with EDItEUR with respect to its development of ONIX for Licensing
Terms; and, in view of the significance of identification issues, with the
International DOI Foundation.
7 Next steps
It is anticipated that the project will be launched publicly in September
2006; there is a great deal to be achieved between now and then, and at launch
it will be possible to be much more explicit about plans and expectations.
However, it is very important that the publishing community as a whole is ready
and willing to respond positively when the project is launched.
The feasibility study commissioned by WAN, EPC and ENPA concluded that this
project is technically feasible – and indeed requires little in the way of
genuinely new technology. Rather, it requires the integration and implementation
of identification and metadata technologies that are already well understood. It
is also possible to chart a developmental path which does not demand that every
element of the framework must be in place before any of it can be usefully
However, this is not to suggest that everything will be simple, nor that it
can be achieved without cost. A significant part of the project cost will have
to be borne by those organisations that agree to participate in the pilot, in
the development of their own systems; however, there will also be central costs,
to which it is hoped that other publishers will be prepared to contribute.
If you have any questions about this project, or would simply like to express
your support, please contact: