The New Meta Tags Are Coming -- Or Are They?

From The Search Engine Report
Dec. 4, 1997

Some of you may have heard glimmerings of a new way of meta tagging web sites that's under development by the W3 Consortium (W3C), the organization that sets standards for the web. A new standard is currently expected to emerge after the New Year.

The development is accompanied by a variety of acronyms: RDF, XML, and way back, there was some hype about MCF. Mixed in with all this is talk about Netscape and Microsoft either fighting with each other or shaking hands in agreement.

I wanted to provide some guidance as to what's happening, though this is an issue that will be revisited as standards develop.

Before stepping forward, it's important to understand where we are now.

The "meta" in "meta tags" arises from the term "metadata," which means data about other data. For example, a meta tag may describe the content of a web page. That description is data about other data, the web page.

Any web page can have a variety of metadata associated with it. We currently use tags to define page descriptions, keywords, page authors, the date a page was created, a "child-safe" rating and more.

What most people don't realize is that there are no officially designated meta tags in HTML. There is a framework for meta tags, but the HTML specs do not define exactly what meta tags exist and how they should be used.

In the simplest terms, the HTML specs say that meta tags have a name and value area, such as this:

<META NAME="name" CONTENT="value">
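
As a minimal sketch of that name-and-value pattern, the Python snippet below reads the meta tags out of a small page using the standard library's html.parser module. The sample page, the tag values, and the names MetaTagCollector and collect_meta_tags are all made up for illustration; this is not how any particular search engine or browser does it.

```python
from html.parser import HTMLParser

# Hypothetical sample page; the names and values are illustrative only.
SAMPLE_PAGE = """
<html>
<head>
<title>Example Page</title>
<meta name="description" content="A short summary of the page.">
<meta name="keywords" content="search engines, meta tags">
<meta name="author" content="Jane Doe">
</head>
<body><p>Hello.</p></body>
</html>
"""


class MetaTagCollector(HTMLParser):
    """Collect the name/content pair from every <meta> tag that has both."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name is not None and content is not None:
            self.tags[name.lower()] = content


def collect_meta_tags(html_text):
    """Return a dict mapping each meta tag name to its content value."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    collector.close()
    return collector.tags
```

Calling collect_meta_tags(SAMPLE_PAGE)["keywords"] returns "search engines, meta tags". Nothing in the parser knows what "keywords" means; it only sees a name and a value, which is the point about the framework.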

That framework has been used by various groups to create meta tags that are used in different ways. For example, browser manufacturers have provided support for the meta refresh tag, which causes a browser to reload a new page or the same page after a few seconds.
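
To make the refresh tag concrete: it is commonly written with an http-equiv attribute of "refresh" and a content value holding a delay in seconds, optionally followed by a semicolon and "url=" plus the page to load. The sketch below splits such a value apart. The function name parse_refresh and the example address are hypothetical, and real browsers are more forgiving about odd formatting than this is.

```python
def parse_refresh(content):
    """Split a meta refresh content value into (delay_seconds, target_url).

    target_url is None when no url= part is given, meaning the same page
    is reloaded.
    """
    delay_part, _, rest = content.partition(";")
    delay = int(delay_part.strip())
    rest = rest.strip()
    target = None
    if rest.lower().startswith("url="):
        # Drop the "url=" prefix and keep whatever follows it.
        target = rest[4:].strip() or None
    return delay, target
```

For example, parse_refresh("5; URL=http://www.example.com/new.html") returns (5, "http://www.example.com/new.html"), while parse_refresh("30") returns (30, None).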

Readers of this newsletter are most familiar with the two meta tags that enjoy widespread support by the search engines, the meta keywords and meta description tags.

One survey found that 12% and 11% of web pages used those two tags, respectively. That sounds low, but those were actually the highest counts of any tags. Only usage of the generator tag approached those numbers (7.5%), and this tag is usually added automatically by authoring tools. Beyond these, other well-known tags include PICS (2.5%), a child-safe rating, and the content tag (2.2%), used to set the language character set of a web page.

Now I'm about to bring up all those new acronyms, hopefully in a way that helps them relate back to the meta tags and meta tagging concepts everyone is familiar with.

The new W3C format sets up a new way to describe metadata. It says that we have RESOURCES (web pages or entire web sites) that we wish to give a DESCRIPTION (that is, describe) using a more efficient FRAMEWORK than currently exists.

Hence, the new name: Resource Description Framework, or RDF. It sounds different, but it's about the same thing everyone is used to doing, which is defining a web site in useful ways.

Now I'm going to blow right through some of those other acronyms, discussing them only to let you know why they come up. In the grand scheme of things, they don't really matter.

We write existing meta tags in HTML. RDF tags (schemas is the proper term) are to be written using XML. XML is the successor language to HTML, meant to enhance how we currently author web pages.
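
For a rough feel of what metadata written in XML can look like, here is a sketch that reads a small RDF description with Python's standard xml.etree.ElementTree module. It assumes the RDF/XML syntax and the namespace addresses as they were later published, which may not match the drafts under discussion at the time of writing. The sample resource, its values, and the function name describe_resources are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace addresses as later published; assumed here, not taken from the drafts.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical description of one web page.
SAMPLE_RDF = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.example.com/index.html">
    <dc:title>Example Page</dc:title>
    <dc:creator>Jane Doe</dc:creator>
    <dc:description>A short summary of the page.</dc:description>
  </rdf:Description>
</rdf:RDF>
"""


def describe_resources(rdf_text):
    """Return {resource address: {property name: value}} for each description."""
    root = ET.fromstring(rdf_text)
    resources = {}
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        about = description.get(f"{{{RDF_NS}}}about")
        properties = {}
        for child in description:
            # ElementTree writes tags as "{namespace}name"; keep only the name.
            local_name = child.tag.split("}", 1)[-1]
            properties[local_name] = (child.text or "").strip()
        resources[about] = properties
    return resources
```

Here describe_resources(SAMPLE_RDF) yields one entry, keyed by http://www.example.com/index.html, whose "creator" value is "Jane Doe". The information is the same kind a meta tag carries; only the packaging differs.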

As you might expect, Netscape and Microsoft have their own opinions on how a new language should be created. Likewise, they've had their own opinions on how metadata should be expressed. Earlier this year, Netscape was backing a plan called MCF, or Meta Content Framework. Microsoft put forth its own "Web Collections" proposal.

Now forget all that, because the big two browser makers are close to being friends, at least on the metadata front. They're behind RDF, so we should be all set for a new era in defining web pages, right?

Maybe. It's important to note that the search engines have not participated in the development of the new framework. The reaction I've gotten from various representatives has ranged from "What's RDF?" to "I suppose we might support it, if it makes sense."

Search engine support is crucial for success, as demonstrated by the lack of support for the existing Dublin Core meta tags.

These are a set of 15 tags that allow a page to be labeled with a description, keywords, author notation, copyright statements and other information. They emerged out of a meeting of researchers, librarians, computer professionals and others in early 1995 (they met in Dublin, Ohio -- hence the name). They wanted to improve the ability to search for web documents.

Practically no one uses these tags, and the reason is that none of the major search engines does anything with them. They don't index them, nor do they provide a way to search within the Dublin Core meta tag fields. (The more familiar meta description and keywords tags that are supported are NOT Dublin Core tags.)
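
The practical effect can be sketched in a few lines of Python. The snippet assumes the common convention of writing Dublin Core tags in HTML with a "DC." prefix on the name, and it treats description and keywords as the only tags a search engine reads, as described above. The sample tags and the function name split_by_search_support are hypothetical.

```python
# The two meta tags the major search engines are described as supporting.
SEARCH_ENGINE_TAGS = {"description", "keywords"}

# Hypothetical tags gathered from one page, as name -> content.
SAMPLE_TAGS = {
    "description": "A short summary of the page.",
    "keywords": "search engines, meta tags",
    "DC.Title": "Example Page",
    "DC.Creator": "Jane Doe",
    "DC.Rights": "Copyright 1997 Example Co.",
}


def split_by_search_support(meta_tags):
    """Split tags into those a search engine would read and those it would ignore."""
    used = {}
    ignored = {}
    for name, content in meta_tags.items():
        if name.lower() in SEARCH_ENGINE_TAGS:
            used[name] = content
        else:
            ignored[name] = content
    return used, ignored
```

With SAMPLE_TAGS, the first dict returned holds only description and keywords, and all three DC. entries land in the second. An author who fills in the Dublin Core fields gets nothing back from the search engines for the effort.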

The W3C hopes that perhaps with a new framework, the search engines (among others) will be more receptive to metadata. Perhaps it will give them incentive to reevaluate indexing more metadata.

"The whole thing has a chicken and egg process," said Ora Lassila, one of the editors of the RDF spec.

In other words, Lassila explained, you have to start somewhere, either with a framework or with tags. The W3C hopes that by providing the framework, support and adoption of new, useful tags will follow.

Realistically, a meta tag framework has already existed for several years, complete with the comprehensive Dublin Core set, and yet nothing has happened. Therefore, it's hard to see search engines suddenly deciding that metadata is a priority.

Add to this the fact that many at search engines do not trust metadata. It's fine to talk about how nice it would be if all web pages were categorized, but the search engines know from experience that people will lie, mislead or do whatever they can to get on top.

A further handicap is that the new RDF standard is not simple. The specs are still being formulated, but defining metadata is far more complicated for the average user than under the existing system -- and that's a system that people already have trouble with.

Those are the negatives. How about some positives? There are a number of reasons the search engines might very well take notice and make use of the new framework.

First, the new system makes it possible to define information once for an entire site. There would be no need to label each page with an author tag, for example. A common file could contain all crucial information, making it easier for authors to maintain and easier for search engines to gather.
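
The idea of defining information once can be illustrated with a short Python sketch. This is not RDF's actual mechanism, just the principle: site-wide values live in one place, and a page supplies only what differs. The paths, values, and the function name metadata_for are all hypothetical.

```python
# Hypothetical site-wide metadata, defined once for every page.
SITE_METADATA = {
    "author": "Jane Doe",
    "copyright": "1997 Example Co.",
}

# Hypothetical per-page metadata; pages list only what is specific to them.
PAGE_METADATA = {
    "/index.html": {"description": "Home page of the example site."},
    "/contact.html": {"description": "How to reach us.", "author": "Webmaster"},
}


def metadata_for(path):
    """Return the site-wide metadata with any page-specific values layered on top."""
    merged = dict(SITE_METADATA)
    merged.update(PAGE_METADATA.get(path, {}))
    return merged
```

Here metadata_for("/index.html") picks up the author "Jane Doe" without the page ever stating it, while metadata_for("/contact.html") reports "Webmaster" because that page overrides the site-wide value. A page with no entry of its own simply gets the site-wide values.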

Secondly, the system makes allowances for tags to be digitally certified. Plans are already in the works to make this possible for a revised PICS tag, so that there's some assurance a porn site couldn't pretend to be kiddie-safe.

Tim Bray, who's part of the RDF working group and previously founded the Open Text search engine, envisions a system where there might be digitally certified descriptions. Imagine that third parties might assign web sites keywords, categories and classifications that could be trusted. Search engines might then be more likely to embrace metadata.

As for the complexity of writing RDF tags, they're not really meant to be the sort of thing you would copy and paste manually. XML is much more complicated than HTML, so authoring tools are expected to have all this stuff built in.

As the spec continues to be developed, you'll continue to hear from various people about lots of great things that will come of RDF. If it's browser related and a browser company rep is talking about it, expect that it may come true. However, don't rely on the search-related aspects to happen until you hear search engine reps talking about them.

Don't forget, even if search engines don't take advantage of RDF, they may move ahead with other ways to communicate metadata. Inktomi, which powers HotBot and which will power the new Microsoft service, is considering enhancing its existing meta tags support after the New Year.

W3C Metadata Area

Keep up with what's happening with the RDF development. Lots of technical documents.

Dublin Core Metadata

Learn more about the Dublin Core tags, which emerged from a meeting of 52 researchers, library professionals, computer scientists and others in Dublin, Ohio, back in March 1995.

Meta Attributes By Count

Statistics from a special robot run done in March 1997 to count the types of meta tags used. Produced by the author of "A Dictionary of HTML META Tags."

DSig 1.0 Signature Labels

How the W3C proposes making PICS labels digitally certified. Could digitally certified descriptions and keywords for documents be next?