Back in 2009, Google officially started supporting use of the rel="canonical" attribute in HTML. The canonical attribute is part of a link tag, placed in the <head> section of your HTML page.
This code allows webmasters to help Google identify pages that are similar in content and indicate which version is preferred. This could include pages that share a product description but offer different options for styles or colors. Using the tag helps you avoid duplicate content issues on Google.
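To make the tag concrete, here is a minimal Python sketch of how a program might read it from a page. The sample page, its URL, and the names CanonicalFinder and find_canonical are illustrative assumptions, not taken from Google's documentation, and real crawlers are certainly more thorough than this.

```python
from html.parser import HTMLParser

# Illustrative page; the URL is a placeholder chosen for this sketch.
SAMPLE_PAGE = """<html>
<head>
<title>Blue widget</title>
<link rel="canonical" href="http://www.example.com/widget">
</head>
<body><p>Same widget, blue option selected.</p></body>
</html>"""


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated values, so split before testing.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html_text):
    """Return the canonical URL declared in html_text, or None."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


print(find_canonical(SAMPLE_PAGE))
```

Every color or style variation of the page would carry the same link tag, all pointing at the one preferred URL.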
After a recent warning from Google about the potential to hack canonical tags, discussion forums buzzed with chatter. Today, Google announced they are now supporting use of canonical at the server level. To understand this, let’s take a look at a typical Web transaction.
Your browser requests a page from the server. The server typically responds with a numerical status code (200, 404, etc.) and some information. With a status code of 200, which tells the browser the request will be satisfied, that information includes the MIME type (HTML, PDF, text, etc.) and the size of the file, all sent before the contents of the requested file itself.
The pieces of information returned before the file are called the server headers.
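As a rough illustration, the Python sketch below takes a made-up response head and splits it into the status code and the headers described above. The sample values and the function name parse_response_head are assumptions for this example only; a real server usually sends more headers than these.

```python
# Illustrative response head; these values are made up for the sketch.
SAMPLE_RESPONSE_HEAD = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 5120\r\n"
    "\r\n"
)


def parse_response_head(raw_head):
    """Split a raw response head into (status_code, headers)."""
    lines = raw_head.split("\r\n")
    # The status line looks like: HTTP/1.1 200 OK
    status_code = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        if not line:
            break  # a blank line marks the end of the headers
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return status_code, headers


status, headers = parse_response_head(SAMPLE_RESPONSE_HEAD)
print(status)
print(headers["content-type"])
print(headers["content-length"])
```

The MIME type arrives in Content-Type and the file size in Content-Length, both ahead of the file contents.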
In their announcement, Google describes support for the use of a rel="canonical" header sent before the server delivers the requested file to a browser. The format of the canonical header is similar to that of the HTML code:
Link: <canonical-page-URL>; rel="canonical"
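In the HTTP Link header, the target URL sits inside angle brackets and the relation follows after a semicolon. The Python sketch below builds such a header value and reads the URL back out. The function names and the example.com URL are placeholders for illustration, and the parsing is deliberately simplified: it assumes rel is the first parameter after the URL.

```python
import re


def canonical_link_value(url):
    """Build the value of a Link header that marks url as canonical."""
    return '<{}>; rel="canonical"'.format(url)


# Simplified pattern: a <URL> followed directly by rel=canonical.
_CANONICAL_LINK = re.compile(
    r'<([^>]*)>\s*;\s*rel\s*=\s*"?canonical"?\s*(?:[;,]|$)',
    re.IGNORECASE,
)


def canonical_from_link(value):
    """Return the canonical URL found in a Link header value, or None."""
    match = _CANONICAL_LINK.search(value)
    return match.group(1) if match else None


value = canonical_link_value("http://www.example.com/page.html")
print("Link: " + value)
print(canonical_from_link(value))
```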
Google has taken some criticism recently for not adhering to standards. Specifically, the new schema.org project has been criticized for favoring microdata over established formats such as microformats and RDFa. However, the World Wide Web Consortium (W3C) has previously published information about an HTTP link header. So purists can rest assured that this is not without precedent.
The issue with HTTP headers is that they aren’t easy to code. Setting them may require assistance from your system administrator. It will certainly require some additional coding or plug-ins within your content management system.
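What that additional coding looks like depends entirely on your server and content management system. As one hedged sketch, assuming a Python site served through WSGI, a small wrapper can add the header to every response. The names add_canonical_header, demo_app, and CANONICAL_URLS, along with the paths and URLs, are hypothetical and exist only for this example.

```python
def add_canonical_header(app, canonical_for):
    """Wrap a WSGI app so its responses carry a rel="canonical" Link header.

    canonical_for is a hypothetical callable that maps a request path to
    its canonical URL, or returns None when no header should be added.
    """

    def wrapped(environ, start_response):
        canonical = canonical_for(environ.get("PATH_INFO", ""))

        def start_with_link(status, headers, exc_info=None):
            if canonical:
                link = ("Link", '<{}>; rel="canonical"'.format(canonical))
                headers = list(headers) + [link]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_link)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in for the real site: always returns a short text body."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Widget description\n"]


# Hypothetical mapping from variant pages to the preferred URL.
CANONICAL_URLS = {
    "/widget-blue": "http://www.example.com/widget",
    "/widget-red": "http://www.example.com/widget",
}

app = add_canonical_header(demo_app, CANONICAL_URLS.get)


def show(status, headers, exc_info=None):
    """Print the status and headers instead of sending them to a browser."""
    print(status)
    for name, value in headers:
        print("{}: {}".format(name, value))


app({"PATH_INFO": "/widget-blue"}, show)
```

To try it in a browser, the wrapped app could be served with Python's built-in wsgiref.simple_server. Sites on other platforms would need the equivalent setting in their Web server configuration or CMS.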
Because it’s so difficult to implement, it’s reasonable to assume this could slow down those sites that scrape content. It could also trump any HTML canonical attributes. However, your own Web server should still be the first one out of the gate to implement it.
Google states they are only supporting HTTP link headers for their Web search product at this time. If the trend catches on, they may consider adding this functionality to their other properties.