New Tagging Suggests Google Sees Translated Content As Duplicates

Last year Google launched meta tags for sites where a multilingual "template (i.e., side navigation, footer) is machine-translated into various languages but the main content remains unchanged, creating largely duplicate pages." This week it went a step further, adding the ability to differentiate between regions that speak essentially the same language with slight differences.

As with the canonical tag, implementation falls to website owners, who must add the markup themselves in order to get "support for multilingual content with improved handling for these two scenarios:

  • Multiregional websites using substantially the same content. Example: English webpages for Australia, Canada and USA, differing only in price.
  • Multiregional websites using fully translated content, or substantially different monolingual content targeting different regions. Example: a product webpage in German, English and French."

This tagging is interesting: it suggests Google knows when content on a site is duplicated even though it appears in a different language. Can its systems translate, or do they just recognize words that differ regionally within the same language? If I use "biscuits" on my UK or Australian sites in place of "cookies", does Google know they mean the same thing?

"If you specify a regional subtag, we'll assume that you want to target that region," Google tells us.

Is duplicate content now being measured for similar terms? Or are the tags a way to have website owners limit the pages Google indexes for each region? Do we add the tags so that Google can thin out the pages we have showing in the SERPs for different regions?
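
To make the regional subtag concrete: an hreflang value such as es-MX combines a language code with an optional region code. Here is a minimal Python sketch of that split. The `split_hreflang` function is a hypothetical helper written for illustration, and it assumes only the two shapes seen in Google's examples (language alone, or language plus region); longer language tags that carry extra subtags are not handled.

```python
def split_hreflang(value):
    """Split an hreflang value such as "es-MX" into (language, region).

    Assumption: the value is either "language" or "language-region".
    Region is None when no regional subtag is present, e.g. "es".
    """
    language, _, region = value.strip().partition("-")
    return language.lower(), (region.upper() or None)


print(split_hreflang("es"))     # ('es', None)
print(split_hreflang("es-MX"))  # ('es', 'MX')
```

A value with no region, like "es", would presumably be treated as the generic version for that language, while "es-MX" signals that Mexico is the target.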

Google shared some example URLs:

  • http://www.example.com/ - contains the general homepage of a website, in Spanish
  • http://es-es.example.com/ - the version for users in Spain, in Spanish
  • http://es-mx.example.com/ - the version for users in Mexico, in Spanish
  • http://en.example.com/ - the generic English language version

On these pages, you can use this markup to specify language and region (optional):

  • <link rel="alternate" hreflang="es" href="http://www.example.com/" />
  • <link rel="alternate" hreflang="es-ES" href="http://es-es.example.com/" />
  • <link rel="alternate" hreflang="es-MX" href="http://es-mx.example.com/" />
  • <link rel="alternate" hreflang="en" href="http://en.example.com/" />
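
If a site has more than a handful of regional versions, these tags get tedious to write by hand. The following is a minimal Python sketch, not Google's tooling, that builds the same four tags as HTML link elements from a mapping of hreflang values to URLs. The `ALTERNATES` dictionary and the `hreflang_links` function are hypothetical names chosen for illustration; the URLs are the ones from Google's example.

```python
from html import escape

# Hypothetical mapping of hreflang value -> URL, using Google's example URLs.
ALTERNATES = {
    "es": "http://www.example.com/",
    "es-ES": "http://es-es.example.com/",
    "es-MX": "http://es-mx.example.com/",
    "en": "http://en.example.com/",
}


def hreflang_links(alternates):
    """Return one <link rel="alternate"> tag per language/region version."""
    return [
        '<link rel="alternate" hreflang="{}" href="{}" />'.format(
            escape(code, quote=True), escape(url, quote=True)
        )
        for code, url in alternates.items()
    ]


# Print the tags in the order the versions were listed.
for tag in hreflang_links(ALTERNATES):
    print(tag)
```

Following Google's example, the same set of tags would go on each of the listed pages, so generating them from one shared mapping keeps the versions consistent with each other.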

It seems many site owners won't bother adding the tags unless Google starts dropping pages, or unless the implementation improves regional rankings for publishers who have gone the extra step of customizing their content for specific regions and subtle language differences.

The hreflang attribute has been around for quite some time. The W3C discussed it in 2006 and covers it in its "Links in HTML documents" documentation. Using it in link elements in a page's head seems to be the new twist. How Google uses the information for ranking will determine whether people adopt it.
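
For anyone who wants to check which hreflang annotations a page already carries, here is a rough Python sketch using the standard library's `html.parser`. The `HreflangCollector` class and the `SAMPLE` page source are hypothetical, made up for illustration from the example URLs above; a real audit would fetch live pages and would need more error handling.

```python
from html.parser import HTMLParser


class HreflangCollector(HTMLParser):
    """Collect (hreflang, href) pairs from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely.
        rel = (attr.get("rel") or "").lower().split()
        if "alternate" in rel and attr.get("hreflang"):
            self.alternates.append((attr["hreflang"], attr.get("href")))


# Hypothetical page source, based on Google's example URLs.
SAMPLE = """
<html><head>
<link rel="stylesheet" href="/site.css" />
<link rel="alternate" hreflang="es-MX" href="http://es-mx.example.com/" />
<link rel="alternate" hreflang="en" href="http://en.example.com/" />
</head><body></body></html>
"""

collector = HreflangCollector()
collector.feed(SAMPLE)
print(collector.alternates)
```

The stylesheet link is skipped because it has no hreflang attribute, so only the two alternate versions are collected.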

About the author

Frank Watson has been involved with the Web since it started. For the past five years, he headed SEM for FXCM -- at one time one of the top 25 spenders with AdWords. He has worked with most of the major analytics companies and pioneered the ability to tie online marketing with offline conversion.

He has now started his own marketing agency, Kangamurra Media. This new venture will keep him busy when he is not editing the Search Engine Watch forums, blogging at a number of authoritative sites, and developing some interesting online community sites.

He was one of the first 100 AdWords Professionals, a Yahoo and Overture Ambassador, and a member or mod of many of the industry forums. He is also on the Click Quality Council and has worked hard to diminish click fraud.