The world is shrinking, and reaching more users increasingly means supporting more languages.
Supporting multiple languages on websites is nothing new, but with browser and search engine technology starting to rely on structured data, it has never been more important to make sure you are using the correct markup.
This is where rel="alternate" hreflang="x" comes in handy.
Simply put, when used correctly, this specification element helps Google index and serve the localized version of your content to users who require an alternate language version.
The best use of this tag is in instances where you have the same or regionally specific content in another language on the same website.
Google recommends use in the following three scenarios:
- You have a completely alternate version of your site translated in a different language.
- You translate only the navigational components of a page, like the navigation, sidebar, or footer, but the main content remains in only one language. This is commonly used on pages that include user-generated content.
- Your pages have very similar content within a single language with regional variations. For example, you have Spanish-language content targeted at readers in Spain and also in South America.
Multilingual vs. Multiregional
- Multilingual: A website offering its content in more than one language. An example would include a Canadian business with separate websites on the same domain for both the English and French versions of its content.
- Multiregional: Google defines multiregional content as a website that “explicitly targets users in different countries.” This can be difficult to wrap your head around because a website can be both multilingual and multiregional. For example, you could have a soccer (futbol) site with different versions for the USA and for South America, and both Spanish and Portuguese versions of the South American content.
Managing Multilingual Versions of Your Site
The first rule of thumb is to make sure you present your pages so that the default language is clear, both to users and search engines. This is best achieved by maintaining a consistent base language throughout your website and declaring the default language in the header markup of each page.
Important note: You should never, ever use automated translation – but if you must due to the nature of your industry or the size of your website, then make sure you use your robots.txt file to block search engines from crawling any auto-translated pages, especially since auto-translated pages can be viewed as spam.
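If you do end up blocking auto-translated pages, it is worth checking that your robots.txt rules actually do what you expect. Here is a minimal sketch using Python's standard library. The /auto-translated/ directory is a made-up name for illustration; it assumes all of your machine-translated pages live under a single path.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: assumes every machine-translated page lives
# under /auto-translated/ (a made-up directory name for illustration).
ROBOTS_TXT = """\
User-agent: *
Disallow: /auto-translated/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt rules allow crawling the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/ja/page",
        "http://www.example.com/auto-translated/ja/page",
    ):
        status = "crawlable" if is_crawlable(ROBOTS_TXT, url) else "blocked"
        print(url, "->", status)
```

Running a check like this against your real robots.txt and a handful of real URLs is a quick sanity test before you publish the file.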
Also, consider the general user experience implications of your URLs. This is something you should always look at closely.
For localized URLs, it is best to use subdirectories whenever possible and to use subdomains only when there is no other option, as subdomains require significantly more configuration, including DNS, IP, etc.
Example of a localized subdirectory:
Example of a localized subdomain:
Also, it’s fine to translate words in the URL to the local language, but make sure to use the UTF-8 encoding in the URL.
Escaping a URL
This is a critical component that really isn't talked about much in international SEO.
Escaping a URL means encoding any illegal characters in the URL as percent-encoded sequences of their UTF-8 bytes. This isn’t illegal in the sense that you’re going to get arrested, but in the sense of properly accepted syntax.
For example, a space is an illegal character in a URL, so a URL like http://somedomain.com/here-comes-a space would be encoded as http://somedomain.com/here-comes-a%20space. You will also see a space written as ‘+’, but that form belongs to form-encoded query strings; in the path of a URL, %20 is the correct encoding.
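To see what escaping looks like in practice, here is a small sketch using Python's standard urllib.parse module. The escape_path helper and the sample paths are assumptions made up for this example, not something prescribed by Google.

```python
from urllib.parse import quote, quote_plus


def escape_path(path: str) -> str:
    """Percent-encode a URL path as UTF-8, leaving the '/' separators alone."""
    return quote(path, safe="/")


if __name__ == "__main__":
    # A space in the path becomes %20.
    print(escape_path("/here-comes-a space"))  # /here-comes-a%20space

    # Translated words in the URL are encoded byte by byte as UTF-8.
    print(escape_path("/日本語/"))  # /%E6%97%A5%E6%9C%AC%E8%AA%9E/

    # quote_plus uses the form-encoding convention, where a space becomes '+'.
    # That convention is meant for query string values, not for paths.
    print(quote_plus("a space"))  # a+space
```

Browsers usually display the readable form of the URL to users, but the escaped form is what should appear in your markup, sitemaps, and redirects.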
Targeting Specific Countries
This is perhaps one of the most overlooked areas of managing multilingual websites.
Google generally uses the following elements to help determine a website’s targeted country:
- Server location (through the IP address of the server). Server location is most likely becoming a less important signal to search engines as more websites move to distributed hosting solutions like virtual servers and cloud-based networks.
- Country-code top-level domains (ccTLDs). These are tied to the specific countries they target, such as .ca for Canada or .jp for Japan. This makes them a strong signal to both users and search engines that the content on your site is intended for users within that country.
- Geotargeting. This is a setting in Google Webmaster Tools that can be used to indicate that your site is targeted at a specific country.
- Additional signals. Localization signals have been expanded to now include elements like addresses and phone numbers on pages, local language and currency, and even links from other websites.
Structuring Your URLs
It’s extremely important to carefully plan your site’s information architecture, and to structure your URLs accordingly, to maximize the benefits for SEO.
So when it comes to structuring your URLs for localized content, you have a few options ranging from most ideal to least ideal:
Image Source: Google
Duplicate Content and Its Implications
Duplicate content is bad. This is simple to understand and relatively simple to avoid.
One specific scenario to watch out for is placing the same content on two different URLs, even if these URLs are targeted for the same local audience.
Having the same content on example.jp and example.com/jp/ will be treated as duplicate content and carry with it a negative SEO effect. If it is necessary that these pages both exist, make sure you take steps to not compete against yourself for organic rankings.
You have two options:
Using rel="alternate" hreflang="x"
Simply put, the best use of this specification element is in instances where you have the same content in another language on the same website.
So let’s say you have an English website at http://www.example.com
You also have a Japanese version of that webpage at http://www.example.com/ja/
Google has created three ways for you to indicate that the Japanese URL is the Japanese-language equivalent of the English page:
HTML Link Element
<link rel="alternate" hreflang="ja" href="http://www.example.com/ja/" />
HTTP Header
If you publish non-HTML files, such as PDFs, you can use an HTTP header to indicate the alternate language version of a URL:
Link: <http://www.example.com/ja/>; rel="alternate"; hreflang="ja"
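If you serve several language versions of a file, the Link header can carry all of them as a comma-separated list. The following is a rough sketch of how you might build that header value. The hreflang_link_header function is a hypothetical helper written for this example, and the URLs are the English and Japanese example pages from above.

```python
from typing import Mapping


def hreflang_link_header(alternates: Mapping[str, str]) -> str:
    """Build one Link header line listing every language version of a file.

    `alternates` maps an hreflang value to the URL of that version.
    """
    parts = [
        f'<{url}>; rel="alternate"; hreflang="{lang}"'
        for lang, url in alternates.items()
    ]
    return "Link: " + ", ".join(parts)


if __name__ == "__main__":
    print(
        hreflang_link_header(
            {
                "en": "http://www.example.com/",
                "ja": "http://www.example.com/ja/",
            }
        )
    )
```

How you attach the header to the response depends on your web server or framework, so that part is left out here.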
Sitemap
Instead of using markup, you can submit language version information in a sitemap.
<?xml version="1.0" encoding="UTF-8"?>
Each language version must use rel="alternate" hreflang="x" to identify all of the language versions, including itself.
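For the sitemap route, each url entry carries one xhtml:link child per language version, including a link back to itself. Below is a minimal sketch that generates such a sitemap with Python's standard library. The build_hreflang_sitemap function name and the sample URLs are assumptions for illustration; check the output against Google's current sitemap documentation before relying on it.

```python
import xml.etree.ElementTree as ET
from typing import Mapping

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_hreflang_sitemap(alternates: Mapping[str, str]) -> str:
    """Return sitemap XML where every URL lists all language versions.

    `alternates` maps an hreflang value to the URL of that version.
    """
    # Serialize sitemap elements without a prefix and xhtml:link with "xhtml:".
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url in alternates.values():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        # Every entry repeats the full set of alternates, itself included.
        for lang, alt_url in alternates.items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                {"rel": "alternate", "hreflang": lang, "href": alt_url},
            )
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(
        build_hreflang_sitemap(
            {
                "en": "http://www.example.com/",
                "ja": "http://www.example.com/ja/",
            }
        )
    )
```

The output omits the XML declaration, so you would add that line yourself when writing the file to disk.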
Coming back to region-specific content, this gets a bit tricky (but not too tricky) when specifying content for a specific language and a specific region.
For example: a website that serves the U.S., the U.K., and Japan could very well have the following regional variations:
- http://www.example.com/en/page (Generic English version of content - language specific; English)
- http://www.example.com/en-gb/page (English language, displays prices in pounds, example of region-specific content)
- http://www.example.com/en-us/page (English language, displays prices in U.S. dollars, example of region-specific content)
- http://www.example.com/ja/page (Japanese version of content)
This is what the actual tag markup should look like:
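<link rel="alternate" hreflang="en" href="http://www.example.com/en/page" />
<link rel="alternate" hreflang="en-gb" href="http://www.example.com/en-gb/page" />
<link rel="alternate" hreflang="en-us" href="http://www.example.com/en-us/page" />
<link rel="alternate" hreflang="ja" href="http://www.example.com/ja/page" />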
You need to update the HTML for each URL in the set of alternates. The markup above tells Google to consider all of these pages as alternate versions of one another.
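Because the same set of tags has to appear on every page in the set, it helps to generate them from one shared list rather than hand-editing each page. Here is a minimal sketch of that idea. The hreflang_link_tags function is a hypothetical helper, and the URLs are the regional variations listed above.

```python
from html import escape
from typing import List, Mapping

# The regional variations from the example above, keyed by hreflang value.
ALTERNATES = {
    "en": "http://www.example.com/en/page",
    "en-gb": "http://www.example.com/en-gb/page",
    "en-us": "http://www.example.com/en-us/page",
    "ja": "http://www.example.com/ja/page",
}


def hreflang_link_tags(alternates: Mapping[str, str]) -> List[str]:
    """Return one <link> tag per language version.

    The same list goes into the head of every page in the set, so each
    page references all of its alternates as well as itself.
    """
    return [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        for lang, url in alternates.items()
    ]


if __name__ == "__main__":
    print("\n".join(hreflang_link_tags(ALTERNATES)))
```

Generating the block from a single source also makes it harder to forget a page when you add a new language later.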
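Make sure you use the proper syntax for both languages and regions to ensure proper functionality of your rel=“alternate” tags: the language is given as an ISO 639-1 code, optionally followed by a hyphen and an ISO 3166-1 Alpha 2 region code.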
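A quick format check can catch the most common slips, such as underscores or spelled-out language names. The sketch below is deliberately simplified. It only checks the shape "two letters, optionally a hyphen and two more letters"; it does not confirm that the codes are really assigned, and it does not cover other valid forms such as script subtags. The function name is made up for this example.

```python
import re

# Shape only: a two-letter language code, optionally followed by a hyphen
# and a two-letter region code. This is a simplification for illustration.
_HREFLANG_SHAPE = re.compile(r"[a-z]{2}(-[a-z]{2})?", re.IGNORECASE)


def has_valid_hreflang_shape(value: str) -> bool:
    """Return True if the value looks like 'xx' or 'xx-yy'."""
    return _HREFLANG_SHAPE.fullmatch(value) is not None


if __name__ == "__main__":
    for value in ("en", "en-us", "en-GB", "en_us", "english"):
        print(value, "->", has_valid_hreflang_shape(value))
```

Note that a value like en-uk passes this check even though the ISO 3166-1 code for the United Kingdom is GB, so a shape check is a first line of defense and not a substitute for looking up the actual codes.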
There are three scenarios where you should be using rel=“alternate” hreflang=“x”. You need to structure your data and specify tags for both the language and the region, when applicable.
Pay close attention to how you build your URLs and use the recommended signaling factors to properly target the countries your content is meant to serve.
For More Information
Check out the international SEO tab in Annie Cushing’s audit checklist, where Aleyda Solis and Gianluca Fiorelli offer more indispensable information, or Fernando Macia’s deck on International Search Engine Optimization (hat tip to Aleyda for these great resources!).