YouTube is introducing machine-generated automatic captioning. The captions can also be translated. This obviously has incredible implications for the hearing-impaired and for language translation. But it also has great implications for search.
Automatic captions will be generated using Google's automated speech recognition (ASR) technology and the same voice recognition algorithms used in Google Voice.
Additionally, auto-timing is being introduced. If you provide all the words in the video, Google will automatically time the captioning for you.
Of course, having what essentially amounts to transcripts for online video means that the text can be crawled and indexed and then yes - SEARCHED. Bring on the keyword research and SEO scriptwriting for online videos!
Google put together a video on how to access the automatic captioning and auto-timing features:
Posted by Nathania Johnson at 11:47 AM | Permalink | Comments (6)
Google has cleaned up Translate with a new look and thrown in a few new features for good measure:
Translate as you type - The only problem with this is that any translator will tell you that languages don't translate word for word. But this has always been a problem with computed translation, so it's not like the problem is worse with this or anything. Taking away the need to click a button feels like part of a greater strategy Google is testing, such as the recent homepage tests where the buttons under the text box were removed.
Romanization of character-based languages - If you're translating to a character-based language such as Chinese, you can select "Show romanization" to get the words displayed phonetically in Latin characters. This is not yet available for Hebrew, Arabic, or Persian.
Text to speech - If you'd like to hear how your translation sounds, just click the speaker icon.
For more information on these new features, check out the video Google put together:
Posted by Nathania Johnson at 10:32 AM | Permalink | Comments (1)
Google AdWords has added support for Google Translator Toolkit. This means you can now translate your keywords and have ads appear to global audiences in their language.
However, Search Engine Watch columnist Andy Atkins-Krüger warned about the pitfalls of keyword translation in an August post entitled "Translating Keywords Should Never EVER Happen." Atkins-Krüger made the good point that while serving up ads to audiences in their language is important, simple translation is risky.
The reason is that this planet is so culturally diverse. Even countries that share the same language often have their regional nuances. Language in and of itself does not contain a culture.
The best way to serve up foreign language ads is through human translation by someone who understands the culture of the country that's being targeted. While Google's intentions appear to be good in offering this new feature, it should be used with great caution.
Posted by Nathania Johnson at 11:55 AM | Permalink | Comments (8)
Google has added a whopping 285 languages to its Translator Toolkit. It brings the total number of languages to 345 and the number of possible language pairs to 10,664. The interface for Translator Toolkit is now available in 35 languages.
Google says it's focusing on minority languages. This includes regional, heritage, indigenous, and threatened languages. Google wants to help preserve these lesser-known languages so that these smaller cultures won't be forgotten as history constantly unfolds.
One of the minority languages is Māori, an Eastern Polynesian language spoken in New Zealand. According to 2006 data, only 24% of Māori can speak their own language. Google has been working with Dr. Te Taka Keegan at the University of Waikato to preserve the Māori language. Keegan is a senior lecturer in computer science and an expert in how computer-assisted translation tools can assist in the preservation of minority languages.
Keegan has found that computer-assisted translation aids in faster translations and language unification. Tools like Google Translator Toolkit can help in breaking down language barriers and enhancing the understanding of cultures from around the world.
Posted by Nathania Johnson at 2:47 PM | Permalink | Comments (1)
Google is embracing the global web by offering two new Translation features. The first is a widget that can be placed on websites to assist with translation. When a website has the code snippet for the widget, the language settings in a visitor's browser will be detected. If the language is other than the website's language, the visitor will be prompted to have the page translated by Google.
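The prompt logic described above can be sketched in a few lines. This is only an illustration, not Google's widget code: the real widget runs as JavaScript in the visitor's browser, and the post does not say how it reads the language settings. The sketch assumes the preference arrives as an HTTP Accept-Language header, and the function names are hypothetical.

```python
from typing import Optional


def preferred_language(accept_language: str) -> Optional[str]:
    """Return the primary subtag (e.g. 'fr') of the highest-weighted language.

    Assumption: the visitor's language settings are available as an
    Accept-Language header value such as 'fr-FR,fr;q=0.9,en;q=0.8'.
    """
    best_tag, best_quality = None, 0.0
    for part in accept_language.split(","):
        pieces = [piece.strip() for piece in part.split(";")]
        tag = pieces[0].lower()
        if not tag or tag == "*":
            continue
        quality = 1.0  # a missing q parameter means full preference
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        if quality > best_quality:
            best_tag, best_quality = tag, quality
    return best_tag.split("-")[0] if best_tag else None


def should_offer_translation(accept_language: str, page_language: str) -> bool:
    """Offer translation only when the visitor prefers a different language."""
    visitor = preferred_language(accept_language)
    page = page_language.lower().split("-")[0]
    return visitor is not None and visitor != page


print(should_offer_translation("fr-FR,fr;q=0.9,en;q=0.8", "en"))
print(should_offer_translation("en-US,en;q=0.9", "en"))
```

Comparing only the primary subtag means a visitor set to en-GB is not prompted on an en-US page, which seems like the sensible default for a translation offer.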
The other new feature was released with the Google Toolbar update last week. For the Firefox version, new advanced in-page translation is available.
Posted by Nathania Johnson at 5:58 PM | Permalink | Comments (3)
If you've been following the events surrounding the election in Iran, you know that sites such as Twitter and Facebook have been crucial in demonstrating what is going on in the country. The government has been trying to stifle protests, but the citizens - and media - are determined to tell the story.
But, if you're not fluent in Persian, also known as Farsi, you might have problems reading those Tweets or Facebook updates. Now, you'll have a little help.
In response to the current election crisis in Iran, Google and Facebook have (separately) released Persian translation features.
Facebook has launched a version of its social network in Persian. If you're using a Persian browser (in which case you're probably not reading this post), it should launch automatically. Otherwise, go to your settings and select Persian from the Language tab.
Google has added Persian to Google Translate. They're pushing it out early due to the events in Iran. It's optimized for translation with English, and even then it may have a few glitches.
Posted by Nathania Johnson at 10:24 AM | Permalink | Comments (0)
Google Translate has added 7 more languages. They are:
This brings the total number of languages on Google Translate to 41. The available languages now cover those spoken by 98% of internet users.
Last September, Google added 11 languages and last summer, Google "went live" with human translation as a service.
Related Reading: Google Explains the Nuances of Language Translation; Google Translate Adds Widget, Notranslate Code Snippets
Posted by Nathania Johnson at 8:05 AM | Permalink | Comments (0)
Since search involves people from all over the world speaking a variety of languages, Google takes language translation very seriously. Shankar Kumar and Wolfgang Macherey recently took to the Official Google Research blog to explain more about Google's translation methods.
Specifically, Kumar and Macherey talked about the Minimum Bayes Risk (MBR) criterion in how to determine which translation to return to a user. It's best explained in their own words:
"Essentially, we look at a sample of the best candidate translations (the so called n-best list) and choose the safest one, the one most likely to do the least amount of damage (where 'damage' is defined by our measurement of translation quality). You might want to view this as choosing a translation that is a lot like the other good translations instead of choosing that strange one that had the good model score."
Kumar and Macherey went on to say that they improve the diversity of the MBR search by adding candidate translations. They build lattices (a mathematical set, not a fence, though the fence is a decent visual) of translations, which MBR then searches in place of a short n-best list. The more candidates added to the lattice, the more diversified the search is.
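The n-best selection Kumar and Macherey describe can be sketched as follows. This is a toy illustration under stated assumptions, not Google's implementation: the word-overlap similarity is a hypothetical stand-in for a real translation quality measure, the candidate list and its probabilities are invented, and the lattice search is not covered.

```python
from collections import Counter
from typing import List, Tuple


def overlap_similarity(hypothesis: str, reference: str) -> float:
    """Toy quality measure: Dice overlap of word counts, from 0.0 to 1.0.

    Assumption: this stands in for whatever quality metric a real
    system would use when comparing two translations.
    """
    hyp_words = Counter(hypothesis.lower().split())
    ref_words = Counter(reference.lower().split())
    total = sum(hyp_words.values()) + sum(ref_words.values())
    if total == 0:
        return 1.0
    shared = sum((hyp_words & ref_words).values())
    return 2.0 * shared / total


def mbr_select(candidates: List[Tuple[str, float]]) -> str:
    """Pick the candidate with the highest expected similarity to the others.

    candidates holds (translation, model probability) pairs, i.e. an
    n-best list. Maximizing expected similarity is the same as
    minimizing expected 'damage'.
    """
    if not candidates:
        raise ValueError("the n-best list must not be empty")
    total_probability = sum(probability for _, probability in candidates)
    best_translation, best_gain = candidates[0][0], float("-inf")
    for hypothesis, _ in candidates:
        expected_gain = sum(
            (probability / total_probability)
            * overlap_similarity(hypothesis, other)
            for other, probability in candidates
        )
        if expected_gain > best_gain:
            best_translation, best_gain = hypothesis, expected_gain
    return best_translation


# Invented n-best list: the outlier has the best model score.
n_best = [
    ("feline furniture occupied", 0.40),
    ("the cat sat on the mat", 0.35),
    ("the cat sat on a mat", 0.25),
]
print(mbr_select(n_best))
```

In this invented list the outlier has the highest model score, but it resembles none of the other candidates, so it is passed over in favor of a translation that agrees with the rest.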
Related Reading: Google Enables Cross-Language Search for Enterprise Search Appliance; Google Translate Adds Widget, Notranslate Code Snippets; Google Translate Goes Live with Human Translators
Posted by Nathania Johnson at 10:15 AM | Permalink | Comments (1)
Google Translate has released a few updates to help you translate, or not, your pages for your site visitors.
First up is a widget that you can place on your site to offer visitors translation via Google Translate. It's very Google branded, so that may deter some, but here's what it looks like:
Secondly, there are code snippets available if you do NOT want Google to be able to translate your page or certain parts of a page.
class=notranslate is available for any HTML element. Here's an example:
For an entire page, use meta tags like this:
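The original snippets did not survive in this copy of the post, so the following is only a sketch of how a translation tool might honor the two opt-outs. The markup forms in it are assumptions (a class="notranslate" attribute on an element, and a meta tag named "google" carrying a notranslate directive), so check Google's current documentation for the exact syntax. The class and function names are hypothetical, and the parser assumes well-formed HTML.

```python
from html.parser import HTMLParser
from typing import List

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class TranslatableTextExtractor(HTMLParser):
    """Collect text a translator may touch, skipping opted-out regions."""

    def __init__(self) -> None:
        super().__init__()
        self.page_opted_out = False  # set by the page-level meta tag
        self.skip_depth = 0          # > 0 while inside a notranslate element
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta":
            # Assumed form: <meta name="google" content="notranslate">
            if (attributes.get("name") or "").lower() == "google":
                directive = attributes.get("content") or attributes.get("value") or ""
                if "notranslate" in directive.lower():
                    self.page_opted_out = True
            return
        if tag in VOID_ELEMENTS:
            return
        classes = (attributes.get("class") or "").split()
        if self.skip_depth or "notranslate" in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.chunks.append(text)


def translatable_text(html: str) -> List[str]:
    """Return the text chunks that are not excluded from translation."""
    parser = TranslatableTextExtractor()
    parser.feed(html)
    parser.close()
    return [] if parser.page_opted_out else parser.chunks


sample = '<p>Contact us at <span class="notranslate">Acme Widgets Ltd</span> today.</p>'
print(translatable_text(sample))
```

A brand name or email address wrapped this way is left alone while the surrounding sentence is still translated.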
Available languages include:
Related Reading: Google Translate Adds 11 Languages; Google Translate Goes Live with Human Translators; Google Webmaster Central Updates Include API Settings and Crawl Error Sources
Posted by Nathania Johnson at 9:06 AM | Permalink | Comments (1)
Google has added 11 languages to its Translate product. They are:
The total number of languages is now 34 and the total number of language pair combinations more than doubled, from 506 to 1,122.
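The pair counts above are consistent with counting every translation direction separately, so n languages give n × (n − 1) pairs. A quick check, assuming the earlier total was 23 languages (34 minus the 11 just added); the function name is only illustrative:

```python
def ordered_language_pairs(language_count: int) -> int:
    """Count (source, target) pairs among distinct languages.

    Each direction counts once, so English to French and French to
    English are two different pairs.
    """
    if language_count < 0:
        raise ValueError("language_count must not be negative")
    return language_count * (language_count - 1)


for count in (23, 34):
    print(count, "languages:", ordered_language_pairs(count), "pairs")
```

Because the count grows roughly with the square of the number of languages, adding about half again as many languages more than doubles the number of pairs.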
Recently, Google added human translators as well.
Posted by Nathania Johnson at 12:08 PM | Permalink | Comments (0)
Google Translate is going live. The world's most comprehensive set of translation technologies will now be aided by human beings translating documents upon request.
Google employees won't be in the business of translating documents. Rather, Google will offer volunteer and professional translators the opportunity to use Google tools and technologies to translate. In previous columns, we've discussed the need for localization in translation. It looks as if Google will take the lead on using local translators to aid machine translation.
Google Translation Center will enable users to upload a document, choose a translation language, and select from Google's registry of professional and volunteer translators. If a translator accepts, users will receive the translated content back as soon as it's ready.
The potential for use by SEOs and Internet marketing managers is huge. The service may offer affordable ways to translate not only URLs but entire Web sites.
Google's translation search feature matches a current translation with previous translations, so it won't be necessary to translate a document more than once. In short, Google could create the world's largest repository of completed homework assignments for students taking a foreign language.
We'll keep you posted as Google's human-assisted translation service officially launches.
Posted by Kevin Heisler at 1:56 PM | Permalink | Comments (1)
My colleague, ClickZ Executive Editor AnnaMaria Virzi, tipped me off to a Gawker.com post that had received only a little more than a thousand views and very few comments as of this morning. Gawker.com is the Nick Denton Web site where bloggers are paid "per post" to blog about pop culture and media gossip.
The ghoulish post slams MT (machine translation) of foreign languages. Yes, somehow Google made an error translating the phrase "Heath Ledger is Dead" into Spanish. Everyone in the search community knows machine translation isn't perfect and global Web sites require localization. For an intelligent discussion, click here.
The Gawker post by Joshua Stein appears to be a blatant attempt to pump up traffic to the Gawker-hosted Tom Cruise Scientology video.
Yes, the link above takes you to Hitwise analyst Heather Dougherty who best explains the Gawker traffic-building strategy and its results.
Posted by Kevin Heisler at 10:25 AM | Permalink
Google has replaced the Systran software it had been using on its Google Translate service with its own translation software, according to Ionut Alex Chitu at Google Operating System.
Google had been using its own translation system for Arabic, Chinese, and Russian translations, but now uses it for all 25 languages it translates.
The difference between Google's system and other systems is that it applies statistical learning techniques to massive amounts of text, rather than building a complex rules-based approach, according to the Google Translate FAQ.
"Google's approach works better for some languages and worse for others, but at least Google can expand to other languages without having to know them and manually create models for each one," Chitu writes.
At Google Blogoscoped, Philipp Lenssen compares Google Translation to Systran and a human translation of a German paragraph into English, and vice-versa. "I couldn't see a clear winner yet (though I get the feeling Google's results are slightly superior), but a lot of garbage results on both ends," he writes.
Posted by Kevin Newcomb at 11:53 AM | Permalink
The cross-language information retrieval (CLIR) technology that Google VP Udi Manber previewed at Searchology last week has launched, according to the Google Blog. The technology has been built into Google Translate. It allows users to type in a query in one language and instruct Google to find results in another language.
Google will translate the query into that other language, find results, and then translate those results into the original query language to present to the user. In effect, this allows users to seamlessly search documents in foreign languages as easily as they search in their own language.
An example is [wine tasting events in Bordeaux].
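The three steps described above (translate the query, search, translate the results back) can be sketched as a small pipeline. This is a minimal illustration, not Google's system: the translate and search functions are passed in as parameters, and the toy dictionary and page list below are invented stand-ins.

```python
from typing import Callable, Dict, List, Tuple

# (text, source language, target language) -> translated text
Translator = Callable[[str, str, str], str]
# query -> matching documents
SearchFunction = Callable[[str], List[str]]


def cross_language_search(
    query: str,
    user_language: str,
    target_language: str,
    translate: Translator,
    search: SearchFunction,
) -> List[str]:
    """Search foreign-language documents and present them in the user's language."""
    translated_query = translate(query, user_language, target_language)
    foreign_results = search(translated_query)
    return [
        translate(result, target_language, user_language)
        for result in foreign_results
    ]


# Invented stand-ins so the sketch runs on its own.
TOY_DICTIONARY: Dict[Tuple[str, str], Dict[str, str]] = {
    ("en", "fr"): {"wine tasting": "dégustation de vin"},
    ("fr", "en"): {
        "Dégustation de vin à Bordeaux ce week-end":
            "Wine tasting in Bordeaux this weekend",
    },
}
TOY_FRENCH_PAGES = [
    "Dégustation de vin à Bordeaux ce week-end",
    "Horaires du tramway de Bordeaux",
]


def toy_translate(text: str, source: str, target: str) -> str:
    """Look the text up in the toy dictionary, returning it unchanged if unknown."""
    return TOY_DICTIONARY.get((source, target), {}).get(text, text)


def toy_search(query: str) -> List[str]:
    """Return pages that contain the query as a case-insensitive substring."""
    return [page for page in TOY_FRENCH_PAGES if query.lower() in page.lower()]


print(cross_language_search("wine tasting", "en", "fr", toy_translate, toy_search))
```

Every result passes through machine translation twice, once on the way out and once on the way back, which is one reason the translations Google shows are not always perfect.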
Google admits the translations are not always perfect, as illustrated by a search of Japanese Web sites for Boston Red Sox pitcher Daisuke Matsuzaka, or a search of Spanish sites for soccer team Real Madrid. But of course, the product is still in beta.
Posted by Kevin Newcomb at 10:40 AM | Permalink
PinkNews reports that Google has agreed to change the Arabic translation of the word "gay" within Google's translation tool. Now, I don't know Arabic, but reportedly, the translation Google provides is equivalent to the word sodomite, which is derogatory. PinkNews says Google has "vowed to ameliorate the issue shortly."
Posted by Barry Schwartz at 9:54 AM | Permalink
Philipp Lenssen reports that Google one-upped Yahoo's Babel Fish by adding English to Arabic & Arabic to English to the Google Translate service. Babel Fish, I believe, does not have Arabic translation yet.
Posted by Barry Schwartz at 9:45 AM | Permalink
Google Blogoscoped points to a Christian Science Monitor article that takes a look at some of the machine translation work that Google and others are doing.
Systran, the company whose technology is used by Yahoo's Babelfish and other sites (including Google's), is mentioned.
Another company doing work in an area related to mechanical translation is Basis Technology. According to this article, Google also uses Basis technology.
Finally, if you're interested in quickly comparing a bunch of online translation programs, Michael Fagan's Translation Wizard is very useful.
Posted by Gary Price at 1:29 PM | Permalink