Google Explains the Nuances of Language Translation

Since search involves people from all over the world who speak a wide variety of languages, Google takes language translation very seriously. Shankar Kumar and Wolfgang Macherey recently took to the Official Google Research blog to explain more about Google's translation methods.

Specifically, Kumar and Macherey described how the Minimum Bayes Risk (MBR) criterion is used to decide which translation to return to a user. It's best explained in their own words:

Essentially, we look at a sample of the best candidate translations (the so called n-best list) and choose the safest one, the one most likely to do the least amount of damage (where 'damage' is defined by our measurement of translation quality). You might want to view this as choosing a translation that is a lot like the other good translations instead of choosing that strange one that had the good model score.
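The selection rule in the quote can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google's implementation: model scores are treated as log-probabilities and normalized into a posterior over the n-best list, and a hypothetical unigram-overlap F1 (`similarity`) stands in for a real quality measure such as BLEU. A candidate's "risk" is its expected loss, 1 - similarity, averaged over every candidate in the list weighted by that candidate's posterior.

```python
import math
from collections import Counter


def similarity(hyp: str, ref: str) -> float:
    """Hypothetical stand-in for a translation quality metric such as BLEU:
    unigram-overlap F1 between two whitespace-tokenized sentences."""
    h, r = hyp.split(), ref.split()
    if not h or not r:
        return 0.0
    overlap = sum((Counter(h) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(h), overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def mbr_select(nbest: list[tuple[str, float]]) -> str:
    """Return the candidate with the lowest expected loss (1 - similarity),
    treating every candidate as a possible correct translation weighted by
    its posterior probability."""
    # Softmax over model log-scores gives a posterior over the n-best list.
    max_score = max(score for _, score in nbest)
    weights = [math.exp(score - max_score) for _, score in nbest]
    total = sum(weights)
    posteriors = [w / total for w in weights]

    best, best_risk = None, float("inf")
    for hyp, _ in nbest:
        # Expected damage of returning hyp, averaged over the candidates.
        risk = sum(p * (1.0 - similarity(hyp, ref))
                   for (ref, _), p in zip(nbest, posteriors))
        if risk < best_risk:
            best, best_risk = hyp, risk
    return best


# Made-up n-best list: (candidate translation, model log-score).
nbest = [
    ("the cat sat on the mat", -1.0),
    ("the cat sat on a mat", -1.1),
    ("the cat is on the mat", -1.2),
    ("feline rested upon rug", -0.8),  # best model score, but an outlier
]
print(mbr_select(nbest))  # the cat sat on the mat
```

In this invented example the outlier has the best model score, yet it shares no words with the other candidates, so its expected loss is the highest. MBR instead returns the translation that most resembles the other good candidates, which is exactly the "safest one" behavior described above.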

Kumar and Macherey went on to explain that MBR works better when it has a larger, more diverse pool of candidate translations to choose from. To supply that pool, they build lattices (a mathematical structure, not a fence, though a fence is a decent visual) that compactly encode far more candidate translations than an n-best list, and MBR then searches the lattice for the safest choice. The more translations the lattice holds, the more diverse the candidates MBR can choose from.
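To see why a lattice can hold so many more candidates than an n-best list, here is a minimal Python sketch of a toy word lattice: a directed acyclic graph whose edges carry words and log-probabilities. The edges, scores, and function names are hypothetical and chosen only for illustration; this is not Google's lattice format or code. Eleven edges encode 24 distinct translations, and counting them takes a single pass over the edges.

```python
from collections import defaultdict

# Hypothetical toy lattice: (from_node, to_node, word, log_prob).
# Node ids are assumed to be in topological order: every edge goes
# from a lower id to a higher one. Node 0 is the start, node 6 the end.
EDGES = [
    (0, 1, "the", -0.1),
    (1, 2, "cat", -0.2), (1, 2, "feline", -1.5),
    (2, 3, "sat", -0.4), (2, 3, "rested", -1.0), (2, 3, "is", -0.9),
    (3, 4, "on", -0.1),
    (4, 5, "the", -0.2), (4, 5, "a", -0.7),
    (5, 6, "mat", -0.3), (5, 6, "rug", -0.9),
]
START, FINAL = 0, 6


def count_paths(edges, start, final):
    """Count start-to-final paths by dynamic programming. Processing edges
    in order of their source node finalizes each node's count before any
    of its outgoing edges are used."""
    counts = defaultdict(int)
    counts[start] = 1
    for src, dst, _, _ in sorted(edges):
        counts[dst] += counts[src]
    return counts[final]


def enumerate_paths(edges, start, final):
    """Yield (sentence, total_log_prob) for every path. The number of paths
    grows exponentially, so this is only sensible for toy lattices."""
    out = defaultdict(list)
    for src, dst, word, lp in edges:
        out[src].append((dst, word, lp))

    def walk(node, words, score):
        if node == final:
            yield " ".join(words), score
            return
        for dst, word, lp in out[node]:
            yield from walk(dst, words + [word], score + lp)

    yield from walk(start, [], 0.0)


print(len(EDGES), count_paths(EDGES, START, FINAL))  # 11 24
```

Listing every path, as `enumerate_paths` does, quickly becomes infeasible on realistic lattices, so practical lattice MBR works with quantities computed directly over the graph (such as n-gram posterior probabilities) rather than enumerating every sentence.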
