Greetings from Santa Kurara, Kariforunia

Tuesday, October 19, 2010

Hello from the Unicode Conference in Santa Clara, California, where the Maps Transliteration team is giving a talk about ICU-based transliteration. Transliterating this originally Spanish city name to Japanese, we get サンタ・クララ, which (when morphed back to the Latin writing system) becomes “Santa Kurara.”

Machine transliteration is an active area of research (slides), and it can be rather challenging in general. Typically, transliteration emulates the pronunciation, but sometimes it also preserves some aspects of the original spelling. We created transliteration modules with the open-source ICU library for languages that have highly regular spelling; if you’re using Google Maps in Japanese, Russian or Chinese, you can see how we use them to display labels in both the local language and your own.

Today, we’re announcing the contribution of our ICU transliteration rules for Czech, Italian, Japanese, Korean, Mandarin, Polish, Romanian, Russian, Slovak and Spanish to the Unicode Common Locale Data Repository. (For languages with very irregular spelling, like English, we supplement ICU with some more advanced techniques.) If you would like to try writing rules for your own language, have a look at the instructions in the ICU user guide.
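
To give a feel for what rule-based transliteration involves, here is a minimal Python sketch. It does not use ICU and is not the contributed rule set: the RULES table and the transliterate function are hypothetical names made up for this illustration, and the table covers only the handful of katakana that appear in this post. It assumes a simple longest-match-first strategy, so that a two-character sequence such as フォ is matched before its individual characters.

```python
import unicodedata

# Toy rule table: katakana sequence -> Latin letters.
# Hypothetical and deliberately tiny; NOT the ICU/CLDR rules.
RULES = {
    "フォ": "fo",  # two-character sequence, must win over single characters
    "ア": "a", "カ": "ka", "ク": "ku", "サ": "sa", "ス": "su",
    "タ": "ta", "ニ": "ni", "ビ": "bi", "ラ": "ra", "リ": "ri",
    "ル": "ru", "ン": "n",
    "・": " ",  # the katakana middle dot separates words
}


def transliterate(text, rules=RULES):
    """Rewrite text left to right, trying the longest rule first."""
    # Normalize so that voiced kana such as ビ are single code points.
    text = unicodedata.normalize("NFC", text)
    max_len = max(len(source) for source in rules)
    out = []
    i = 0
    while i < len(text):
        for n in range(max_len, 0, -1):
            chunk = text[i:i + n]
            if chunk in rules:
                out.append(rules[chunk])
                i += len(chunk)
                break
        else:
            # No rule applies: copy the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("サンタ・クララ"))
    print(transliterate("カリフォルニア"))
```

With this toy table, transliterate("サンタ・クララ") gives "santa kurara" and transliterate("カリフォルニア") gives "kariforunia". Real rules have to deal with far more than a lookup table can, such as long vowels, doubled consonants and context-dependent readings, which is where a dedicated rule language like ICU's comes in.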

アスタ・ラ・ビスタ — “Asuta ra bisuta,” from sunny “Kariforunia!”

By Sascha Brawer, Martin Jansche, Hiroshi Takenaka, and Yui Terashima, Maps Transliteration Team