January 6, 2009

Google: statistical machine translation

Google has been investing significant resources in a multi-year effort to develop its statistical machine translation technology, and I find it really interesting.

The sense of a word (singular vs. plural, synonyms, polysemy, etc.) is often unclear, and idioms usually cannot be translated literally into another language. This means it is important to grasp the meaning from the context. Hence, "statistical machine translation".

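Machine translation generally relies on word alignment. We can therefore think of statistical alignment and machine translation in terms of a MapReduce model, as described below:

- Compute p(f|e) by summing the probabilities of all alignments:

$$p(f \mid e) = \frac{1}{Z} \sum_{a_1=0}^{l} \cdots \sum_{a_m=0}^{l} \prod_{j=1}^{m} p(f_j \mid e_{a_j})$$

where a_j = 0 means f_j is aligned with the empty (NULL) word, and:
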
e: the English sentence
l: the length of e in words
f: the French sentence
m: the length of f in words
f_j: word j in f
a_j: the position in e that f_j is aligned with
e_{a_j}: the word in e that f_j is aligned with
p(w_f | w_e): translation probability
Z: normalization constant
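The alignment sum described above can be sketched in Python. This is a minimal illustration under assumptions that are not in the post: the translation table t is a dict keyed by (French word, English word); position 0 of e is a hypothetical `<null>` token; Z = (l+1)^m (IBM Model 1 with the length term dropped); and p(w_f | w_e) is estimated with EM for IBM Model 1, which the post does not name. The EM step is laid out as a map (one sentence pair emits expected alignment counts) followed by a reduce (sum counts per key, normalize per English word).

```python
import itertools
import math
from collections import defaultdict

NULL = "<null>"  # hypothetical token for the empty word at position 0 of e


def p_f_given_e_bruteforce(f, e, t):
    """Sum over every alignment a = (a_1..a_m), a_j in 0..l, of
    prod_j t(f_j | e_{a_j}), divided by Z. Enumerates (l+1)^m alignments."""
    e0 = [NULL] + e
    total = 0.0
    for a in itertools.product(range(len(e0)), repeat=len(f)):
        total += math.prod(t.get((fj, e0[aj]), 0.0) for fj, aj in zip(f, a))
    return total / len(e0) ** len(f)  # assumption: Z = (l+1)^m


def p_f_given_e_factored(f, e, t):
    """Same quantity in O(l*m): the sum of products factors into a product of sums."""
    e0 = [NULL] + e
    return math.prod(sum(t.get((fj, ei), 0.0) for ei in e0) for fj in f) / len(e0) ** len(f)


def map_pair(f, e, t):
    """Map: for one sentence pair, emit ((e_word, f_word), expected alignment count)."""
    e0 = [NULL] + e
    for fj in f:
        denom = sum(t[(fj, ei)] for ei in e0)
        for ei in e0:
            yield (ei, fj), t[(fj, ei)] / denom


def em_step(corpus, t):
    """One EM iteration: run the map over every pair, sum counts per key
    (the reduce), then normalize per English word to get the new t(f|e)."""
    counts = defaultdict(float)
    for f, e in corpus:
        for key, c in map_pair(f, e, t):
            counts[key] += c
    totals = defaultdict(float)
    for (ei, _), c in counts.items():
        totals[ei] += c
    return {(fj, ei): c / totals[ei] for (ei, fj), c in counts.items()}


def uniform_table(corpus):
    """Initial t(f|e): uniform over the French vocabulary for co-occurring pairs."""
    f_vocab = {fj for f, _ in corpus for fj in f}
    return {(fj, ei): 1.0 / len(f_vocab)
            for f, e in corpus for fj in f for ei in [NULL] + e}


# Toy corpus of (French, English) sentence pairs, made up for illustration.
corpus = [("la maison".split(), "the house".split()),
          ("la fleur".split(), "the flower".split())]
t = uniform_table(corpus)
for _ in range(10):
    t = em_step(corpus, t)

f, e = corpus[0]
print(p_f_given_e_bruteforce(f, e, t), p_f_given_e_factored(f, e, t))
print(t[("maison", "house")], t[("la", "house")])
```

The brute-force and factored versions print the same value, because the sum over alignments distributes over the product. That factorization is what keeps the map step cheap: each sentence pair is processed independently in O(l*m), and only the per-word counts need to be shuffled to the reducers.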

See also: Bayesian spam filtering using Map/Reduce
Anyway, we may soon see a multilingual, semantic search engine.
