Monday, August 20, 2012

Week 5 - Danny's Article Review


Virtual Babel: Towards Context-Aware Machine Translation in Virtual Worlds


This paper describes Virtual Babel, a context-aware machine translation platform for Second Life users. With Virtual Babel, the authors observe interesting phenomena that do not appear in document translation. Taking non-verbal context into account, they build both language models and translation models for the task.

It is noted that in virtual environments a growing number of people around the world can speak several languages, including English, the predominant language in many domains. However, human nature being what it is, a great many people still prefer to communicate in their mother tongues. Thus, language barriers persist in virtual worlds like Second Life (SL), just as they do in the real world.

In recent years, machine translation (MT) has improved drastically, reaching a level acceptable to users, especially in phrase-based translation for certain domains, such as broadcast news, and for certain language pairs (Och et al., 1999; Koehn et al., 2003). Translation services, including the Google Translation API, are used as plug-ins for Skype, MSN and Google Talk.

Such translation services help bridge the communication gap between different users in-world. But two drawbacks are obvious. First, general MT services are built for genres such as broadcast news text, which differ considerably from online chat. Second, these services are usually context-independent: they are designed for text in which ambiguity matters less than it does in casual conversation.

The application of MT in virtual worlds can benefit not only the development of context-aware MT systems but also the study of the non-verbal context of communication, which is easier to explore there than in the real world. This helps us understand how context affects language and how MT can use context information to improve translation quality.

From my perspective, the most useful point this article discusses is topic identification within the context of a conversation. High-frequency words in a conversation can be labeled as keywords, which can then help predict upcoming words and produce more precise translations. But the crux of the MT problem, which this article tries to resolve without yet succeeding, is the great disparity between languages such as English and Chinese. When it comes to translating natural conversation, MT seems even less adequate, to the point that users can barely understand the output. Data collection and context-aware translation tend to look like a mission impossible given the sheer number of contexts, pieces of information and correlations between languages.
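
To make the keyword idea concrete, here is a minimal sketch of picking out high-frequency words from a short chat. It is only an illustration under my own assumptions, not the method used in the paper: the function name `top_keywords`, the tiny stop-word list and the sample chat lines are all made up for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would need a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "in", "and",
              "i", "you", "it", "do", "my", "for", "this", "that"}

def top_keywords(messages, k=3):
    """Return the k most frequent non-stop-words across the chat messages."""
    counts = Counter()
    for message in messages:
        # Lowercase the message and split it into simple word tokens.
        words = re.findall(r"[a-z']+", message.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(k)]

# Made-up chat lines, standing in for an in-world conversation.
chat = [
    "Where can I buy a sword?",
    "The sword shop is next to the castle.",
    "Is the sword expensive in that shop?",
]
print(top_keywords(chat, k=2))  # → ['sword', 'shop']
```

A context-aware system could then treat such keywords as a hint about the current topic, although how Virtual Babel actually uses them is not something this sketch tries to reproduce.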

References
Och, F.J., et al. (1999). Improved alignment models for statistical machine translation. In Proc. of the Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, University of Maryland, College Park, MD, USA.

Koehn, P., et al. (2003). Statistical phrase-based translation. In Proc. of the Human Language Technology and North American Association for Computational Linguistics Conference (HLT/NAACL), Edmonton, Canada.

1 comment:

  1. It is interesting to see the potential of SL in the area of translation.
