Virtual Babel: Towards Context-Aware
Machine Translation in Virtual Worlds
This paper describes Virtual Babel, a
context-aware machine translation platform for users of Second Life (SL).
Virtual Babel reveals interesting phenomena that do not appear in document
translation. Language models and translation models are built that take
non-verbal context into account.
It is noted that in virtual environments an
increasing number of people around the world can speak several languages,
including English, the most popular and predominant language in various domains.
However, human nature is such that a great many people still prefer to
communicate in their mother tongues. Thus, language barriers still
exist in virtual worlds like SL, as they do in the outside world.
In the past years, machine translation
(MT) has improved drastically, to a level acceptable to users, especially
phrase-based translation for certain domains, such as broadcast news, and
certain language pairs (Och et al.,
1999; Koehn et al., 2003). Translation services, including the Google Translation
API, are used as plug-ins for Skype, MSN and Google Talk.
Such translation services help bridge
the communication gap between different users in-world. But two drawbacks are
obvious. First, general-purpose machine translation services, such as those built
for broadcast news text, target a different genre from online chat. Second,
these services are usually context-independent: they treat ambiguity as if it
mattered little, although it matters a great deal in casual talk.
Applying MT in virtual worlds
can benefit not only the development of context-aware MT
systems but also the study of the non-verbal context of communication, which is
easier to observe there than in the real world. This helps us understand the
impact of context on language and how MT can improve translation quality with
the help of context information.
From my perspective, the most useful point
this article discusses is topic identification within the context of communication.
High-frequency words in a conversation can be labeled as keywords, which
can then help predict the incoming words and generate precise
translations. But the crux of the MT problem, which this article tries to resolve
but has not yet succeeded in resolving, is the great disparity between languages
such as English and Chinese. When it comes to translating natural conversations,
MT seems even less adequate, to the point that users can barely understand the
output. Data collection and context-aware translation tend to be a mission
impossible given the numerous contexts, information and correlations between
languages.
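As a rough illustration of the keyword idea, here is a minimal sketch of my own
in Python. It is not the paper's method: the function name top_keywords, the tiny
stop-word list and the sample chat are all assumptions made up for this example.
It simply counts word frequencies over a conversation and treats the most
frequent non-stop-words as topic keywords.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOPWORDS = {"the", "a", "an", "is", "to", "i", "you", "it",
             "and", "of", "in", "my", "do", "this"}


def top_keywords(messages, k=3, stopwords=STOPWORDS):
    """Return the k most frequent non-stop-words across all chat messages."""
    counts = Counter()
    for message in messages:
        # Lowercase and keep only alphabetic tokens (plus apostrophes).
        tokens = re.findall(r"[a-z']+", message.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return [word for word, _ in counts.most_common(k)]


# Made-up sample conversation, not data from the paper.
chat = [
    "Do you like my new sword?",
    "The sword is nice, where is the shop?",
    "The shop is in the castle, I got the sword there.",
]
print(top_keywords(chat, k=2))
```

A real system would need proper tokenization for languages such as Chinese and
a way to feed the keywords into the translation model, which this sketch does
not attempt.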
References
Och, F.J., et al. (1999). Improved alignment models
for statistical machine translation. In Proc. of the Conference on Empirical
Methods in Natural Language Processing and Very Large Corpora,
University of Maryland, College Park, MD, USA.
Koehn, P., et al. (2003). Statistical phrase-based
translation. In Proc. of the Human Language Technology and North American
Association for Computational Linguistics Conference (HLT/NAACL), Edmonton,
Canada.
It is interesting to see the potential of SL in the translation area.