Improving the coherence of machine translation output by modeling intersentential relations

Machine translation (MT) has made significant progress in the past decade, but its focus has remained on translating sentences individually. To ensure coherence throughout a translated text, however, an MT system must also correctly render the items that depend on intersentential relations. The perceived coherence of a translated text, and therefore its overall quality, is mainly influenced by the following markers: pronouns, verb tense/mode/aspect, discourse connectives, and politeness/style/register. None of these markers can be reliably translated on a purely sentence-by-sentence basis. This project aims to extend the current statistical MT (SMT) approach by modeling these intersentential dependencies (ISDs), along the following five themes: linguistic analysis; corpus data, annotation and test suites; automatic identification of intersentential dependencies; statistical machine translation for ISD-labeled texts; and evaluation methods for MT coherence and their application.
Application Area - Exploitation of rich multimedia archives, Application Area - Human Machine Interaction, Perceptive and Cognitive Systems
Idiap Research Institute
University of Geneva
SNSF
Mar 01, 2010
Jul 31, 2013