Improving the coherence of machine translation output by modeling intersentential relations

Machine translation (MT) has made significant progress in the past decade, but its focus has remained on translating sentences individually. To ensure coherence throughout a translated text, however, an MT system must also correctly render the items that depend on intersentential relations. The perceived coherence of a translated text, and therefore its overall quality, is mainly influenced by the following markers: pronouns, verb tense/mode/aspect, discourse connectives, and politeness/style/register. None of these markers can be reliably translated on a purely sentence-by-sentence basis. This project aims to extend the current statistical MT (SMT) approach by modeling these intersentential dependencies (ISDs) along the following five themes: linguistic analysis; corpus data, annotation, and test suites; automatic identification of intersentential dependencies; statistical machine translation for ISD-labeled texts; and evaluation methods for MT coherence and their application.

Themes

Application Area - Exploitation of rich multimedia archives, Application Area - Human Machine Interaction, Perceptive and Cognitive Systems

Leader Name

Partners

University of Geneva

Funding

Website URL

Start

Mar 01, 2010

Stop

Jul 31, 2013

Groups

Natural Language Understanding

Contacts