New funding (SNSF/Sinergia) for Idiap Research Institute
COMTIS: Improving the Coherence of Machine Translation Output by Modeling Intersentential Relations
Partners:
Idiap (Dr. Andrei Popescu-Belis), UniGe/Department of Linguistics (Prof. Jacques Moeschler, Dr. Sandrine Zufferey), UniGe/Centre Universitaire d'Informatique (Dr. James Henderson, Dr. Paola Merlo).
Abstract:
Machine translation (MT) has made significant progress in the past decade, but its focus has remained on the translation of sentences considered individually. However, to ensure overall coherence throughout a translated text, an MT system must also consider and correctly render the items that depend on intersentential relations. The perceived coherence of a translated text, and therefore its overall quality, is mainly influenced by the following markers: pronouns, verb tense/mode/aspect, discourse connectives, and politeness/style/register. None of these markers can be reliably translated on a purely sentence-by-sentence basis.
This project aims to extend the current statistical MT approach by modeling these intersentential dependencies (ISDs), along the following five themes: linguistic analysis; corpus data, annotation and test suites; automatic identification of intersentential dependencies; statistical machine translation for ISD-labeled texts; and evaluation methods for MT coherence and their application. The project involves researchers in human language technology, machine learning, linguistics, and system evaluation, coming from three different groups with extensive contributions to the relevant fields. Their collaboration is grounded in several previous joint achievements, and will lead to the design of a robust, operational system. The project will significantly boost the dynamics of Swiss research in MT and will help position it more firmly within the European and international community.