Outline of research in COMTIS
Objective
The objective of the COMTIS SNSF Sinergia project was to use insights from linguistic modeling and corpus linguistics to build computational models of discourse-level phenomena, and to combine these models with statistical machine translation (MT) systems in order to improve the quality of translated texts.
Achievements
The COMTIS researchers have advanced the state of the art both in their respective fields and with respect to the overall objective, thanks to close collaboration among all partners, which is also reflected in the quality and number of joint publications. More specifically, we have proposed multilingual models of discourse connectives and verb tenses, strongly grounded in empirical evidence from parallel corpora, mainly in English and French, but also in German, Italian, and Arabic. Features derived from these models were used to implement automatic labeling systems for discourse connectives, verbs, and pronouns. These systems were then combined, in several ways, with state-of-the-art statistical MT systems and with innovative tree-based decoding algorithms. This has led to demonstrable improvements in MT output, as assessed both by human judges and by an automatic reference-based metric that COMTIS proposed and validated.
Overall, twelve people contributed to COMTIS (five of them directly funded by the project, grant no. CRSI22_127510), resulting in 33 peer-reviewed publications in journals and conference proceedings, accompanied by several public datasets.
Two workshops were organized in direct relation to COMTIS (in particular DiscoMT at ACL 2013). The methods put forward will be applied to a different range of phenomena, with an extended consortium, in the MODERN Sinergia project (2013-2016).