Tense-Annotation

This dataset provides parallel texts in English/French from Europarl, along with an alignment of the verbs in the sentences with information on their position, tense and voice.

Description

This dataset contains parallel English and French texts from the Europarl corpus (Koehn, 2005).

The files align EN and FR verbs and record each verb's position, tense and voice. They can therefore be used in translation studies of these two languages and/or to train translation systems that make use of these labels.

Content

Although the resource was created semi-automatically, the verb alignment and inferred tenses are of high precision, especially in the second file contained in the package:

Tense-Annotation-full.txt : complete alignment.

Tense-Annotation-gold.txt : alignments only for cases where there is an EN /and/ an FR tense that was inferred from the verbs.

A description of the method that was used to create the alignment will soon be published.

The format in the two files is the following:

EN sentence

FR sentence

Position_in_EN    \tab    EN_verb1    \tab    EN_tense    \tab    EN_voice    \tab    FR_verb    \tab    FR_tense    \tab    FR_voice

Position_in_EN    \tab    EN_verb2    \tab    EN_tense    \tab    EN_voice    \tab    FR_verb    \tab    FR_tense    \tab    FR_voice

 

EN sentence

FR sentence

Position_in_EN    \tab    EN_verb1    \tab    EN_tense    \tab    EN_voice    \tab    FR_verb    \tab    FR_tense    \tab    FR_voice

...
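As an illustration of the layout above, the following is a minimal Python sketch of a reader for these files. It rests on assumptions not confirmed by this description: that the seven fields are separated by literal tab characters, that blank lines carry no meaning, and that the two sentence lines never contain exactly six tabs. The names VerbAlignment, SentencePair and parse_blocks are hypothetical, and the position is kept as a string because its exact form is not specified here.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional


@dataclass
class VerbAlignment:
    # Field order follows the format line in this description.
    position_in_en: str  # kept as text; its exact form is an assumption
    en_verb: str
    en_tense: str
    en_voice: str
    fr_verb: str
    fr_tense: str
    fr_voice: str


@dataclass
class SentencePair:
    en: str
    fr: str
    alignments: List[VerbAlignment] = field(default_factory=list)


def parse_blocks(lines: Iterable[str]) -> Iterator[SentencePair]:
    """Yield one SentencePair per EN/FR sentence block.

    Assumption: a line with exactly 7 tab-separated fields that follows
    a sentence pair is an alignment row; any other non-blank line is a
    sentence (EN first, then FR).
    """
    pending_en: Optional[str] = None
    current: Optional[SentencePair] = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # blank lines are ignored
        fields = [f.strip() for f in line.split("\t")]
        if len(fields) == 7 and current is not None:
            current.alignments.append(VerbAlignment(*fields))
        elif pending_en is None:
            # A new EN sentence closes the previous block, if any.
            if current is not None:
                yield current
                current = None
            pending_en = line.strip()
        else:
            current = SentencePair(pending_en, line.strip())
            pending_en = None
    if current is not None:
        yield current
```

With a file from the package, this could be used as `with open("Tense-Annotation-gold.txt", encoding="utf-8") as f: pairs = list(parse_blocks(f))`, where the UTF-8 encoding is also an assumption.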

The following is an explanation of the labels used:

EN_tense:

past_perf_cont = Past Perfect Continuous

past_perf = Past Perfect

past_cont = Past Continuous

sim_past = Simple Past

pres_perf = Present Perfect

pres_perf_cont = Present Perfect Continuous

pres_cont = Present Continuous

pres = Present

fut_perf_cont = Future Perfect Continuous

fut_perf = Future Perfect

fut_cont = Future Continuous

fut = Future

cond_perf_cont = conditional verb group in continuous past tense

cond_perf = conditional verb group in past tense

cond_cont = conditional verb group in continuous present tense

cond = conditional verb group in present tense

infinitif = base verb form

no_tag = tense not found

 

EN_voice:

active, passive, unknown

 

FR_tense:

pres = présent

passe_comp = passé composé

imparfait = imparfait

plus_que_parf = plus-que-parfait

passe_sim = passé simple

passe_rec = passé récent

passe_ant = passé antérieur

imperatif = impératif

subjonctif = subjonctif

conditionnel = conditionnel

futur_proche = futur proche

futur = futur

futur_ant = futur antérieur

no_tag = tense not found

 

FR_voice:

active, passive, unknown

@ = unaligned words
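To show how these labels might be used in a translation study, here is a hedged Python sketch that tabulates how often each EN tense is paired with each FR tense. It takes already-split rows of seven strings in the field order given above. It assumes that the "@" marker appears in one of the verb fields, which this description does not state explicitly, and it skips rows where either tense is no_tag. The function name count_tense_pairs is hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence


def count_tense_pairs(rows: Iterable[Sequence[str]]) -> Counter:
    """Count (EN_tense, FR_tense) pairs over alignment rows.

    Rows are skipped when they do not have 7 fields, when either tense
    is "no_tag", or when either verb field is "@" (assumed here to mark
    an unaligned word).
    """
    counts: Counter = Counter()
    for row in rows:
        if len(row) != 7:
            continue
        _pos, en_verb, en_tense, _en_voice, fr_verb, fr_tense, _fr_voice = row
        if "no_tag" in (en_tense, fr_tense):
            continue
        if "@" in (en_verb, fr_verb):
            continue
        counts[(en_tense, fr_tense)] += 1
    return counts
```

Calling `count_tense_pairs(rows).most_common(10)` would then list the ten most frequent tense correspondences in whatever rows were supplied.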

Acknowledgements

Work on this resource was partially funded by the SNF Sinergia projects COMTIS and MODERN. We would also like to thank Andrei Popescu-Belis, Bastien Crettol, Yann Rodriguez and Vincent Spano of Idiap for their help in making the data available.