Tense-Annotation
Description
This dataset contains parallel English and French texts from the Europarl corpus (Koehn, 2005).
The files align EN and FR verbs and record the position, tense and voice of each verb. They can therefore be used in translation studies for this language pair and/or to train translation systems that can make use of the labels in this resource.
Content
Although the resource was created semi-automatically, the verb alignments and inferred tenses have high precision, especially in the second of the two files in the package:
Tense-Annotation-full.txt: the complete alignment.
Tense-Annotation-gold.txt: only the alignments for which both an EN and an FR tense were inferred from the verbs.
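To illustrate the selection criterion, the Python sketch below keeps an alignment row only when both tense fields carry a label other than no_tag. It assumes the 7-field tab-separated row layout used in both files and that a missing tense is written as no_tag. The helper name and the two sample rows are invented for illustration, and we have not verified that this filter reproduces Tense-Annotation-gold.txt exactly.

```python
# Hypothetical helper approximating the "gold" selection on one alignment row.
# Assumed row layout (7 tab-separated fields):
#   position, EN verb, EN tense, EN voice, FR verb, FR tense, FR voice
EN_TENSE, FR_TENSE = 2, 5

def has_both_tenses(row: str) -> bool:
    """Return True if neither the EN nor the FR tense is 'no_tag'."""
    fields = [f.strip() for f in row.rstrip("\n").split("\t")]
    if len(fields) != 7:
        return False  # not an alignment row (e.g. a sentence line)
    return fields[EN_TENSE] != "no_tag" and fields[FR_TENSE] != "no_tag"

# Invented rows, not taken from the corpus.
rows = [
    "2\thas adopted\tpres_perf\tactive\ta adopté\tpasse_comp\tactive",
    "7\tto discuss\tinfinitif\tactive\tdiscuter\tno_tag\tactive",
]
print([has_both_tenses(r) for r in rows])  # [True, False]
```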
A description of the method that was used to create the alignment will soon be published.
Both files use the following format, where \tab stands for a tab character:
EN sentence
FR sentence
Position_in_EN \tab EN_verb1 \tab EN_tense \tab EN_voice \tab FR_verb \tab FR_tense \tab FR_voice
Position_in_EN \tab EN_verb2 \tab EN_tense \tab EN_voice \tab FR_verb \tab FR_tense \tab FR_voice
EN sentence
FR sentence
Position_in_EN \tab EN_verb1 \tab EN_tense \tab EN_voice \tab FR_verb \tab FR_tense \tab FR_voice
...
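A minimal Python sketch of a reader for this format follows. It assumes that sentence lines never contain exactly six tab characters (so any line with seven tab-separated fields is an alignment row), that the FR sentence immediately follows its EN sentence, and that blank lines, if present, can be skipped. The class and function names are hypothetical, and the sample sentences are invented.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional

@dataclass
class Alignment:
    position: str   # Position_in_EN, kept as a string
    en_verb: str
    en_tense: str
    en_voice: str
    fr_verb: str
    fr_tense: str
    fr_voice: str

@dataclass
class SentencePair:
    en: str
    fr: Optional[str] = None
    alignments: List[Alignment] = field(default_factory=list)

def parse_tense_annotation(lines: Iterable[str]) -> List[SentencePair]:
    """Group lines into EN/FR sentence pairs, each followed by its alignment rows."""
    pairs: List[SentencePair] = []
    current: Optional[SentencePair] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # tolerate blank separator lines
        fields = [f.strip() for f in line.split("\t")]
        if len(fields) == 7:
            # Alignment row: must follow both the EN and the FR sentence.
            if current is None or current.fr is None:
                raise ValueError(f"alignment row before sentence pair: {line!r}")
            current.alignments.append(Alignment(*fields))
        elif current is None or current.fr is not None:
            # A sentence line after a complete pair starts a new EN sentence.
            current = SentencePair(en=line)
            pairs.append(current)
        else:
            # The line right after an EN sentence is its FR counterpart.
            current.fr = line
    return pairs

# Invented sample: one pair with an alignment row, one pair without.
sample = [
    "The Commission has adopted the proposal.",
    "La Commission a adopté la proposition.",
    "2\thas adopted\tpres_perf\tactive\ta adopté\tpasse_comp\tactive",
    "Thank you.",
    "Merci.",
]
pairs = parse_tense_annotation(sample)
print(len(pairs), pairs[0].alignments[0].fr_tense)  # 2 passe_comp
```

To read one of the files, pass an open file object, e.g. open("Tense-Annotation-gold.txt", encoding="utf-8"); the UTF-8 encoding is an assumption.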
The labels are explained below:
EN_tense:
past_perf_cont = Past Perfect Continuous
past_perf = Past Perfect
past_cont = Past Continuous
sim_past = Simple Past
pres_perf = Present Perfect
pres_perf_cont = Present Perfect Continuous
pres_cont = Present Continuous
pres = Present
fut_perf_cont = Future Perfect Continuous
fut_perf = Future Perfect
fut_cont = Future Continuous
fut = Future
cond_perf_cont = conditional verb group in continuous past tense
cond_perf = conditional verb group in past tense
cond_cont = conditional verb group in continuous present tense
cond = conditional verb group in present tense
infinitif = base verb form
no_tag = tense not found
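For analyses that need fewer categories, one option is to collapse the EN_tense labels by time reference. The grouping below is our own illustrative simplification, not part of the resource; the dictionary and function names are hypothetical.

```python
# Hypothetical coarse grouping of the documented EN_tense labels.
EN_TENSE_GROUPS = {
    "past": ["past_perf_cont", "past_perf", "past_cont", "sim_past"],
    "present": ["pres_perf_cont", "pres_perf", "pres_cont", "pres"],
    "future": ["fut_perf_cont", "fut_perf", "fut_cont", "fut"],
    "conditional": ["cond_perf_cont", "cond_perf", "cond_cont", "cond"],
    "other": ["infinitif", "no_tag"],
}

# Reverse index: label -> coarse group.
EN_LABEL_TO_GROUP = {
    label: group for group, labels in EN_TENSE_GROUPS.items() for label in labels
}

def en_time_reference(label: str) -> str:
    """Map an EN_tense label to its coarse group; reject undocumented labels."""
    try:
        return EN_LABEL_TO_GROUP[label]
    except KeyError:
        raise ValueError(f"undocumented EN_tense label: {label!r}") from None
```

Raising on unknown labels makes it easy to notice values in the data that the label list does not cover.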
EN_voice:
active, passive, unknown
FR_tense:
pres = présent
passe_comp = passé composé
imparfait = imparfait
plus_que_parf = plus-que-parfait
passe_sim = passé simple
passe_rec = passé récent
passe_ant = passé antérieur
imperatif = impératif
subjonctif = subjonctif
conditionnel = conditionnel
futur_proche = futur proche
futur = futur
futur_ant = futur antérieur
no_tag = tense not found
FR_voice:
active, passive, unknown
@ = unaligned words
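For translation studies, a natural first step is to count how often each EN tense is rendered by each FR tense. The sketch below does this over raw lines, skipping sentence lines, rows with no_tag on either side, and rows whose EN or FR verb field is exactly "@". Treating "@" as the whole content of a verb field is an assumption, and the function name and sample rows are invented.

```python
from collections import Counter
from typing import Iterable

def tense_pair_counts(lines: Iterable[str]) -> Counter:
    """Count (EN_tense, FR_tense) pairs, skipping no_tag labels and unaligned '@' verbs."""
    counts: Counter = Counter()
    for line in lines:
        fields = [f.strip() for f in line.rstrip("\n").split("\t")]
        if len(fields) != 7:
            continue  # sentence line, not an alignment row
        en_verb, en_tense, fr_verb, fr_tense = fields[1], fields[2], fields[4], fields[5]
        if "@" in (en_verb, fr_verb) or "no_tag" in (en_tense, fr_tense):
            continue
        counts[(en_tense, fr_tense)] += 1
    return counts

# Invented sample lines, not taken from the corpus.
sample = [
    "The Commission has adopted the proposal.",
    "2\thas adopted\tpres_perf\tactive\ta adopté\tpasse_comp\tactive",
    "4\thave voted\tpres_perf\tactive\tont voté\tpasse_comp\tactive",
    "1\tsaid\tsim_past\tactive\ta dit\tpasse_comp\tactive",
    "5\tsee\tpres\tactive\t@\tno_tag\tunknown",
]
counts = tense_pair_counts(sample)
print(counts.most_common())
```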
Acknowledgements
Work on this resource was partially funded by the SNF Sinergia projects COMTIS and MODERN. We would also like to thank Andrei Popescu-Belis, Bastien Crettol, Yann Rodriguez and Vincent Spano of Idiap for their help in making the data available.