BLM-OdI (Blackbird Language Matrices Object Drop verb alternations in Italian)

An Object-Drop (OD) alternation dataset for testing a lexical semantic property of verbs: whether or not they can enter a causative alternation

Description

BLM-OdI is an Object-Drop (OD) alternation dataset for testing a lexical semantic property of verbs: whether or not they can enter a causative alternation. In the OD alternation, the subject bears the same semantic role (Agent) in both the transitive and intransitive forms (L’artista dipingeva la finestra / L’artista dipingeva, ‘the artist painted the window’ / ‘the artist painted’), and the verb does not have a causative meaning.

Blackbird Language Matrices (BLMs) are multiple-choice problems, where the input is a sequence of sentences built using specific generative rules, and the answer set consists of a correct answer that continues the input sequence, and several incorrect contrastive options. The contrastive options are built by violating the underlying generating rules of the sentences. In a BLM matrix, all sentences share the targeted linguistic phenomenon (in this case verb alternations), but differ in other aspects relevant for the phenomenon in question.
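As a rough illustration of the multiple-choice structure described above, a BLM instance can be thought of as an input sequence, an answer set, and the position of the correct answer. The sketch below is only an assumption about how one might represent and score such instances in Python: the class name, field names, and the accuracy helper are hypothetical and do not reflect the actual file format of the dataset. The toy instance is far shorter than a real matrix and its contrastive option is made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BLMInstance:
    """Hypothetical container for one BLM problem (field names are assumptions)."""
    context: List[str]      # input sequence of sentences
    candidates: List[str]   # answer set: one correct option plus contrastive ones
    correct_index: int      # position of the correct answer in `candidates`


def accuracy(instances: List[BLMInstance], predictions: List[int]) -> float:
    """Fraction of instances where the predicted candidate index is correct."""
    if len(instances) != len(predictions):
        raise ValueError("one prediction per instance is required")
    if not instances:
        return 0.0
    hits = sum(p == inst.correct_index for inst, p in zip(instances, predictions))
    return hits / len(instances)


# Toy instance, not taken from the dataset.
toy = BLMInstance(
    context=["L'artista dipingeva la finestra"],
    candidates=[
        "L'artista dipingeva",    # subject stays the Agent: object drop
        "La finestra dipingeva",  # object promoted to subject: causative-style pattern
    ],
    correct_index=0,
)

print(accuracy([toy], [0]))  # 1.0
print(accuracy([toy], [1]))  # 0.0
```

Scoring by candidate index keeps the evaluation independent of how a model produces its choice, for example by comparing sentence embeddings or by direct classification.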

BLM datasets also have a lexical variation dimension, to explore the impact of lexical variation on detecting relevant structures: type I – minimal lexical variation for sentences within an instance, type II – one word difference across the sentences within an instance, type III – maximal lexical variation within an instance.

The data is grouped by lexical variation (i.e. type I/II/III), and each subset is split into train and test portions, with 2140 training and 240 testing instances.
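The sizes above can be turned into a quick sanity check when loading the data. The following sketch assumes that the stated counts (2140 training, 240 testing) apply to each of the three lexical-variation subsets; the constant names are hypothetical.

```python
# Lexical variation types as described in the text.
LEXICAL_TYPES = {
    "I": "minimal lexical variation",
    "II": "one word difference across the sentences",
    "III": "maximal lexical variation",
}

# Assumption: these counts hold for every type I/II/III subset.
TRAIN_PER_SUBSET = 2140
TEST_PER_SUBSET = 240

per_subset = TRAIN_PER_SUBSET + TEST_PER_SUBSET   # instances in one subset
total = per_subset * len(LEXICAL_TYPES)           # instances across all three subsets
test_share = TEST_PER_SUBSET / per_subset         # fraction held out for testing

print(per_subset)           # 2380
print(total)                # 7140
print(f"{test_share:.1%}")  # 10.1%
```

Under this reading, roughly one instance in ten is held out for testing in each subset.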

Reference

If you use this dataset, please cite the following publication:

Nastase, Vivi & Samo, Giuseppe & Jiang, Chunyang & Merlo, Paola. (2024). Exploring Italian sentence embeddings properties through multi-tasking. DOI: 10.48550/arXiv.2409.06622.