SLURP-Fr Real

real test portion of the SLURP-Fr dataset


Description

This is the real test portion of the SLURP-Fr dataset. It belongs to a larger collection created for the study of interpreter-aided spoken language understanding (SLU) in the paper cited below, which has three parts:

  1. SLURP-Fr, an end-to-end SLU dataset based on the French portion of MASSIVE, containing 16,521 synthetic audio samples created using Google TTS, accompanied by 477 real test samples collected from two French speakers at Idiap.
  2. SLURP-Es, a similar dataset based on the parallel Spanish portion of MASSIVE, containing only synthetic samples.
  3. Spoken Gigaword, a speech summarization dataset generated from Gigaword, containing 51,385 synthetic audio samples created using Google TTS.

 

Reference

If you use this dataset, please cite the following publication:

He, Mutian, and Philip N. Garner. "The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation." Findings of EMNLP 2023.
https://doi.org/10.48550/arXiv.2305.09652