VoxCeleb Dataset
Dataset Description
VoxCeleb is a collection of voice recordings of celebrities extracted from YouTube videos. It contains:
Set | Subset | Identities | Sample count |
---|---|---|---|
train | | 1211 | 148642 |
dev / eval | references | 40 | 4874 |
dev / eval | probes | | 37720 |
The dev and eval sets are identical in this protocol, so the results below report only the development set.
GMM¶
To run the baseline, use the following command:
bob bio pipeline simple -d voxceleb gmm-mobio -l sge-demanding -o results/gmm_voxceleb -n 512
Then, to compute the metrics from the scores, use:
bob bio metrics -e ./results/gmm_voxceleb/scores-dev.csv
Metric | Development |
---|---|
Failure to Acquire | 0.0% |
False Match Rate | 18.8% (3538/18860) |
False Non Match Rate | 18.8% (3538/18860) |
False Accept Rate | 18.8% |
False Reject Rate | 18.8% |
Half Total Error Rate | 18.8% |
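The False Match Rate and False Non Match Rate above are equal, which suggests the decision threshold was placed at the equal error rate (EER) of the development scores; the Half Total Error Rate is the mean of the two. The following is a minimal pure-Python sketch of how such rates relate to genuine and impostor scores, not the `bob bio metrics` implementation. It assumes higher scores mean "same speaker", that a score at or above the threshold is accepted, and it estimates the EER threshold by a simple search over the observed scores; the function names and the toy scores are hypothetical.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FMR, FNMR) at `threshold`; scores >= threshold are accepted."""
    fmr = sum(s >= threshold for s in impostor) / len(impostor)  # impostors wrongly accepted
    fnmr = sum(s < threshold for s in genuine) / len(genuine)    # genuine trials wrongly rejected
    return fmr, fnmr


def eer_threshold(genuine, impostor):
    """Hypothetical helper: the observed score where |FMR - FNMR| is smallest."""
    def gap(t):
        fmr, fnmr = error_rates(genuine, impostor, t)
        return abs(fmr - fnmr)
    return min(sorted(set(genuine) | set(impostor)), key=gap)


def hter(genuine, impostor, threshold):
    """Half Total Error Rate: mean of FMR and FNMR at the given threshold."""
    fmr, fnmr = error_rates(genuine, impostor, threshold)
    return (fmr + fnmr) / 2


if __name__ == "__main__":
    # Toy scores for illustration only, not VoxCeleb data.
    genuine = [0.9, 0.8, 0.7, 0.3]
    impostor = [0.1, 0.2, 0.4, 0.75]
    t = eer_threshold(genuine, impostor)
    print(t, error_rates(genuine, impostor, t), hter(genuine, impostor, t))
    # prints: 0.7 (0.25, 0.25) 0.25

    # The percentages in the table follow directly from the error counts:
    print(f"{100 * 3538 / 18860:.1f}%")  # prints: 18.8%
```

Searching only the observed scores keeps the sketch short; a real evaluation tool may interpolate between scores, so its EER can differ slightly on small score sets.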
Run time: 10 hours on 128[1] CPU nodes of the SGE grid.
ISV
TODO
SpeechBrain ECAPA-TDNN
This baseline reproduces the speaker verification experiment with a pretrained ECAPA-TDNN model from the SpeechBrain library. The reference for the original paper is:
@inproceedings{spear,
author = {Brecht Desplanques and Jenthe Thienpondt and Kris Demuynck},
title = {{ECAPA-TDNN:} Emphasized Channel Attention, Propagation and Aggregation in {TDNN} Based Speaker Verification},
booktitle = {Interspeech 2020},
year = {2020},
url = {https://www.isca-speech.org/archive_v0/Interspeech_2020/pdfs/2650.pdf},
}
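With a pretrained embedding extractor such as ECAPA-TDNN, verification typically reduces to comparing fixed-size speaker embeddings. The minimal sketch below assumes cosine-similarity scoring and averaging of a speaker's reference embeddings into a single enrollment vector; both are assumptions for illustration, not a description of what the `speechbrain-ecapa-voxceleb` pipeline does internally. The helper names are hypothetical, and the 3-dimensional vectors stand in for real embeddings.

```python
import math


def enroll(embeddings):
    """Hypothetical enrollment rule: element-wise mean of a speaker's embeddings."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def cosine_score(reference, probe):
    """Cosine similarity in [-1, 1]; higher means more likely the same speaker."""
    dot = sum(r * p for r, p in zip(reference, probe))
    norm_r = math.sqrt(sum(r * r for r in reference))
    norm_p = math.sqrt(sum(p * p for p in probe))
    return dot / (norm_r * norm_p)


if __name__ == "__main__":
    reference = enroll([[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]])  # -> [2.0, 1.0, 1.0]
    same_direction = [4.0, 2.0, 2.0]  # parallel to the reference: score close to 1
    orthogonal = [0.0, 1.0, -1.0]     # orthogonal to the reference: score 0
    print(cosine_score(reference, same_direction))
    print(cosine_score(reference, orthogonal))
```

Each probe score would then be compared against a threshold chosen on development data, which is what the metrics tables on this page summarize.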
To run the baseline, use the following command:
bob bio pipeline simple -vvv -d voxceleb -p speechbrain-ecapa-voxceleb -g dev -o ./results/speechbrain_voxceleb
Then, to compute the metrics from the scores, use:
bob bio metrics -e ./results/speechbrain_voxceleb/scores-dev.csv
Metric | Development |
---|---|
Failure to Acquire | 0.0% |
False Match Rate | 1.0% (189/18860) |
False Non Match Rate | 1.0% (189/18860) |
False Accept Rate | 1.0% |
False Reject Rate | 1.0% |
Half Total Error Rate | 1.0% |
Run time: 9 minutes on 128[1] CPU nodes of the SGE grid (no training step).
Note
The reference result for ECAPA-TDNN on VoxCeleb is 0.8% EER. However, that result was obtained on a customized version of the dataset (VoxCeleb (cleaned)), which leaves out 109 probe files (presumably containing erroneous data) that are present in our version of the dataset.
Footnotes