NIST-SRE04-16 Dataset

Dataset Description

This is an aggregation of the NIST-SRE datasets from 2004 to 2016.

Related paper:

@inproceedings{nist16,
    title={The 2016 NIST Speaker Recognition Evaluation},
    author={Sadjadi, Seyed Omid and Kheyrkhah, Timothee and Tong, Audrey and Greenberg, Craig and Reynolds, Douglas and Singer, Elliot and Mason, Lisa and Hernandez-Cordero, Jaime},
    booktitle={Proc. of Interspeech 2017},
    pages={1353--1357},
    year={2017}
}

The core protocol contains:

Group   Subset        Identities   Sample count
train                       6213          71728
dev     references            80            120
dev     probes                 5           1207
eval    references           802           1202
eval    probes                 5           9294
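As a quick consistency check, the counts above can be compared with the trial counts reported in the result tables on this page. The sketch below assumes that every probe sample is scored against every reference identity; this is inferred from the numbers matching, not from a documented statement of the protocol. The dictionary names are illustrative only.

```python
# Counts copied from the protocol table (reference identities, probe samples).
protocol = {
    "dev": {"reference_identities": 80, "probe_samples": 1207},
    "eval": {"reference_identities": 802, "probe_samples": 9294},
}

# Trial counts copied from the denominators of the False Match Rate
# (impostor trials) and False Non Match Rate (genuine trials) rows.
scored = {
    "dev": {"impostor": 96342, "genuine": 218},
    "eval": {"impostor": 7453619, "genuine": 169},
}

totals = {}
for group in ("dev", "eval"):
    # Assumption: one comparison per (reference identity, probe sample) pair.
    expected = (
        protocol[group]["reference_identities"]
        * protocol[group]["probe_samples"]
    )
    observed = scored[group]["impostor"] + scored[group]["genuine"]
    totals[group] = (expected, observed)
    print(f"{group}: expected {expected}, observed {observed}")
```

Under that assumption the products equal the sums of impostor and genuine trials for both groups, which suggests the score files cover the full reference-by-probe grid.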

GMM

To run the baseline, use the following commands:

bob bio pipeline train -d nist-sre04to16 -p gmm-default -o results/gmm_nist -l sge-demanding -n 512 --split-training --n-splits 8
bob bio pipeline simple -d nist-sre04to16 -p gmm-default -g dev -g eval -l sge -o results/gmm_nist

Then, to compute the metrics from the generated scores, use:

bob bio metrics -e ./results/gmm_nist/scores-{dev,eval}.csv
Table 11 [Min. criterion: EER] Threshold on Development set: 1.007006e+00

                        Development            Evaluation
Failure to Acquire      0.0%                   0.0%
False Match Rate        22.2% (21395/96342)    27.0% (2013356/7453619)
False Non Match Rate    22.0% (48/218)         7.7% (13/169)
False Accept Rate       22.2%                  27.0%
False Reject Rate       22.0%                  7.7%
Half Total Error Rate   22.1%                  17.4%
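The table caption indicates that the decision threshold is chosen on the development set with the EER criterion; the evaluation column then presumably reports error rates at that same fixed threshold. The following is a minimal sketch of that idea in plain Python. It is not the implementation used by bob; the score lists are made-up toy values, the function names are hypothetical, and the convention that a score at or above the threshold counts as an accept is an assumption.

```python
def error_rates(impostor, genuine, threshold):
    """Return (FMR, FNMR) as fractions at the given threshold.

    Assumption: a comparison is accepted when its score >= threshold.
    """
    fmr = sum(s >= threshold for s in impostor) / len(impostor)
    fnmr = sum(s < threshold for s in genuine) / len(genuine)
    return fmr, fnmr


def eer_threshold(impostor, genuine):
    """Pick the observed score where |FMR - FNMR| is smallest."""

    def gap(t):
        fmr, fnmr = error_rates(impostor, genuine, t)
        return abs(fmr - fnmr)

    return min(sorted(set(impostor) | set(genuine)), key=gap)


# Toy scores, invented for illustration only.
dev_impostor = [-2.0, -1.5, -1.0, -0.2, 0.3]
dev_genuine = [-0.5, 0.1, 0.8, 1.2, 2.0]
eval_impostor = [-1.8, -0.9, 0.2, 0.4]
eval_genuine = [-0.3, 0.5, 1.1, 1.9]

# Step 1: fix the threshold on the development scores.
threshold = eer_threshold(dev_impostor, dev_genuine)

# Step 2: apply the same threshold, unchanged, to the evaluation scores.
dev_rates = error_rates(dev_impostor, dev_genuine, threshold)
eval_rates = error_rates(eval_impostor, eval_genuine, threshold)
print("threshold:", threshold)
print("dev  (FMR, FNMR):", dev_rates)
print("eval (FMR, FNMR):", eval_rates)
```

Because the threshold is tuned only on the development scores, FMR and FNMR are close to each other there but can diverge on the evaluation set, as they do in the table above.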

ISV

To run the baseline, use the following command:

bob bio pipeline simple -d nist-sre04to16 -p isv-nist -g dev -g eval -l sge -o results/isv_nist

Then, to compute the metrics from the generated scores, use:

bob bio metrics -e ./results/isv_nist/scores-{dev,eval}.csv
Table 12 [Min. criterion: EER] Threshold on Development set: TODO

                        Development            Evaluation

On 128[1] CPU nodes on the SGE Grid: TODO

Speechbrain ECAPA-TDNN

To run the baseline, use the following command:

bob bio pipeline simple -d nist-sre04to16 -p speechbrain-ecapa-voxceleb -g dev -g eval -l sge -o results/speechbrain_nist

Then, to compute the metrics from the generated scores, use:

bob bio metrics -e ./results/speechbrain_nist/scores-{dev,eval}.csv
Table 13 [Min. criterion: EER] Threshold on Development set: -3.860876e-01

                        Development            Evaluation
Failure to Acquire      0.0%                   0.0%
False Match Rate        12.9% (12434/96342)    11.4% (852522/7453619)
False Non Match Rate    12.8% (28/218)         23.7% (40/169)
False Accept Rate       12.9%                  11.4%
False Reject Rate       12.8%                  23.7%
Half Total Error Rate   12.9%                  17.6%

On 70[1] CPU nodes on the SGE Grid: Ran in 55 minutes (no training).
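The Half Total Error Rate in the tables on this page is consistent with the usual definition, the mean of the False Match Rate and the False Non Match Rate. As a small worked check under that assumption, the sketch below recomputes the ECAPA-TDNN values from the raw counts in Table 13; the helper names are illustrative and not part of bob.

```python
def rate(errors: int, trials: int) -> float:
    """Return errors / trials as a percentage."""
    return 100.0 * errors / trials


def hter(fmr: float, fnmr: float) -> float:
    """Half Total Error Rate: the mean of FMR and FNMR."""
    return (fmr + fnmr) / 2.0


# Counts copied from Table 13 (Speechbrain ECAPA-TDNN).
dev_fmr = rate(12434, 96342)
dev_fnmr = rate(28, 218)
eval_fmr = rate(852522, 7453619)
eval_fnmr = rate(40, 169)

dev_hter = hter(dev_fmr, dev_fnmr)
eval_hter = hter(eval_fmr, eval_fnmr)

print(f"dev:  FMR={dev_fmr:.1f}%  FNMR={dev_fnmr:.1f}%  HTER={dev_hter:.1f}%")
print(f"eval: FMR={eval_fmr:.1f}%  FNMR={eval_fnmr:.1f}%  HTER={eval_hter:.1f}%")
```

Rounded to one decimal, this reproduces the 12.9% and 17.6% reported in the table. The same two helpers can be applied to the GMM counts in Table 11.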

Footnotes