Performance of MFCC-GMM-based and LBP-LR-based PAD systems
Published: 8 years, 4 months ago
In the MFCC-GMM-based PAD system, 19 MFCCs with their deltas and delta-deltas are used as features. Two GMMs are trained: one for real data and one for spoof attacks. The score is computed as the log-likelihood ratio between the two GMMs. In the LBP-LR-based PAD system, the spectrogram is split into lower and upper halves. For each half, a histogram of LBP values (circular, 8 neighbors, radius 1) is computed and used as the feature. A logistic regression classifier is trained and evaluated on these LBP-based features.
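The LBP feature extraction described above can be illustrated with a minimal sketch. It is not the code used to produce the reported results. It assumes the spectrogram is a list of rows ordered from low to high frequency, approximates the circular radius-1 pattern by the 8 adjacent cells without interpolation, normalizes each histogram, and concatenates the two halves into one vector. The function names lbp_codes, lbp_histogram and split_band_features are hypothetical.

```python
def lbp_codes(image):
    """Return the 8-neighbour LBP code of every interior cell of a 2-D list."""
    # Offsets of the 8 neighbours, visited clockwise from the top-left.
    # Assumption: plain 3x3 neighbourhood, no interpolation of diagonal points.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as large as the center.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            codes.append(code)
    return codes


def lbp_histogram(image):
    """Return a normalized 256-bin histogram of LBP codes."""
    hist = [0] * 256
    codes = lbp_codes(image)
    for code in codes:
        hist[code] += 1
    total = len(codes)
    return [h / total for h in hist] if total else hist


def split_band_features(spectrogram):
    """Concatenate the LBP histograms of the lower and upper frequency halves."""
    mid = len(spectrogram) // 2
    lower, upper = spectrogram[:mid], spectrogram[mid:]
    # Assumption: the two 256-bin histograms are joined into one 512-value vector.
    return lbp_histogram(lower) + lbp_histogram(upper)
```

A vector produced this way could then be passed to any logistic regression implementation, together with the real/attack label of the sample.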
The goal of a PAD system is to distinguish between real data and presentation attacks. During training, the PAD system models each of these classes. When it is evaluated on the development set (for which the class of each audio sample is known), the resulting scores are split into two sets by a threshold chosen so that the False Acceptance Rate (FAR) and the False Rejection Rate (FRR) are equal. This common value is usually called the Equal Error Rate (dev_err in the table below). The score value at which the split is made is the EER threshold (dev_eer_threshold in the table), since it is the operating point of the system that yields the EER.
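As an illustration of how such a threshold can be found, the sketch below scans the observed development scores and keeps the one where FAR and FRR are closest. It assumes that a higher score means "more likely real" and that a sample is accepted when its score is at or above the threshold. The function names far_frr and eer_threshold are hypothetical. With a finite set of scores FAR and FRR may never be exactly equal, so the sketch reports their mean at the best threshold as the EER.

```python
def far_frr(real_scores, attack_scores, threshold):
    """Return (FAR, FRR) when samples scoring >= threshold are accepted as real."""
    # FAR: fraction of attacks wrongly accepted.
    far = sum(s >= threshold for s in attack_scores) / len(attack_scores)
    # FRR: fraction of real samples wrongly rejected.
    frr = sum(s < threshold for s in real_scores) / len(real_scores)
    return far, frr


def eer_threshold(real_scores, attack_scores):
    """Return (threshold, EER) where |FAR - FRR| is smallest over observed scores."""
    best_threshold, best_gap = None, float("inf")
    for t in sorted(set(real_scores) | set(attack_scores)):
        far, frr = far_frr(real_scores, attack_scores, t)
        gap = abs(far - frr)
        if gap < best_gap:
            best_threshold, best_gap = t, gap
    far, frr = far_frr(real_scores, attack_scores, best_threshold)
    return best_threshold, (far + frr) / 2


# Toy development scores (made up for illustration only).
dev_real = [0.9, 0.8, 0.7, 0.4]
dev_attack = [0.1, 0.2, 0.3, 0.6]
threshold, eer = eer_threshold(dev_real, dev_attack)
```

On these toy scores the threshold lands on 0.6, where one of four attacks is accepted and one of four real samples is rejected.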
Applying the EER threshold obtained from the development set to the scores of the test set yields another pair of FAR (test_far in the table) and FRR (test_frr in the table) values, which measure the system's performance in uncontrolled evaluation settings. In a perfectly consistent PAD system, the FAR and FRR values on the test set would equal those obtained on the development set. In practice they differ, so to summarize the performance of the system in a single value, the Half Total Error Rate (test_hter in the table) is computed as the mean of FAR and FRR. The HTER is then used as an overall measure of PAD system performance.
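The test-set computation can be sketched as follows. This is a minimal example, assuming that higher scores indicate real data and that the threshold was fixed beforehand on the development set. The function name hter and the toy scores are hypothetical and do not come from the reported results.

```python
def hter(real_scores, attack_scores, threshold):
    """Return (FAR, FRR, HTER) for a fixed, previously chosen threshold."""
    # Attacks scoring at or above the threshold are falsely accepted.
    far = sum(s >= threshold for s in attack_scores) / len(attack_scores)
    # Real samples scoring below the threshold are falsely rejected.
    frr = sum(s < threshold for s in real_scores) / len(real_scores)
    # HTER is the mean of the two error rates.
    return far, frr, (far + frr) / 2


# Toy test scores and a threshold assumed to come from the development set.
test_real = [0.9, 0.5, 0.7, 0.8]
test_attack = [0.1, 0.65, 0.7, 0.2]
test_far, test_frr, test_hter = hter(test_real, test_attack, threshold=0.6)
```

Here FAR and FRR are no longer equal (0.5 and 0.25), which is the typical situation when a development-set threshold is reused on unseen data.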