Performance of ISV-based ASV system under spoofing attacks
Published: 8 years, 4 months ago
The goal of an automatic speaker verification (ASV) system is to correctly verify the claimed identity of the user. During training, the system builds a speech model for each registered user. When the system is evaluated on the development set (for which the identity of each audio sample is known), the resulting scores fall into two groups: genuine trials, where the claimed identity is correct, and impostor trials, where it is wrong. A score threshold is then chosen so that the False Acceptance Rate (FAR) and the False Rejection Rate (FRR) are equal. This common rate is usually called the Equal Error Rate (EER; err in the table below). The score value at which this happens is the EER threshold (threshold in the table), since it is the specific operating point of the system that leads to the EER.
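The threshold selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to produce the table: the function names far_frr and eer_threshold and the toy scores are hypothetical, a trial is assumed to be accepted when its score is greater than or equal to the threshold, and the threshold is searched only among the observed score values.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR), assuming scores >= threshold are accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def eer_threshold(genuine, impostor):
    """Pick the observed score where |FAR - FRR| is smallest.

    Returns (threshold, eer), with eer taken as the mean of FAR and FRR
    at that threshold (they coincide when an exact crossing exists).
    """
    def gap(t):
        far, frr = far_frr(genuine, impostor, t)
        return abs(far - frr)

    candidates = sorted(set(genuine) | set(impostor))
    best = min(candidates, key=gap)
    far, frr = far_frr(genuine, impostor, best)
    return best, (far + frr) / 2


# Toy development-set scores (hypothetical, for illustration only).
dev_genuine = [2.1, 1.8, 0.4, 2.5, 1.2]
dev_impostor = [-1.0, 0.6, -0.3, 0.1, -2.0]

threshold, eer = eer_threshold(dev_genuine, dev_impostor)
print(f"EER threshold: {threshold}, EER: {eer:.2f}")
```

On real score sets FAR and FRR rarely cross exactly at an observed score, so evaluation toolkits typically interpolate or report the closest point; the sketch takes the closest point.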
Applying the EER threshold obtained from the development set to the scores of the test set yields another pair of FAR (far_test in the table) and FRR (frr_test in the table) values, which measure the system's performance in uncontrolled evaluation settings (in our case, with spoofing attacks present). In a perfect ASV system, the FAR and FRR values on the test set would match those obtained on the development set. To summarize the test-set performance in a single value, the Half Total Error Rate (hter in the table) is computed as the mean of FAR and FRR, that is, HTER = (FAR + FRR) / 2. The HTER is then used as an overall measure of the ASV system's performance.
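As a rough illustration of this evaluation step, the following Python sketch applies a fixed development-set threshold to test-set scores and computes FAR, FRR and HTER. The threshold value, the toy scores and the helper names are assumptions made for the example and are not taken from the table; the acceptance rule (score greater than or equal to the threshold) is likewise assumed.

```python
def far_frr(genuine, attacks, threshold):
    """Return (FAR, FRR), assuming scores >= threshold are accepted."""
    far = sum(s >= threshold for s in attacks) / len(attacks)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def hter(far, frr):
    """Half Total Error Rate: the mean of FAR and FRR."""
    return (far + frr) / 2


# Threshold fixed beforehand on the development set (hypothetical value).
dev_threshold = 0.6

# Toy test-set scores (hypothetical): genuine trials and spoofing attacks.
test_genuine = [2.0, 1.5, 0.3, 2.2]
test_spoof = [1.9, 2.4, 0.2, 1.1]

far_test, frr_test = far_frr(test_genuine, test_spoof, dev_threshold)
hter_test = hter(far_test, frr_test)
print(f"far_test={far_test:.2f} frr_test={frr_test:.2f} hter={hter_test:.2f}")
```

The threshold is never re-tuned on the test set, so any gap between development and test error rates reflects how well the operating point generalizes to the attack condition.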
Please note that the FAR for the test set (with spoofing attacks) is larger than 97%. The system was trained and tuned on real (non-spoofed) data only.
Voice activity detection is based on the modulation of the energy around 4 Hz. The features comprise 19 MFCCs and energy, together with their first and second derivatives. The model uses 256 Gaussian components.
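The feature layout described above gives 20 static coefficients per frame (19 MFCCs plus energy) and, with first and second derivatives appended, 60 values per frame. The Python sketch below shows this stacking on placeholder data. The derivative here is a simple central difference with replicated edges, which is an assumption made for illustration; the source does not state which delta formula or window was used, and the function names are hypothetical.

```python
def deltas(frames):
    """Central-difference time derivative; edge frames are replicated."""
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2 for a, b in zip(prev, nxt)])
    return out


def stack_features(static):
    """Append first and second derivatives to each static frame."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


N_MFCC = 19  # number of MFCCs, as stated in the text

# Placeholder static frames: 19 MFCCs + 1 energy value per frame.
static = [[float(t)] * (N_MFCC + 1) for t in range(5)]

features = stack_features(static)
print(len(features), "frames,", len(features[0]), "values per frame")
```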