Decision-making tools based on Artificial Intelligence (AI) have been widely deployed in recent years in a myriad of scenarios, with the vast majority relying on Machine Learning (ML) and, more precisely, Deep Learning (DL) algorithms. These algorithms have an extraordinary capacity for enumerating and uncovering hidden factors in large amounts of data, factors that human specialists mostly overlook because they are too complex to unfold manually. Rather than hypothesizing and testing the relationships among a virtually infinite number of factors by hand, ML algorithms can find them by systematically “looking” at high-level data correlations. Thanks to this, the last decades have been full of breakthroughs in many fields, from face recognition and speech recognition to self-driving cars. However, the fact that ML is essentially data-driven by no means ensures that it will lead to fair decisions. With this vast range of deployed applications and their influence on decision-making, aspects of fairness have come into the spotlight.
Decision-making tools based on biometrics have also been widely deployed in the last few years as part of the current DL wave. We use them daily for data protection (e.g., to unlock mobile phones or computers), law enforcement, e-gates at airports, and so on. Face Recognition (FR) is a biometric modality widely used in practical applications, primarily because of its good compromise between usability and accuracy. Fairness issues in FR arise when decisions favor one demographic group over others, that is, when false match or false non-match rates differ across groups. Unfair FR models have been repeatedly reported in the media. As mentioned above, ML-based models are essentially data-driven; hence, data collection is a vital step in ML research. Data collection is also a social phenomenon, and the face datasets available for research are a clear reflection of this. If a decade ago face datasets mirrored the demographics of research institutes, today they mirror the demographics of “public figures”, and neither reflects the operational conditions of FR systems. Moreover, due to legal and ethical constraints (e.g., the GDPR), large-scale face datasets for research purposes will become even scarcer. For instance, in response to an investigation carried out by the Financial Times, Microsoft terminated the MS-Celeb project, which included one of the most popular datasets for FR research; research nevertheless continues on this dataset. This event raised a “red flag” for the research community: conducting FR research with large-scale datasets that respect people’s privacy is now a real concern that the community must respond to. Legally collected, demographically diverse data is already scarce today; for instance, large-scale face datasets with balanced distributions of gender, race, and age are nonexistent.
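To make the notion of a fairness gap concrete, the sketch below shows one common way of quantifying it: computing false match and false non-match rates per demographic group and comparing the extremes across groups. This is a minimal illustration assuming similarity scores, binary genuine/impostor labels, and group annotations are available; the function names and the decision threshold are our own illustrative choices, not part of the proposal.

```python
import numpy as np

def fmr_fnmr(scores, labels, threshold):
    """False Match Rate (FMR) and False Non-Match Rate (FNMR) at a threshold.

    scores: similarity scores of face comparisons
    labels: 1 for genuine pairs (same identity), 0 for impostor pairs
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    impostor = scores[labels == 0]
    genuine = scores[labels == 1]
    fmr = float(np.mean(impostor >= threshold))   # impostors wrongly accepted
    fnmr = float(np.mean(genuine < threshold))    # genuine pairs wrongly rejected
    return fmr, fnmr

def fairness_gap(scores, labels, groups, threshold):
    """Largest FMR and FNMR differences across demographic groups."""
    scores, labels, groups = map(np.asarray, (scores, labels, groups))
    rates = {g: fmr_fnmr(scores[groups == g], labels[groups == g], threshold)
             for g in np.unique(groups)}
    fmr_gap = max(r[0] for r in rates.values()) - min(r[0] for r in rates.values())
    fnmr_gap = max(r[1] for r in rates.values()) - min(r[1] for r in rates.values())
    return fmr_gap, fnmr_gap
```

A perfectly fair system would yield gaps of zero; in practice, reported FR systems exhibit measurable differences in both rates across demographic groups.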
In this project (SAFER), we will address these two major issues in FR research, fairness and ethics with respect to data, through two Research Objectives (RO-1 and RO-2). In RO-1, we will investigate strategies to assess and close the fairness gap; we argue that such a gap can be closed both at training time and at scoring time. In RO-2, we will investigate strategies to close the ethics gap. To do so, we will research mechanisms to generate synthetic face datasets that are both diverse and large-scale.
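As an illustration of what closing the gap at scoring time could look like, the sketch below calibrates a separate decision threshold per demographic group so that every group operates at the same target false match rate. This is only one possible scoring-time strategy, written under our own assumptions rather than as the method RO-1 commits to; the function names and the target_fmr value are hypothetical.

```python
import numpy as np

def per_group_thresholds(impostor_scores, groups, target_fmr=1e-3):
    """One decision threshold per demographic group, chosen so that each
    group operates at (approximately) the same target false match rate."""
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    groups = np.asarray(groups)
    thresholds = {}
    for g in np.unique(groups):
        group_scores = impostor_scores[groups == g]
        # The (1 - target_fmr) quantile of a group's impostor scores is the
        # value above which roughly target_fmr of its impostors are accepted.
        thresholds[g] = float(np.quantile(group_scores, 1.0 - target_fmr))
    return thresholds

def accept(score, group, thresholds):
    """Accept a comparison only if its score reaches its group's threshold."""
    return score >= thresholds[group]
```

Training-time strategies (e.g., adapting the loss or the sampling of training data) complement such scoring-time calibration and will be investigated jointly in RO-1.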
The project will fund two Ph.D. students (one research objective per Ph.D. student) for the project’s duration and one Post-Doctoral researcher for two years. The Post-Doctoral researcher will bring experience and ensure knowledge transfer so that the research remains reproducible for future students, academia, and our industrial partner (SICPA). Indeed, in addition to following open science guidelines, we will also adhere to reproducible research principles when sharing our findings. We expect SAFER to contribute significantly to increasing fairness in face recognition, in other biometric modalities (e.g., speaker, iris, fingerprint), and in the machine learning field as a whole.