FairMI - Machine Learning Fairness with Application to Medical Images

Algorithmic bias remains one of the key challenges for the wider applicability of Machine Learning (ML) in healthcare. Statistical modeling of natural phenomena has gained traction due to increased representation capacity and data availability. In medicine in particular, the use of ML models has grown significantly in recent years, especially to support large-scale screening and diagnosis. Despite this impact, the study of demographic bias in newly developed or already deployed ML solutions in this domain remains largely unaddressed. This is particularly true in the medical imaging domain, where it remains challenging to associate demographic attributes with features. Among the most recent results, the "impossibility of fairness" establishes that certain criteria for demographic impartiality cannot be satisfied simultaneously. Among other factors, the lack of raw data, in particular for intersections of minorities, remains one of the greatest unaddressed issues owing to its challenging nature. This proposal addresses three important challenges in the domain of ML fairness for medical imaging: (i) create novel ways to train ML models for medical imaging tasks that can be automatically adjusted to maximize utility (performance) or to become group- or individually fair; (ii) quantify the fairness boundaries of ML models and their associated development data; and (iii) build systems whose joint performance with humans in the decision loop is fair towards individuals and demographic groups. To achieve these goals, we will develop a novel evaluation framework and loss functions that take into account model utility together with all aspects of demographic fairness one may wish to address. A generative framework, trained to isolate tunable demographic features, will provide large-scale data simulation covering minorities and their intersections. We will then study fairness (safety) boundaries through a modified learning-curve setup, analyzing and quantifying limits in both ML models and training data. Finally, we will study how humans in the decision loop affect the fairness of hybrid human-AI systems, and address post-deployment utility/fairness tuning by embedding weight coefficients directly into the trained model. The development of methods and tools to detect, mitigate, or remove bias will improve the safety of ML models deployed in healthcare. We expect our work to help define new operational boundaries for the responsible deployment of artificial intelligence tools.
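
The composite utility/fairness objective and the tunable weight coefficient mentioned above can be illustrated with a minimal sketch. The code below is not the project's actual method: it assumes a binary classifier in PyTorch, a binary sensitive attribute, and a demographic-parity-style gap as the fairness term, with an illustrative coefficient `lam` standing in for the embedded weight that trades utility against group fairness. All names are hypothetical.

```python
# Minimal sketch (illustrative, not the project's method): a composite training
# loss that trades off task utility against a group-fairness penalty via a
# tunable coefficient. Assumes a binary classifier, a binary sensitive
# attribute, and that both groups are present in each batch.
import torch
import torch.nn.functional as F


def utility_fairness_loss(logits, labels, group, lam=0.5):
    """Weighted sum of the task loss and a soft demographic-parity gap.

    logits : (N,) raw model scores
    labels : (N,) binary ground-truth labels
    group  : (N,) binary sensitive-attribute indicator (0/1)
    lam    : fairness weight; lam = 0 recovers the plain task loss
    """
    # Utility term: standard binary cross-entropy on the raw logits.
    utility = F.binary_cross_entropy_with_logits(logits, labels.float())

    # Fairness term: absolute difference in mean predicted positive rate
    # between the two groups (a differentiable relaxation of the
    # demographic parity gap).
    probs = torch.sigmoid(logits)
    rate_g0 = probs[group == 0].mean()
    rate_g1 = probs[group == 1].mean()
    fairness_gap = (rate_g0 - rate_g1).abs()

    return utility + lam * fairness_gap


if __name__ == "__main__":
    # Toy usage with random data, just to show the call signature.
    torch.manual_seed(0)
    logits = torch.randn(16)
    labels = torch.randint(0, 2, (16,))
    group = torch.randint(0, 2, (16,))
    print(utility_fairness_loss(logits, labels, group, lam=0.5))
```

Setting `lam = 0` recovers plain performance maximization, while larger values push the model towards equal positive-prediction rates across groups; exposing such a coefficient after training is one way the post-deployment utility/fairness tuning described above could be realized.
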
Federal University of Sao Paulo
Idiap Research Institute
SNSF
Mar 01, 2024
Feb 29, 2028