This proposal addresses the related issues of fairness and robustness to distribution shifts within a common theoretical framework. We posit that a common cause of both problems lies in the ubiquity of underspecification, i.e., the existence of multiple predictive models that are all compatible with the training data. Such models can be indistinguishable on the basis of their in-domain performance, even though they differ qualitatively in properties such as fairness and out-of-distribution (OOD) generalization.
This project will develop methods to diagnose and address underspecification in vision-and-language models. First, we will design a metric that quantifies the severity of underspecification for a given dataset/architecture pair. The metric aims to indicate when a lack of constraints from the architecture or the data increases the risk of unexpected behaviour under OOD deployment.
Second, we propose a method to align a learned model with human specifications of fairness and robustness. After discovering a variety of predictive features in the data, we will synthesize ambiguous instances guided by conflicting predictions from these features. We will then seek human feedback on these instances to resolve the ambiguities left by underspecification and obtain models that are human-aligned, intrinsically explainable, fair, and robust.
As a step toward multimodal conversational AI, this project will focus on vision-and-language models and visual question answering, leveraging the PI’s extensive experience in this area.