
Measuring Model Biases in the Absence of Ground Truth

03/05/2021
by Osman Aka, et al.

Recent advances in computer vision have led to the development of image classification models that can predict tens of thousands of object classes. Training these models can require millions of examples, creating a demand for potentially billions of annotations. In practice, however, images are typically sparsely annotated, which can lead to problematic biases in the distribution of ground truth labels that are collected. This potential for annotation bias may then limit the utility of ground-truth-dependent fairness metrics (e.g., Equalized Odds). To address this problem, in this work we introduce a new framing for the measurement of fairness and bias that does not rely on ground truth labels. Instead, we treat the model predictions for a given image as a set of labels, analogous to the 'bag of words' approach used in Natural Language Processing (NLP). This allows us to explore different association metrics between prediction sets in order to detect patterns of bias. We apply this approach to examine the relationship between identity labels and all other labels in the dataset, using the labels associated with 'male' and 'female' as a concrete example. We demonstrate how the statistical properties (especially normalization) of the different association metrics can lead to different sets of labels being detected as having "gender bias". We conclude by demonstrating that pointwise mutual information normalized by joint probability (nPMI) is able to detect many labels with significant gender bias despite differences in the labels' marginal frequencies. Finally, we announce an open-sourced nPMI visualization tool using TensorBoard.
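The association metric named in the abstract can be illustrated with a minimal sketch. Assuming each image's predictions are represented as a Python set of label strings (the 'bag of words' framing described above), nPMI between two labels is their pointwise mutual information divided by the negative log of their joint probability, which bounds the score in [-1, 1] regardless of how frequent each label is. The function name and the example data below are illustrative, not from the paper.

```python
import math

def npmi(prediction_sets, label_a, label_b):
    """Pointwise mutual information normalized by joint probability (nPMI).

    prediction_sets: a list of sets, one set of predicted labels per image.
    Returns a value in [-1, 1]: +1 when the labels always co-occur,
    around 0 when they are independent, -1 when they never co-occur.
    """
    n = len(prediction_sets)
    p_a = sum(label_a in s for s in prediction_sets) / n
    p_b = sum(label_b in s for s in prediction_sets) / n
    p_ab = sum(label_a in s and label_b in s for s in prediction_sets) / n
    if p_ab == 0:
        return -1.0   # never co-occur
    if p_ab == 1:
        return 1.0    # always co-occur (avoids 0/0 below)
    pmi = math.log(p_ab / (p_a * p_b))
    return pmi / -math.log(p_ab)
```

With this in hand, detecting "gender bias" amounts to computing `npmi(sets, 'female', label)` and `npmi(sets, 'male', label)` for every other label and comparing the two scores; the normalization is what lets rare and common labels be ranked on the same scale.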
