Data AUDIT: Identifying Attribute Utility- and Detectability-Induced Bias in Task Models

04/06/2023
by Mitchell Pavlak, et al.

To safely deploy deep learning-based computer vision models for computer-aided detection and diagnosis, we must ensure that they are robust and reliable. Towards that goal, algorithmic auditing has received substantial attention. To guide their audit procedures, existing methods rely on heuristic approaches or high-level objectives (e.g., non-discrimination with regard to protected attributes, such as sex, gender, or race). However, algorithms may show bias with respect to various attributes beyond the more obvious ones, and integrity issues related to these more subtle attributes can have serious consequences. To enable the generation of actionable, data-driven hypotheses that identify specific dataset attributes likely to induce model bias, we contribute a first technique for the rigorous, quantitative screening of medical image datasets. Drawing from literature in the causal inference and information theory domains, our procedure decomposes the risks associated with dataset attributes in terms of their detectability and utility (defined as the amount of information knowing the attribute gives about a task label). To demonstrate the effectiveness and sensitivity of our method, we develop a variety of datasets with synthetically inserted artifacts that have different degrees of association with the target label; these datasets allow inherited model biases to be evaluated by comparing performance against true counterfactual examples. Using these datasets and results from hundreds of trained models, we show that our screening method reliably identifies nearly imperceptible bias-inducing artifacts. Lastly, we apply our method to the natural attributes of a popular skin-lesion dataset and demonstrate its success. Our approach provides a means to perform more systematic algorithmic audits and to guide future data collection efforts in pursuit of safer and more reliable models.
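The abstract defines utility as the amount of information that knowing an attribute gives about the task label. One common way to quantify such a notion is mutual information, so the sketch below computes a simple plug-in estimate of it for a discrete attribute and a discrete label. This is an illustration under stated assumptions, not the authors' estimator: the function name `mutual_information` is hypothetical, both variables are assumed to be categorical, and detectability (the other half of the decomposition) is not covered here.

```python
import math
from collections import Counter


def mutual_information(attribute, label):
    """Plug-in estimate of I(attribute; label) in bits.

    Assumption: both inputs are equal-length sequences of discrete,
    hashable values (e.g. 0/1 flags for an artifact and a diagnosis).
    """
    if len(attribute) != len(label):
        raise ValueError("attribute and label must have the same length")
    n = len(attribute)
    if n == 0:
        raise ValueError("inputs must be non-empty")

    joint_counts = Counter(zip(attribute, label))
    attr_counts = Counter(attribute)
    label_counts = Counter(label)

    mi = 0.0
    for (a, y), count in joint_counts.items():
        # p(a, y) * log2( p(a, y) / (p(a) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (attr_counts[a] * label_counts[y]))
    return mi


# An attribute that copies a balanced binary label carries 1 bit about it;
# an attribute that is independent of the label carries 0 bits.
perfectly_associated = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Under this reading, an attribute with a high estimate is one a model could exploit as a shortcut if it is also detectable in the image. The plug-in estimate is biased upward on small samples, so in practice it would need a correction or a permutation baseline before being used to rank attributes.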

research
07/13/2020

Towards causal benchmarking of bias in face analysis algorithms

Measuring algorithmic bias is crucial both to assess algorithmic fairnes...
research
03/24/2022

Intrinsic Bias Identification on Medical Image Datasets

Machine learning based medical image analysis highly depends on datasets...
research
05/10/2023

Analyzing Bias in Diffusion-based Face Generation Models

Diffusion models are becoming increasingly popular in synthetic data gen...
research
08/10/2023

Benchmarking Algorithmic Bias in Face Recognition: An Experimental Approach Using Synthetic Faces and Human Evaluation

We propose an experimental method for measuring bias in face recognition...
research
11/03/2022

Can Querying for Bias Leak Protected Attributes? Achieving Privacy With Smooth Sensitivity

Existing regulations prohibit model developers from accessing protected ...
research
06/02/2023

Affinity Clustering Framework for Data Debiasing Using Pairwise Distribution Discrepancy

Group imbalance, resulting from inadequate or unrepresentative data coll...
research
05/06/2015

A Deeper Look at Dataset Bias

The presence of a bias in each image data collection has recently attrac...
