Towards Reliable Assessments of Demographic Disparities in Multi-Label Image Classifiers

02/16/2023
by Melissa Hall, et al.

Disaggregated performance metrics across demographic groups are a hallmark of fairness assessments in computer vision. These metrics have successfully incentivized performance improvements on person-centric tasks such as face analysis and are used to understand the risks of modern models. However, the vulnerabilities of these measurements for more complex computer vision tasks have received little discussion. In this paper, we consider multi-label image classification and, specifically, object categorization tasks. First, we highlight design choices and trade-offs for measurement that involve more nuance than discussed in prior computer vision literature. These challenges relate to the necessary scale of data, the definition of groups for images, the choice of metric, and dataset imbalances. Next, through two case studies using modern vision models, we demonstrate that naive implementations of these assessments are brittle. We identify several design choices that look like mere implementation details but significantly impact the conclusions of assessments, in terms of both the magnitude and the direction (that is, which group the classifiers work best on) of disparities. Based on ablation studies, we propose recommendations to increase the reliability of these assessments. Finally, through a qualitative analysis, we find that concepts with large disparities tend to have varying definitions and representations between groups, with inconsistencies across datasets and annotators. While this result suggests avenues for mitigation through more consistent data collection, it also highlights that ambiguous label definitions remain a challenge when performing model assessments. As vision models expand and become more ubiquitous, it is increasingly important that our disparity assessments accurately reflect the true performance of models.
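As an illustration of what a disaggregated metric looks like for a multi-label classifier, the following is a minimal sketch, not the authors' implementation. It assumes recall as the metric (the abstract notes that the choice of metric is itself a design decision), and the helper names `per_group_recall` and `disparity`, the group names "A" and "B", and the toy labels are all hypothetical.

```python
from collections import defaultdict


def per_group_recall(y_true, y_pred, groups, label):
    """Recall for a single label, computed separately for each group.

    y_true, y_pred: one set of labels per image (multi-label setting).
    groups: one group identifier per image (hypothetical group assignment).
    """
    true_pos = defaultdict(int)
    positives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if label in truth:
            positives[group] += 1
            if label in pred:
                true_pos[group] += 1
    # Groups with no positive examples are omitted rather than given a score.
    return {g: true_pos[g] / positives[g] for g in positives}


def disparity(metric_by_group, group_a, group_b):
    """Signed gap: the magnitude is the size of the disparity, and the sign
    gives its direction (positive means the metric is higher for group_a)."""
    return metric_by_group[group_a] - metric_by_group[group_b]


# Toy data, invented purely for illustration.
y_true = [{"cooking", "person"}, {"cooking"}, {"cooking"}, {"cooking", "dog"}]
y_pred = [{"cooking"}, {"person"}, {"cooking"}, {"cooking"}]
groups = ["A", "A", "B", "B"]

recall = per_group_recall(y_true, y_pred, groups, "cooking")
gap = disparity(recall, "A", "B")
print(recall, gap)
```

With only two positive images per group, a single changed prediction moves the gap by 0.5, which gives a small-scale sense of why the scale of data matters for such assessments.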


