Benchmarking Algorithmic Bias in Face Recognition: An Experimental Approach Using Synthetic Faces and Human Evaluation

08/10/2023
by Hao Liang, et al.

We propose an experimental method for measuring bias in face recognition systems. Existing methods to measure bias depend on benchmark datasets that are collected in the wild and annotated for protected (e.g., race, gender) and non-protected (e.g., pose, lighting) attributes. Such observational datasets only permit correlational conclusions, e.g., "Algorithm A's accuracy is different on female and male faces in dataset X." By contrast, experimental methods manipulate attributes individually and thus permit causal conclusions, e.g., "Algorithm A's accuracy is affected by gender and skin color." Our method is based on generating synthetic faces using a neural face generator, where each attribute of interest is modified independently while all other attributes are held constant. Crucially, human observers provide the ground truth on perceptual identity similarity between synthetic image pairs. We validate our method quantitatively by evaluating the race and gender biases of three research-grade face recognition models. Our synthetic pipeline reveals that, for these algorithms, accuracy is lower for Black and East Asian population subgroups. Our method can also quantify how perceptual changes in attributes affect the face identity distances reported by these models. Our large synthetic dataset, consisting of 48,000 synthetic face image pairs (10,200 unique synthetic faces) and 555,000 human annotations (individual attributes and pairwise identity comparisons), is available to researchers in this important area.
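The core measurement idea in the abstract — compare a model's identity decisions on image pairs that differ only in one manipulated attribute, using human judgments as ground truth, and contrast error rates across subgroups — can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: the embeddings are random stand-ins for a real face model's outputs, and names like `identity_distance` and `subgroup_error_rate` are invented for clarity.

```python
# Hypothetical sketch of causal bias measurement on synthetic face pairs.
# Random vectors stand in for face-model embeddings; in the real method,
# each pair differs only in one manipulated attribute and the "same
# identity" labels come from human annotators.
import numpy as np

rng = np.random.default_rng(0)
DIM = 128  # illustrative embedding dimensionality

def identity_distance(emb_a, emb_b):
    """Cosine distance between two embeddings, as a face model might report."""
    cos = np.dot(emb_a, emb_b) / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b))
    return 1.0 - cos

def subgroup_error_rate(pairs, labels, threshold=0.4):
    """Fraction of pairs the model misclassifies at a fixed distance threshold.

    pairs  : list of (embedding, embedding) tuples
    labels : 1 if humans judged the pair the same identity, else 0
    """
    errors = 0
    for (a, b), same in zip(pairs, labels):
        predicted_same = identity_distance(a, b) < threshold
        errors += predicted_same != bool(same)
    return errors / len(pairs)

def make_pairs(n, same_frac=0.5, noise=0.1):
    """Toy pair generator: same-identity pairs are small perturbations."""
    pairs, labels = [], []
    for i in range(n):
        base = rng.normal(size=DIM)
        if i < n * same_frac:
            pairs.append((base, base + noise * rng.normal(size=DIM)))
            labels.append(1)
        else:
            pairs.append((base, rng.normal(size=DIM)))
            labels.append(0)
    return pairs, labels

# Comparing error rates across (hypothetical) subgroups is the bias measure.
for group in ["subgroup_A", "subgroup_B"]:
    pairs, labels = make_pairs(200)
    print(group, subgroup_error_rate(pairs, labels))
```

A gap in `subgroup_error_rate` between subgroups, on pairs matched on all non-protected attributes, would indicate the kind of causal accuracy difference the paper's experimental design is built to detect.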

