DebFace: De-biasing Face Recognition

by Sixue Gong et al.
Michigan State University

We address the problem of bias in automated face recognition algorithms, where errors are consistently lower on certain cohorts belonging to specific demographic groups. We present a novel de-biasing adversarial network that learns to extract disentangled feature representations for both unbiased face recognition and demographics estimation. The proposed network consists of one identity classifier and three demographic classifiers (for gender, age, and race) that are trained to distinguish identity and demographic attributes, respectively. Adversarial learning is adopted to minimize correlation among feature factors so as to abate bias influence from other factors. We also design a new scheme to combine demographics with identity features to strengthen robustness of face representation in different demographic groups. The experimental results show that our approach is able to reduce bias in face recognition as well as demographics estimation while achieving state-of-the-art performance.






1 Introduction

Automated face recognition has achieved remarkable success with the rapid development of deep learning algorithms. Despite the improvement in accuracy, one question remains significant: does a face recognition system perform equally well on different demographic groups? In fact, it has been observed that many face recognition systems exhibit lower performance on certain demographic groups than others [21, 27]. Such face recognition systems are said to be biased in terms of demographics.

At a time when face recognition systems are being deployed in the real world for societal benefit, this type of bias¹ is not acceptable. Why does the bias problem exist in face recognition systems? First of all, state-of-the-art (SOTA) face recognition methods are based on deep learning, which requires a large collection of face images for training. Inevitably, the distribution of the training data has a great impact on the performance of the resultant deep learning models. It is well understood that face datasets exhibit imbalanced demographic distributions, where the number of faces in each cohort is unequal. Previous studies have shown that models trained on imbalanced datasets lead to biased discrimination [4, 46]. Secondly, the goal of deep face recognition is to map the input face image to a target feature vector with high discriminative power. Bias in the mapping function will result in feature vectors with lower discriminative ability for specific demographics. Klare et al. [27] show the errors that are inherent to some demographics by studying non-trainable face recognition algorithms.

¹This is different from the notion of machine learning bias, meaning “any basis for choosing one generalization [hypothesis] over another, other than strict consistency with the observed training instances” [13].

To address the bias issue, data re-sampling methods have been exploited to balance the data distribution by under-sampling the majority [14] or over-sampling the minority classes [7, 36]. Despite its simplicity, under-sampling may remove valuable information, and over-sampling may introduce noisy samples. Another common option for imbalanced training is cost-sensitive learning, which assigns class weights based on (i) class frequency [22] or (ii) the effective number of samples [5, 10]. To eschew the overfitting of Deep Neural Networks (DNNs) to minority classes, hinge loss is often used to train classifiers with increased margins between classification decision boundaries [19, 25]. The aforementioned methods have also been adopted for face recognition and attribute prediction on imbalanced datasets [23, 53]. However, such face recognition studies only concern bias in terms of identity, rather than our focus of demographic bias.
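As a minimal sketch of the cost-sensitive option above (an illustrative helper, not code from the paper), per-class weights can be derived either from inverse class frequency or from the "effective number of samples" scheme of [10]:

```python
import numpy as np

def class_weights(labels, beta=None):
    """Per-class weights for cost-sensitive training.

    With beta=None, weights are inverse class frequency; with 0 < beta < 1
    they follow the effective-number-of-samples scheme [10]:
    w_c proportional to (1 - beta) / (1 - beta**n_c).
    Weights are normalised so that they average to 1 over the classes.
    """
    classes, counts = np.unique(labels, return_counts=True)
    if beta is None:
        w = 1.0 / counts
    else:
        w = (1.0 - beta) / (1.0 - np.power(beta, counts))
    w = w / w.sum() * len(classes)
    return dict(zip(classes.tolist(), w.tolist()))
```

These weights would then scale the per-sample loss terms, so minority-class errors cost more during training.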

In this paper, we propose a framework to address the influence of demographic bias on face recognition performance. In typical deep learning based face recognition frameworks, face feature encoders are trained on ample amounts of face data to generate a feature representation for each image. The large capacity of DNN enables the face representations to embed demographic details, including gender, race, and age [2, 15]. Thus, the biased demographic information is transmitted from the training dataset to the output representations. To tackle this issue, we assume that if face representation does not carry discriminative information of demographic attributes, it would be unbiased in terms of demographics.

Given this assumption, one common way to remove demographic information from face representations is to perform feature disentanglement via adversarial learning (Fig. 1(b)). That is, a classifier of demographic attributes can be used to encourage the identity representation not to carry demographic information. However, one issue with this common approach is that the demographic classifier itself could be biased (e.g., the race classifier could be biased on gender), and hence it will act differently while disentangling faces of different cohorts. This is clearly undesirable, as it leads to a demographically biased identity representation.

To resolve this chicken-and-egg problem, we propose to jointly learn unbiased representations for both the identity and demographic attributes. Specifically, starting from a multi-task learning framework that learns disentangled feature representations of gender, age, race, and identity, respectively, we require the classifier of each task to act as adversarial supervision for the other tasks (e.g., the dashed arrows in Fig. 1(c)). These four classifiers help each other achieve better feature disentanglement, resulting in unbiased feature representations for both the identity and demographic attributes. As shown in Fig. 2, our proposed framework is novel and in sharp contrast to prior works in either multi-task learning or adversarial learning.

(a) Multi-task learning
(b) Adversarial learning
(c) DebFace
Figure 2: Methods to learn different tasks simultaneously. Solid lines denote the typical feature flow in a CNN, while dashed lines denote adversarial losses.

Moreover, since the features are disentangled into demographic and identity components, our face representations also contribute to privacy-preserving applications. It is worth noting that such identity representations contain little demographic information, which could undermine recognition performance, since demographic features are part of identity-related facial appearance. To retain performance on demographically biased face datasets, we propose another network that combines the demographic features with the demographic-free identity features to generate a new identity representation for face recognition.

The key contributions and findings of the paper are:

A thorough analysis of deep learning based face recognition performance on three different demographics: (i) gender, (ii) age, and (iii) race.

A de-biasing face recognition framework, called DebFace, that generates disentangled representations for both identity and demographics recognition while jointly removing discriminative information from other counterparts.

The identity representation obtained from the de-biasing network (DebFace-ID) shows lower bias on different demographic cohorts and also achieves SOTA face verification results on cross-age and cross-race face recognition.

The demographic estimations through DebFace are less biased across different demographic cohorts.

A scheme that combines DebFace-ID with demographic features to obtain robust representations for face recognition on biased datasets.

2 Related Work

Face Recognition on Imbalanced Training Data

Previous efforts on face recognition aim to tackle the class imbalance problem in training data. For example, in the pre-DNN era, Zhang et al. [59] propose a cost-sensitive learning framework to reduce the misclassification rate of face identification. To correct the skew of the separating hyperplanes of SVMs on imbalanced data, Liu et al. [31] propose Margin-Based Adaptive Fuzzy SVM, which obtains a lower generalization error bound. In the DNN era, face recognition models are trained on large-scale face datasets with highly imbalanced class distributions. Range Loss [58] learns a robust face representation that makes the most use of every training sample. To mitigate the impact of insufficient class samples, center-based feature transfer learning [56] and large margin feature augmentation [53] are proposed to augment features of minority identities and equalize the class distribution. Huang et al. [23] propose cluster-based large margin local embedding, which reduces local data imbalance. Despite their effectiveness, these studies ignore the influence of demographic imbalance in the face dataset, which may lead to demographic bias. For instance, both [21] and [27] show that face recognition algorithms consistently perform worse on certain demographic cohorts. To uncover deep learning bias, Alexander et al. [3] develop an algorithm to mitigate the hidden biases within training data. To our knowledge, no studies have tackled the challenge of de-biasing DNN-based face recognition algorithms.

Adversarial Learning and Disentangled Representation

Adversarial learning [41] has been well explored in many computer vision applications. For example, Generative Adversarial Networks (GANs) [16] employ adversarial learning to train a generator by competing with a discriminator that distinguishes real images from synthetic ones. Adversarial learning has also been applied to domain adaptation problems [48, 49, 33, 45]. A problem of current interest is learning interpretable representations with semantic meaning. Many studies learn factors of variation in the data by supervised learning [29, 30] or by semi-supervised/unsupervised learning [26, 37, 32], referred to as disentangled representation learning. For supervised disentangled feature learning, adversarial networks are utilized to extract features that only contain discriminative information for a target task. For face recognition, Liu et al. [30] propose a disentangled representation by training an adversarial autoencoder to extract features that capture identity discrimination and its complementary knowledge. In contrast, our proposed DebFace differs from prior works in that each branch of the multi-task network acts both as a generator and as a discriminator for the other branches (Fig. 2).


3 Methodology

3.1 Problem Definition

Figure 3: Overview of the proposed de-biasing face network. The dashed arrows represent adversarial training.

The concept of unbiased face recognition is that, given a face recognition system, equal performance can be achieved across different categories of face images. Unlike research on pose-invariant face recognition, which aims for equal performance on all poses, we believe it is inappropriate to define variations like pose, illumination, or resolution as the categories. These are instantaneous, image-related variations with intrinsic bias; e.g., large-pose or low-resolution faces are inherently harder to recognize.

Rather, we define subject-related properties such as demographic attributes as the categories. A face recognition system is biased if it performs worse on certain demographic cohorts. For practical applications, it is important to consider which demographic biases may exist, and whether they are intrinsic biases across demographic cohorts or algorithmic biases derived from the algorithm itself. This motivates us to analyze the demographic influence on face recognition performance and strive to reduce algorithmic bias in face recognition systems. We aim to learn a face representation that carries equal discriminative information across demographic cohorts. One may achieve this by training on a dataset containing uniform samples over the cohort space. However, the demographic distribution of a dataset is often imbalanced, under-representing demographic minorities while over-representing majorities. Naively re-sampling the training data may still induce bias, since the diversity of latent variables differs across cohorts and the instances cannot be treated fairly during training. To mitigate demographic bias, we propose a face de-biasing framework that jointly reduces mutual bias over all demographics and identities while disentangling face representations into gender, age, race, and demographic-free identity.

3.2 Algorithm Design

The proposed network takes advantage of the relationship between demographics and face identities. On the one hand, demographic characteristics are highly correlated with face features. Some demographic attributes, e.g., gender and race, are among the factors that determine facial appearance and can provide identification-related information. On the other hand, demographic attributes are heterogeneous in terms of data type and semantics [18]. Attributes like race are fixed, while age or gender may change for an individual over time. Meanwhile, the three demographic attributes are semantically independent: a male person, for example, is not necessarily of a certain age or race. Accordingly, we present a framework that jointly generates demographic features and identity features from a single face image by considering both the aforementioned attribute correlation and attribute heterogeneity in a DNN.

While our goal is to diminish demographic bias from face representation, we observe that demographic estimations are biased as well (see Fig. 8). How can we remove the bias of face recognition when demographic estimations themselves are biased? To increase fairness of all demographic classifiers and decrease bias of both face recognition and demographic estimations, we propose a de-biasing network, DebFace, that disentangles the representation into gender, age, race, and identity (DebFace-ID), respectively. Using adversarial learning, the proposed method is capable of jointly learning multiple discriminative representations while ensuring that each classifier cannot distinguish among classes through non-corresponding representations.

Though less biased, DebFace-ID loses demographic cues that are useful for identification. In particular, race and gender are two critical components of face patterns. Hence, we desire to incorporate race and gender with DebFace-ID to obtain a more integrated face representation. We employ a light-weight fully-connected network that is trained to aggregate the representations into a face representation with the same dimensionality as DebFace-ID.

3.3 Network Architecture

Figure 3 gives an overview of the proposed de-biasing face recognition network. It consists of four components: the shared image-to-feature encoder, the four attribute classifiers (for gender, age, race, and identity), the distribution classifier, and the feature aggregation network.

We assume access to labeled training samples. Our approach takes an image as the input of the encoder, which projects it to a feature representation. This representation is then decoupled into four feature vectors of equal dimensionality: gender, age, race, and DebFace-ID. Next, each attribute classifier operates on the corresponding feature vector to correctly classify the target attribute by optimizing the parameters of both the encoder and the respective classifier.
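The decoupling step can be sketched as follows (an illustrative helper under the assumption that the shared embedding is simply partitioned into four equal-sized blocks, mirroring Fig. 3; the names are ours, not the paper's):

```python
import numpy as np

def split_representation(embedding, k):
    """Split the shared encoder output into four k-dim factors
    (gender, age, race, identity). Assumes the embedding
    dimensionality is exactly 4*k."""
    assert embedding.shape[-1] == 4 * k, "embedding must be 4*k wide"
    f_g, f_a, f_r, f_id = np.split(embedding, 4, axis=-1)
    return {"gender": f_g, "age": f_a, "race": f_r, "id": f_id}
```

Each block is then routed to its own attribute classifier, while the remaining blocks serve as adversarial inputs to the other classifiers.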

For a demographic attribute with a given number of categories, the learning objective is the standard cross-entropy loss. For the identity classification, we adopt AM-Softmax [50] as the objective function, with a feature-scale and an angular-margin hyper-parameter.
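A minimal numpy sketch of the AM-Softmax objective [50] is given below; the scale `s` and margin `m` defaults are illustrative, not the paper's exact values, and `weights` stands in for the identity classifier's weight matrix:

```python
import numpy as np

def am_softmax_loss(features, weights, labels, s=30.0, m=0.35):
    """AM-Softmax on L2-normalised features and class weights:
    the target-class logit is s*(cos(theta_y) - m), all others s*cos(theta).
    Returns the mean cross-entropy over the batch."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = f @ w                                  # (batch, n_classes)
    idx = np.arange(len(labels))
    logits = s * cos
    logits[idx, labels] = s * (cos[idx, labels] - m)  # subtract margin on target
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[idx, labels].mean()
```

The additive margin shrinks the target-class logit, forcing a larger angular separation between identities than plain softmax would.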

To de-bias all of the feature representations, an adversarial loss is applied to the above four classifiers such that each of them cannot predict correct labels when operating on irrelevant feature vectors. Specifically, given a classifier, the remaining three attribute feature vectors are imposed on it and attempt to mislead the classifier by optimizing only the representation parameters of the encoder. To further improve the disentanglement, we also reduce the mutual information among the attribute features by introducing a distribution classifier. It is trained, via a binary cross-entropy loss, to identify whether an input representation is sampled from the joint distribution or from the product of the marginal distributions. Similar to the adversarial loss, a factorization objective function is utilized to restrain the distribution classifier from distinguishing the real distribution, and thus minimizes the mutual information among the four attribute representations. Both the adversarial loss and the factorization loss are described in more detail in Sec. 3.4.

Altogether, the proposed de-biasing face network minimizes a joint loss function that combines the four classification losses with the adversarial and factorization losses, where two hyper-parameters determine how completely the representation is decomposed and decorrelated in each training iteration.
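The equation itself did not survive extraction. Based on the components defined in this section, the joint objective plausibly takes the form below, where the two weighting hyper-parameters are written as λ and ν (symbol names assumed, not taken from the paper):

```latex
\min_{E_{img},\;C_{g},C_{a},C_{r},C_{id},C_{distr}}\;
\mathcal{L} \;=\; \sum_{t \in \{g,\,a,\,r,\,id\}} \mathcal{L}^{t}_{cls}
\;+\; \lambda\, \mathcal{L}_{adv} \;+\; \nu\, \mathcal{L}_{fact}
```

Here the classification terms train each branch on its own attribute, while the adversarial and factorization terms enforce disentanglement.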

Removing demographic information weakens the discriminative demographic features in DebFace-ID. Fortunately, our de-biasing network preserves all pertinent demographic features in a disentangled way. We therefore train another multilayer perceptron (MLP) to aggregate DebFace-ID and the demographic embeddings into a unified face representation, DemoID. Since age generally does not pertain to a person’s identity, we only consider gender and race as the identity-informative attributes. The aggregated embedding is supervised by an identity-based triplet loss over the hard triplets in a mini-batch, each triplet consisting of an anchor, a positive, and a negative DemoID representation, separated by a margin.
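A standard batch triplet loss of the kind described above can be sketched as follows (the margin value is illustrative; hard-triplet mining is omitted):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Identity triplet loss over a batch of embeddings:
    mean(max(0, ||a-p||^2 - ||a-n||^2 + margin))."""
    d_ap = ((anchor - positive) ** 2).sum(axis=1)
    d_an = ((anchor - negative) ** 2).sum(axis=1)
    return np.maximum(0.0, d_ap - d_an + margin).mean()
```

In DebFace, the anchor/positive/negative entries would be DemoID embeddings produced by the aggregation MLP.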

3.4 Adversarial Training and Disentanglement

As discussed in Sec. 3.3, the adversarial loss aims to remove task-independent information semantically, while the factorization loss strives to reduce interfering information statistically. We employ both losses to disentangle the representation extracted by the shared encoder.

We introduce the adversarial loss as a means to learn a representation that is invariant with respect to certain attributes, which mitigates bias related to those attributes. Such a representation is invariant if a classifier trained on it cannot correctly predict the categories of the attribute from that representation. We take one of the attributes, e.g., gender, as an example to illustrate the adversarial objective. First, for a demographic representation, we learn a gender classifier by optimizing the classification loss. Second, for the same gender classifier, we intend to maximize the entropy of the predicted distribution. It is well known that a uniform distribution has the highest entropy and presents the most randomness. Hence, we train the classifier to predict a probability distribution as close as possible to a uniform distribution over the category space by minimizing the cross-entropy against a target that is no longer a one-hot vector, but a uniform vector whose entries are all equal (two categories for gender: male and female). The above loss function strives for gender-invariance by finding a representation that makes the gender classifier perform poorly. To this end, we minimize this adversarial loss by updating only the parameters of the encoder.

We further decorrelate the representations by reducing the mutual information across attributes. By definition, mutual information is the relative entropy (KL divergence) between the joint distribution and the product of the marginal distributions. To decrease correlation, we add a distribution classifier that is trained to perform a binary classification on samples from both the joint distribution and the product distribution. Similar to adversarial learning, we factorize the representations by tricking this classifier with the same samples so that its predictions are close to random guesses. In each mini-batch, we treat the concatenated attribute features as samples of the joint distribution. We then randomly shuffle the feature vectors of each attribute within the batch and re-concatenate them, which approximates samples of the product distribution. During factorization, we update only the encoder to learn decomposed representations with minimal mutual information.
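The batch-shuffling construction of the two sample sets can be sketched as follows (illustrative helper; the distribution classifier itself is omitted):

```python
import numpy as np

def product_distribution_samples(f_g, f_a, f_r, f_id, rng):
    """Build the two inputs of the distribution classifier.

    joint:   the unshuffled concatenation, approximating samples of the
             joint distribution of the four attribute features.
    product: each attribute block independently permuted across the batch
             before re-concatenation, approximating samples of the product
             of the marginal distributions.
    """
    joint = np.concatenate([f_g, f_a, f_r, f_id], axis=1)
    parts = [blk[rng.permutation(len(blk))] for blk in (f_g, f_a, f_r, f_id)]
    product = np.concatenate(parts, axis=1)
    return joint, product
```

If the four factors were truly independent, the two sets would be indistinguishable, which is exactly the state the factorization loss drives the encoder toward.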

4 Experiments

4.1 Datasets and Pre-processing

Datasets: We utilize multiple face datasets in this work for learning the demographic estimation models, the baseline face recognition model, and the de-biasing face model, as well as for evaluating these models. Specifically, CACD [8], IMDB [40], UTKFace [60], AgeDB [35], AFAD [38], AAF [9], FG-NET, RFW [52], IMFDB-CVIT [42], Asian-DeepGlint [1], and PCSO [11] are the datasets for training and testing the demographic estimation models; the datasets for learning and evaluating the face verification models are MS-Celeb-1M [17], LFW [24], IJB-A [28], and IJB-C [34].

Pre-Processing: All face images are detected by MTCNN [57]. Each face is cropped and resized using a similarity transformation based on the five detected landmarks.

4.2 Implementation Details

We train the proposed de-biasing network on a cleaned version of MS-Celeb-1M [12], using the ArcFace architecture [12] for the encoder. Since MS-Celeb-1M has no demographic labels, we first train three demographic estimation models for gender, age, and race, respectively. For age estimation, the model is trained on the combination of the CACD, IMDB, UTKFace, AgeDB, AFAD, and AAF datasets. The gender estimation model is trained on the same datasets except CACD, which contains no gender labels. We combine AFAD, RFW, IMFDB-CVIT, and PCSO for race estimation training. All the demographic models use ResNet [20] backbones.

We predict the demographic labels of MS-Celeb-1M with the well-trained demographic models. Our DebFace is then trained on the re-labeled MS-Celeb-1M using SGD with momentum and weight decay, following a step-decay learning rate schedule. The embedding layer of the encoder is split into four equal-sized attribute representations (gender, age, race, ID). We keep the hyper-parameter settings of AM-Softmax as in [12]. The feature aggregation network comprises two linear residual units with P-ReLU and BatchNorm in between; it is trained on MS-Celeb-1M by SGD with an identity-based triplet loss. The disentangled features of gender, race, and DebFace-ID are concatenated into a single vector, which is the input of the aggregation network. The network is then trained to output a feature representation, of the same dimensionality as DebFace-ID, for face recognition on biased datasets.

4.3 De-biasing Face Verification

Baseline: We compare DebFace with a regular face representation model that has the same architecture as the shared feature encoder of DebFace. Referred to as BaseFace, this baseline model is also trained on MS-Celeb-1M.

To show the efficacy of DebFace at mitigating bias in face recognition, we evaluate the verification performance of both DebFace and BaseFace on faces from each demographic cohort separately. There are 48 total cohorts given the combination of demographic attributes: gender (male, female), race (Black, White, East Asian, Indian), and age group (0-12, 13-18, 19-34, 35-44, 45-54, 55-100). We combine IMDB, CACD, AgeDB, and CVIT as the testing set; overlapping identities among these datasets are removed. Pre-defining a False Accept Rate (FAR) and comparing the corresponding True Accept Rate (TAR) may be biased due to the limited number of images in minority classes. Besides, the thresholds derived from a FAR are susceptible to errors in the identity labels, especially for minorities. Therefore, we report the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC), which covers FAR from zero to one, for each demographic group. We define the degree of bias, termed biasness, as the standard deviation of performance across cohorts.
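The biasness metric as defined here is a one-liner; a small sketch with illustrative names:

```python
import numpy as np

def biasness(scores_by_cohort):
    """Degree of bias: the standard deviation of a performance metric
    (e.g., verification AUC) across demographic cohorts."""
    return float(np.std(list(scores_by_cohort.values())))
```

A perfectly unbiased system, with identical AUC on every cohort, would score a biasness of zero.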

(a) BaseFace
(b) DebFace
Figure 4: Face verification AUC in each demographic cohort. The cohorts are chosen based on the three attributes, i.e., gender, age, and race. To fit the results into a 2-D plot, we show the performance for males and females separately. Due to the limited number of face images in some cohorts, their results are shown as gray cells. DebFace exhibits lower biasness than BaseFace.

Figure 4 shows the face verification results of BaseFace and DebFace on each cohort. That is, for a particular face representation (e.g., DebFace), we report its AUC on each cohort and put the number in the corresponding cell. For example, on the female heatmap, the first cell represents the performance of BaseFace on faces of White females in the first age group. From these heatmaps, we observe that both DebFace and BaseFace exhibit bias in face verification, where the performance on some cohorts is significantly worse than on others, especially the cohorts of Black children and older people. Compared to BaseFace, DebFace exhibits less bias, with smaller differences in AUC across cohorts and smoother transitions in the heatmap. Note that the overall performance of DebFace declines compared to BaseFace. This is because part of the identity-related information, like gender and race, is disentangled from identity, so the discriminativeness of DebFace-ID deteriorates.

(a) Gender
(b) Age
(c) Race
Figure 5: The overall face verification AUC on gender, age, and race cohorts, respectively, along with the biasness of BaseFace and DebFace on each attribute.

Figure 5 shows the face verification performance on cohorts based on the three demographic categories. Both DebFace and BaseFace present similar relative accuracies across cohorts. For example, both algorithms perform worse on the children cohort than on adults, and the performance on the Indian cohort is significantly higher than on the other races. DebFace decreases the demographic bias by gaining discriminative features for minorities, in spite of a reduction in the performance on majorities.

(a) Gender
(b) Race
Figure 6: Face verification AUC in each demographic cohort, comparing the finetuned BaseFace models and DebFace, along with the biasness of each on gender and race.
(a) Age 0-12
(b) Age 13-18
(c) Age 19-34
(d) Age 35-44
(e) Age 45-54
(f) Age 55-100
(g) Gender Female
(h) Gender Male
(i) Race White
(j) Race Black
(k) Race East Asian
(l) Race Indian
Figure 7: Feature distribution of the representation output by BaseFace and DebFace.

To further demonstrate the intrinsic bias in different cohorts, we also finetune BaseFace using face images that belong only to a specific cohort. Since age is not informative in terms of identity, we only finetune BaseFace on the six cohorts of gender and race separately. Figure 6 shows the performance of the finetuned models versus DebFace. Compared to BaseFace, finetuning increases the AUC on most of the cohorts, except female. However, bias remains even after finetuning on each cohort. Our DebFace does no better than the finetuned models at de-biasing the race influence. For the gender groups, on the other hand, the bias between male and female increases with finetuning, suggesting that the de-biasing factors in DebFace are capable of mitigating the gender bias in face verification.

4.4 De-biasing Demographic Estimation

Baseline: We further explore the bias of demographic estimation and compare DebFace with baseline estimation models. We train three demographic estimation models, namely gender estimation (BaseGender), age estimation (BaseAge), and race estimation (BaseRace). For fairness, all three models have the same architecture and training datasets as the shared layers of DebFace. All demographic estimations are cast as classification problems, so classification accuracy is used as the performance metric.

We combine the four datasets mentioned in Sec. 4.3 with Asian-DeepGlint as the global testing set. Note that not all of the datasets include labels of all three demographics. Thus, we again employ the demographic models that were trained to label MS-Celeb-1M. For the dataset without certain demographic labels, we simply use the corresponding model to predict the labels.

(a) BaseGender
(b) BaseRace
(c) BaseAge
(d) DebFace-Gender
(e) DebFace-Race
(f) DebFace-Age
Figure 8: Classification accuracy of the demographic estimations on faces of different cohorts, for the baseline models and DebFace, along with the biasness of each on gender, race, and age.

As shown in Fig. 8, all demographic estimations present significant bias. For gender estimation, both algorithms perform worse on the White and Black cohorts than on the East Asian and Indian cohorts. In addition, the performance on young children is significantly worse than on adults. In general, the race estimation models perform better on the male cohort than on the female cohort. Compared to gender, race estimation shows higher bias across age cohorts: both the baseline method and DebFace perform worse on certain age groups than on others. Similar to race, age estimation also achieves better performance on the male cohort than on the female cohort. Moreover, the White cohort shows a dominant advantage over other races in age estimation. In spite of the existing bias in demographic estimation, the proposed DebFace is still able to diminish the bias derived from the algorithms: compared to Fig. 8(a), 8(b), and 8(c), the cells in Fig. 8(d), 8(e), and 8(f) present more uniform colors.

Method LFW (%) Method IJB-A (%)
DeepFace+ [44] DR-GAN [47]
CosFace [51] Yin et al[55]
L2-Face [39] Cao et al[6]
ArcFace [12] Multicolumn [54]
PFE [43] PFE [43]
BaseFace BaseFace
DebFace DebFace
DemoID DemoID
Table 1: Performance on LFW and IJB-A, with verification accuracy on LFW and TAR @ FAR on IJB-A.

4.5 Face Verification on Public Protocols

We compare the face verification performance of the proposed method with SOTA methods, on three public benchmarks: LFW, IJB-A, and IJB-C. All three datasets exhibit imbalanced data distribution in terms of demographics.

Ablations: We report the performance of three different settings, using 1) BaseFace, the same baseline in Sec. 4.3, 2) the ID representation output by DebFace, and 3) the fused representation DemoID.

As shown in Tables 1 and 2, the ID representation of DebFace is less discriminative than BaseFace or DemoID, since race and gender are essential components of identity-related face features. Thus, performance improves by simply concatenating race and gender features with DebFace-ID. On the other hand, re-introducing race and gender features into the face representation through the aggregation model may inevitably re-introduce demographic bias. From a de-biasing standpoint, it is preferable not to concatenate race and gender with the de-biased ID; however, if we prefer to maintain overall performance across all demographics, we can still aggregate all the relevant information. This is an application-dependent trade-off between accuracy and de-biasing, and our algorithm design offers the flexibility to handle it.
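The fused representation described above can be sketched as a simple concatenation of the de-biased identity embedding with the demographic embeddings, followed by L2 normalization. The embedding sizes and the concatenation-based aggregation below are assumptions for illustration; the paper's learned aggregation model may differ.

```python
import numpy as np

def fuse_demoid(f_id, f_gender, f_race):
    """Concatenate the de-biased identity feature with demographic
    features, then L2-normalize (assumed aggregation; the paper's
    learned aggregation model may differ)."""
    fused = np.concatenate([f_id, f_gender, f_race])
    return fused / np.linalg.norm(fused)

# Hypothetical dimensions: 512-d identity, 128-d gender, 128-d race.
rng = np.random.default_rng(0)
rep = fuse_demoid(rng.standard_normal(512),
                  rng.standard_normal(128),
                  rng.standard_normal(128))
```

Dropping `f_gender` and `f_race` from the concatenation recovers the purely de-biased setting, which is the lever behind the accuracy-versus-de-biasing trade-off.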

Method            TAR @ FAR (%)
                  0.001%   0.01%   0.1%
Yin et al. [55]      -        -    69.3
Cao et al. [6]
Multicolumn [54]
PFE [43]
Table 2: Verification performance on IJB-C.

4.6 Qualitative Analysis of Disentanglement

To demonstrate the feature disentanglement achieved by DebFace, we plot the distribution of the nearest neighbors of the face images in the feature space. For example, Fig. 6(g) illustrates the gender distribution of the nearest neighbors of all the female faces in the dataset. In the feature space of DebFace, the points nearest to the female faces are distributed across both the female and male cohorts. As shown in Fig. 7, the DebFace representation presents a more uniform distribution than BaseFace, indicating that faces from different demographic groups are mixed together and the demographic information is disentangled from the face representation.
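The nearest-neighbor analysis above can be sketched as follows: for a binary attribute such as gender, a disentangled representation should give each query face neighbors that are roughly evenly split between the two cohorts. The function below is an illustrative analysis under that assumption, not the paper's exact protocol; it returns the fraction of each query's k nearest neighbors that share its attribute label.

```python
import numpy as np

def neighbor_label_fraction(features, labels, query_mask, k=5):
    """Fraction of the k nearest neighbors (by cosine similarity) of the
    query faces that share the query's attribute label. A value near 0.5
    for a binary attribute suggests the attribute is disentangled from
    the representation."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = feats[query_mask] @ feats.T          # cosine similarities
    idx_q = np.where(query_mask)[0]
    sims[np.arange(len(idx_q)), idx_q] = -np.inf  # exclude self-matches
    nn = np.argsort(-sims, axis=1)[:, :k]        # top-k neighbors per query
    same = labels[nn] == labels[idx_q][:, None]
    return float(same.mean())

# Synthetic example: random features with random binary labels should
# give a fraction close to the label base rate rather than 0 or 1.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))
y = rng.integers(0, 2, 200)
frac = neighbor_label_fraction(X, y, y == 0, k=5)
```

Applied to real embeddings, a BaseFace-style representation would be expected to score near 1.0 on gender (neighbors share the query's gender), while a disentangled representation would score closer to the cohort base rate.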

5 Conclusion

We present a de-biasing face recognition network (DebFace) to mitigate demographic bias in face recognition. DebFace adversarially learns disentangled representations for gender, race, and age estimation and for face recognition simultaneously. We empirically demonstrate that DebFace reduces bias not only in face recognition but also in demographic estimation. Our future work will explore an aggregation scheme that combines race, gender, and identity without introducing algorithmic and dataset bias.


  • [2] Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, pages 308–318, 2016.
  • [3] Alexander Amini, Ava Soleimany, Wilko Schwarting, Sangeeta Bhatia, and Daniela Rus. Uncovering and mitigating algorithmic bias through learned latent structure. AAAI/ACM Conference on AI, Ethics, and Society, 2019.
  • [4] Tolga Bolukbasi, Kai-Wei Chang, James Y Zou, Venkatesh Saligrama, and Adam T Kalai. Man is to computer programmer as woman is to homemaker? debiasing word embeddings. In Advances in neural information processing systems, pages 4349–4357, 2016.
  • [5] Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, and Tengyu Ma. Learning imbalanced datasets with label-distribution-aware margin loss. arXiv preprint arXiv:1906.07413, 2019.
  • [6] Qiong Cao, Li Shen, Weidi Xie, Omkar M Parkhi, and Andrew Zisserman. Vggface2: A dataset for recognising faces across pose and age. In IEEE International Conference on Automatic Face & Gesture Recognition. IEEE, 2018.
  • [7] Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. Smote: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357, 2002.
  • [8] Bor-Chun Chen, Chu-Song Chen, and Winston H Hsu. Cross-age reference coding for age-invariant face recognition and retrieval. In ECCV, 2014.
  • [9] Jingchun Cheng, Yali Li, Jilong Wang, Le Yu, and Shengjin Wang. Exploiting effective facial patches for robust gender recognition. Tsinghua Science and Technology, 24(3):333–345, 2019.
  • [10] Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie. Class-balanced loss based on effective number of samples. In CVPR, 2019.
  • [11] Debayan Deb, Lacey Best-Rowden, and Anil K Jain. Face recognition performance under aging. In CVPRW, 2017.
  • [12] Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou. Arcface: Additive angular margin loss for deep face recognition. In CVPR, 2019.
  • [13] Thomas G Dietterich and Eun Bae Kong. Machine learning bias, statistical bias, and statistical variance of decision tree algorithms. Technical report, 1995.
  • [14] Chris Drummond, Robert C Holte, et al. C4.5, class imbalance, and cost sensitivity: Why under-sampling beats over-sampling. In Workshop on Learning from Imbalanced Datasets II. Citeseer, 2003.
  • [15] Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. In the 22nd ACM SIGSAC, 2015.
  • [16] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, 2014.
  • [17] Yandong Guo, Lei Zhang, Yuxiao Hu, Xiaodong He, and Jianfeng Gao. Ms-celeb-1m: A dataset and benchmark for large-scale face recognition. In ECCV. Springer, 2016.
  • [18] H. Han, A. K. Jain, S. Shan, and X. Chen. Heterogeneous face attribute estimation: A deep multi-task learning approach. IEEE Trans. Pattern Analysis and Machine Intelligence, PP(99):1–1, 2017.
  • [19] Munawar Hayat, Salman Khan, Waqas Zamir, Jianbing Shen, and Ling Shao. Max-margin class imbalanced learning with gaussian affinity. arXiv preprint arXiv:1901.07711, 2019.
  • [20] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • [21] J Howard, Y Sirotin, and A Vemury. The effect of broad and specific demographic homogeneity on the imposter distributions and false match rates in face recognition algorithm performance. In IEEE BTAS, 2019.
  • [22] Chen Huang, Yining Li, Chen Change Loy, and Xiaoou Tang. Learning deep representation for imbalanced classification. In CVPR, 2016.
  • [23] Chen Huang, Yining Li, Change Loy Chen, and Xiaoou Tang. Deep imbalanced learning for face recognition and attribute prediction. IEEE Trans. Pattern Analysis and Machine Intelligence, 2019.
  • [24] Gary B Huang, Marwan Mattar, Tamara Berg, and Eric Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. 2008.
  • [25] Salman Khan, Munawar Hayat, Syed Waqas Zamir, Jianbing Shen, and Ling Shao. Striking the right balance with uncertainty. In CVPR, 2019.
  • [26] Hyunjik Kim and Andriy Mnih. Disentangling by factorising. arXiv preprint arXiv:1802.05983, 2018.
  • [27] Brendan F Klare, Mark J Burge, Joshua C Klontz, Richard W Vorder Bruegge, and Anil K Jain. Face recognition performance: Role of demographic information. IEEE Trans. Information Forensics and Security, 7(6):1789–1801, 2012.
  • [28] Brendan F Klare, Ben Klein, Emma Taborsky, Austin Blanton, Jordan Cheney, Kristen Allen, Patrick Grother, Alan Mah, and Anil K Jain. Pushing the frontiers of unconstrained face detection and recognition: Iarpa janus benchmark a. In CVPR, 2015.
  • [29] Yang Liu, Zhaowen Wang, Hailin Jin, and Ian Wassell. Multi-task adversarial network for disentangled feature learning. In CVPR, 2018.
  • [30] Yu Liu, Fangyin Wei, Jing Shao, Lu Sheng, Junjie Yan, and Xiaogang Wang. Exploring disentangled feature representation beyond face identification. In CVPR, 2018.
  • [31] Yi-Hung Liu and Yen-Ting Chen. Face recognition using total margin-based adaptive fuzzy support vector machines. IEEE Transactions on Neural Networks, 18(1):178–192, 2007.
  • [32] Francesco Locatello, Stefan Bauer, Mario Lucic, Sylvain Gelly, Bernhard Schölkopf, and Olivier Bachem. Challenging common assumptions in the unsupervised learning of disentangled representations. arXiv preprint arXiv:1811.12359, 2018.
  • [33] Mingsheng Long, Zhangjie Cao, Jianmin Wang, and Michael I Jordan. Conditional adversarial domain adaptation. In NIPS, 2018.
  • [34] Brianna Maze, Jocelyn Adams, James A Duncan, Nathan Kalka, Tim Miller, Charles Otto, Anil K Jain, W Tyler Niggel, Janet Anderson, Jordan Cheney, et al. Iarpa janus benchmark-c: Face dataset and protocol. In 2018 ICB, 2018.
  • [35] Stylianos Moschoglou, Athanasios Papaioannou, Christos Sagonas, Jiankang Deng, Irene Kotsia, and Stefanos Zafeiriou. Agedb: the first manually collected, in-the-wild age database. In CVPRW, 2017.
  • [36] Sankha Subhra Mullick, Shounak Datta, and Swagatam Das. Generative adversarial minority oversampling. arXiv preprint arXiv:1903.09730, 2019.
  • [37] Siddharth Narayanaswamy, T Brooks Paige, Jan-Willem Van de Meent, Alban Desmaison, Noah Goodman, Pushmeet Kohli, Frank Wood, and Philip Torr. Learning disentangled representations with semi-supervised deep generative models. In NIPS, 2017.
  • [38] Zhenxing Niu, Mo Zhou, Le Wang, Xinbo Gao, and Gang Hua. Ordinal regression with multiple output cnn for age estimation. In CVPR, 2016.
  • [39] Rajeev Ranjan, Carlos D Castillo, and Rama Chellappa. L2-constrained softmax loss for discriminative face verification. arXiv preprint arXiv:1703.09507, 2017.
  • [40] Rasmus Rothe, Radu Timofte, and Luc Van Gool. Deep expectation of real and apparent age from a single image without facial landmarks. IJCV, 2018.
  • [41] Jürgen Schmidhuber. Learning factorial codes by predictability minimization. Neural Computation, 4(6):863–879, 1992.
  • [42] Shankar Setty, Moula Husain, Parisa Beham, Jyothi Gudavalli, Menaka Kandasamy, Radhesyam Vaddi, Vidyagouri Hemadri, J C Karure, Raja Raju, Rajan, Vijay Kumar, and C V Jawahar. Indian Movie Face Database: A benchmark for face recognition under wide variations. In NCVPRIPG, 2013.
  • [43] Yichun Shi, Anil K Jain, and Nathan D Kalka. Probabilistic face embeddings. arXiv preprint arXiv:1904.09658, 2019.
  • [44] Yaniv Taigman, Ming Yang, Marc’Aurelio Ranzato, and Lior Wolf. Deepface: Closing the gap to human-level performance in face verification. In CVPR, 2014.
  • [45] Chaofan Tao, Fengmao Lv, Lixin Duan, and Min Wu. Minimax entropy network: Learning category-invariant features for domain adaptation. arXiv preprint arXiv:1904.09601, 2019.
  • [46] Antonio Torralba, Alexei A Efros, et al. Unbiased look at dataset bias. In CVPR, 2011.
  • [47] Luan Tran, Xi Yin, and Xiaoming Liu. Disentangled representation learning gan for pose-invariant face recognition. In CVPR, 2017.
  • [48] Eric Tzeng, Judy Hoffman, Trevor Darrell, and Kate Saenko. Simultaneous deep transfer across domains and tasks. In CVPR, 2015.
  • [49] Eric Tzeng, Judy Hoffman, Kate Saenko, and Trevor Darrell. Adversarial discriminative domain adaptation. In CVPR, 2017.
  • [50] Feng Wang, Jian Cheng, Weiyang Liu, and Haijun Liu. Additive margin softmax for face verification. IEEE Signal Processing Letters, 25(7):926–930, 2018.
  • [51] Hao Wang, Yitong Wang, Zheng Zhou, Xing Ji, Dihong Gong, Jingchao Zhou, Zhifeng Li, and Wei Liu. Cosface: Large margin cosine loss for deep face recognition. In CVPR, 2018.
  • [52] Mei Wang, Weihong Deng, Jiani Hu, Jianteng Peng, Xunqiang Tao, and Yaohai Huang. Racial faces in-the-wild: Reducing racial bias by deep unsupervised domain adaptation. arXiv preprint arXiv:1812.00194, 2018.
  • [53] Pingyu Wang, Fei Su, Zhicheng Zhao, Yandong Guo, Yanyun Zhao, and Bojin Zhuang. Deep class-skewed learning for face recognition. Neurocomputing, 2019.
  • [54] Weidi Xie and Andrew Zisserman. Multicolumn networks for face recognition. arXiv preprint arXiv:1807.09192, 2018.
  • [55] Xi Yin and Xiaoming Liu. Multi-task convolutional neural network for pose-invariant face recognition. IEEE Trans. Image Processing, 27(2):964–975, 2017.
  • [56] Xi Yin, Xiang Yu, Kihyuk Sohn, Xiaoming Liu, and Manmohan Chandraker. Feature transfer learning for face recognition with under-represented data. In CVPR, 2019.
  • [57] Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, and Yu Qiao. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10):1499–1503, 2016.
  • [58] Xiao Zhang, Zhiyuan Fang, Yandong Wen, Zhifeng Li, and Yu Qiao. Range loss for deep face recognition with long-tailed training data. In CVPR, 2017.
  • [59] Yin Zhang and Zhi-Hua Zhou. Cost-sensitive face recognition. IEEE Trans. Pattern Analysis and Machine Intelligence, 32(10):1758–1769, 2009.
  • [60] Zhifei Zhang, Yang Song, and Hairong Qi. Age progression/regression by conditional adversarial autoencoder. In CVPR. IEEE, 2017.