Exploring Racial Bias within Face Recognition via per-subject Adversarially-Enabled Data Augmentation

04/19/2020 ∙ by Seyma Yucer, et al. ∙ 10

Whilst face recognition applications are becoming increasingly prevalent within our daily lives, leading approaches in the field still suffer from performance bias to the detriment of some racial profiles within society. In this study, we propose a novel adversarially derived data augmentation methodology that aims to enable dataset balance at a per-subject level via the use of image-to-image transformation for the transfer of sensitive racial characteristic facial features. Our aim is to automatically construct a synthesised dataset by transforming facial images across varying racial domains, while still preserving identity-related features, such that racially dependent features subsequently become irrelevant within the determination of subject identity. We construct our experiments on three significant face recognition variants: Softmax, CosFace and ArcFace loss over a common convolutional neural network backbone. In a side-by-side comparison, we show the positive impact our proposed technique can have on the recognition performance for (racial) minority groups within an originally imbalanced training dataset by reducing the per-race variance in performance.




1 Introduction

Numerous machine learning applications utilising facial attributes have proliferated in recent years as autonomous decision-making processes have become widely adopted by companies and governments [27]. A growing number of applications based on face analyses for surveillance [4], recruitment [15], and health-care [38] have increasingly become integrated into our daily lives.

However, the generalisation of such research and applications is problematic due to the prevalence of bias occurrences within face recognition [33]. The imbalance in specific demographic groups occurring with varying geographic locale globally, including race, age or gender, poses a challenge of transparent explanations and solutions for facial recognition applications. Hence, to cope with real-world diversity, it is crucial to have a profound understanding of this bias within every aspect [6].

Figure 1: Racial transformation example using [47]. We transfer an African image to the Asian domain to obtain a synthesised image, and then reconstruct the original image from the synthesised image. The Asian-to-African transformation follows the same procedure.

Bias in machine learning has been extensively studied for decades [32, 3]. These studies provide the fundamental understanding of the underlying reasons for face recognition bias, which has also seen a surge of interest in recent years [10, 6]. Studies have addressed this problem from various perspectives such as data pre-processing [44, 1, 34] and adversarial training [24, 11, 42].

Meanwhile, recent advances in Generative Adversarial Networks (GAN) have led to realistic image generation [22] and even class generation [1]. Such advances have promising potential to overcome the bias in face recognition via realistic image generation, as most face recognition datasets have a significantly imbalanced distribution across either classes [17] or demographic groups [19].

In this study, we address the racial bias of face recognition from an adversarial augmentation point of view. As most of the datasets [40, 41, 6] consist of four major racial groups, namely African, Asian, Caucasian and Indian, we seek group-fairness among these races, in terms of facial recognition performance, by utilising a generative adversarial network (GAN) [12].

Previous work [24, 11, 42] has established adversarial techniques to minimise mutual information on identity features, which reveal sensitive attributes about the race, gender and age of the subject. However, such approaches [24, 28] have failed to effectively address the trade-off between suppressing the use of such sensitive attributes and the loss of key identity-related features which pertain to the overall performance of the facial recognition approach. Our solution, instead, uses an adversarial image re-synthesis technique [47] to transform sensitive attributes across a set of synthetic images comprising the full range of races being considered within the facial recognition problem.

By doing so, we preserve the important identity-related features whilst making the racially dependent features of the face less prevalent due to the artificially synthesised distribution of these identity characteristics across the full range of race profiles for any given individual.

Figure 1 illustrates how we transform the identity characteristics, and hence features, of any given individual across multiple racial profiles using a CycleGAN [47]. During cyclic adversarial training, CycleGAN learns a transformation across racial domains together with a reconstruction that reproduces the original image from the transformed image.

To show its robustness, we explore the performance of our approach using balanced and imbalanced training datasets. The main contributions of this paper are as follows:

  • we propose an adversarial image-to-image transformation technique to mitigate racial bias based on the cyclic adversarial training approach of CycleGAN [47].

  • we illustrate both quantitative and qualitative performance of our proposed facial data augmentation techniques over established benchmark datasets within the face recognition domain, establishing a statistical paradigm for the presentation of recognition results on a per-race basis.

The rest of this paper is structured as follows: in Section 2, we review the current solutions for face recognition bias in three different categories. We present a methodology for this study in Section 3 with our experimental setup and results in Section 4 and 5, respectively. An extended discussion on adversarial face recognition bias for both balanced and imbalanced datasets is presented within Section 5 with our final conclusions subsequently presented in Section 6.

2 Related Work

Bias and fairness in machine learning have been studied in the last decade, and significant research [37, 29] draws attention to bias for different fields like face recognition, action recognition or language processing.

As one of the most prominent fields of machine learning, face recognition has been extensively used across different areas [43, 36]. As the popularity of face recognition increases, we face more bias incidents [10]. Moreover, studies [6, 31, 8] point out the bias of current face recognition web services and state-of-the-art algorithms for demographic groups such as age, gender, and race. Although definitions of demographic attributes might be uncertain, it is still important to strive for group-fairness [46].

Studies of bias in face recognition which use contemporary deep learning approaches are categorised into three main groups: pre-processing (data preparation), in-processing (model training) and post-processing (output inference) techniques.

Pre-processing Methods. Previous studies [30, 19] revealed that the public face recognition datasets have more male and lighter skin tone subjects than female and darker skin tone subjects, respectively. This is because the images within these datasets are mostly of celebrities, including sports players, actors and politicians, collected from predominantly white male subjects. In other respects, although the studies of [40, 30] released balanced datasets for four racial groups, they neither provide universal race coverage nor are openly and readily available for access.

To obtain fair datasets, studies [21, 1, 34] propose re-sampling methods by either dropping or augmenting samples in the datasets. Downsampling can be considered as a solution for avoiding bias despite the information loss it introduces. Augmentation techniques [1, 34] for image generation have improved significantly using adversarial learning. However, the limitations, as described in [20], are still a concern for mitigating bias. Feature transformation is another pre-processing approach [45] that improves the feature space of under-represented subjects by moving the distribution of the feature space closer to the regular, supposedly unbiased distribution.

In-processing Methods. In-processing methods are divided into three groups: (i) adversarial approaches [24, 28, 11, 42], (ii) domain adaptation methods [40] and (iii) cost-sensitive learning techniques [23, 2]. Adversarial methods focus on sensitive features on the image; with [24] proposing an adversarial feature learning approach rather than learning all the feature representations from the image. In this way, it minimises mutual information between bias features and characteristic features to decrease bias influence. The experiments of [24] are relatively simplistic compared to face recognition bias. Distinguishing demographic information within an image is a serious trade-off of face recognition as demographic features (age, gender, race), and identity features overlap. Another approach in [28], addresses this problem by highlighting the difficulty of setting a demographic condition in realistic face generation. On the other hand, [11] debiases images by minimising correlation on disentangled features. Another study [42] reduces the dependence on sensitive attributes. Despite achieving state-of-the-art results on the test, there is still ample room for further understanding of bias.

A domain adaptation technique [40] transfers the Caucasian domain to non-Caucasian domains during training but requires at least one source domain to transfer into others. Cost-sensitive solutions [23, 2] have been used for imbalanced learning and machine learning fairness in general. For face recognition, adaptive margin [9] or cluster large margin settings [17] are more frequently considered, since the aim is to have intra-class compactness and inter-class discrepancy for large scale datasets. Distinguishing the group features on the hypersphere helps to avoid overfitting of under-represented groups. Adaptive margins [41] for each race improve the scatter of features of races.

Post-processing Methods. Post-processing studies are based on either detecting the bias or improving the fairness after training the model. For example, [25] proposes a Multiaccuracy-Boost algorithm for any machine learning algorithms to improve fairness. IBM [5] provides an extensive toolkit to detect bias and determine the current model fairness level. For broader explanations, [35, 8] give demographic bias level of current state-of-the-art face recognition algorithms.

Motivated by [47], our approach is based on adversarial image synthesis to mitigate bias. Unlike other adversarial studies [24, 42], we transform race information from one group to another for fair face recognition. We aim to augment sensitive attributes to make them irrelevant for face recognition solutions.

3 Proposed Method

We present our methodology in three parts: we first describe our problem definition in Section 3.1, explain image-to-image transfer method [47] for race transformation to mitigate face recognition bias in Section 3.2 and outline our comparator state-of-the-art face recognition algorithms [39, 9] in Section 3.3.

Figure 2: Overview of our solution in three phases: (a) describes the imbalanced distribution of VGGFace2 [7] and its downsampling to VGGFace2 1200; (b) illustrates the race domain transformation schema for a given image; (c) shows face recognition algorithms with Softmax [26], CosFace [39] and ArcFace [9] loss functions using VGGFace2 1200 Races.

3.1 Problem Definition

In this section, we define our problem by introducing the general terms of machine learning bias. Disparate impact, as indirect discrimination, appears when there is a correlation between sensitive attributes (age, gender, race) and other attributes. It causes inequality on outcomes for different demographic groups, as observed on various machine learning applications, including face recognition web services [6].

Ideally, a machine learning algorithm should require that the conditional probability $P(y \mid x)$ of the output given the input does not depend on any sensitive attributes, which are demographic features in our case. This unawareness can be formalized as $P(y \mid x) = P(y \mid x, s)$, where $x$ is an input, $y$ is the corresponding label and $s$ is a sensitive attribute that does not alter the outcome. However, removing this dependency is highly challenging for face recognition due to the high mutual information between facial features and sensitive attributes, like race.

For a given face image dataset, $X = \{x_1, \dots, x_N\}$ provides $N$ face images. A feature embedding vector of an image, $f(x_i)$, where $x_i \in X$, is commonly statistically dependent on sensitive attributes, which causes indirect discrimination for particular demographic groups that form potentially overlapping subsets of $X$. Although the common approach for face recognition bias is to minimise this mutual information to remove the dependency on sensitive features, it is still an extremely difficult task to do so using face features without sacrificing any prior information for face recognition, as shown in [24, 28].

Hence, we approach the problem from a completely different perspective by transferring sensitive attributes from one domain to another whilst simultaneously preserving prior information for recognition. On the other hand, we are aware that some features are more prevalent in some demographic groups than others. The sensitive information, in this case, may improve the prior information for the recognition task. For example, lighter skin allows the model to learn more detailed features given the characteristics of modern cameras and common scene illumination conditions. A novel input mechanism which projects different sensitive information for one image to a model makes race modelling irrelevant. As a result, we ask a question: what if we augment and transfer sensitive information rather than removing it? To answer this question, we present a new pre-processing based method that requires augmentation of the sensitive attributes of an image.

Our new inputs consist of three generated images from different domains for each image. Given the race domains $A$, $B$, $C$, $D$ for African, Asian, Caucasian and Indian respectively, we aim to transform an image from one domain into an image in another domain. For instance, we transform a given image $x_A$ in $A$ into images in the other domains, $x_B$, $x_C$ and $x_D$. If we use images belonging to these domains for the transformation, we can define the new generated input as the list $\hat{x} = [x_A, x_B, x_C, x_D]$, where $x_A$ is the original image and $\hat{x}$ is a new input list including the original image.

Transferring sensitive information while keeping prior information of the image is possible via adversarial methods, as they are capable of generating images from the training data distribution. To show that, we propose a solution of sensitive attribute transformation while keeping prior information for face recognition and present a new augmented dataset, $\hat{X}$. In Section 3.2 we present our approach to the image synthesis process used to obtain $\hat{X}$.
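The per-image augmentation described above can be sketched as follows. This is only an illustration under stated assumptions: `augment_subject_image`, the `generators` dictionary and the toy generators are hypothetical names introduced here, images are represented as plain lists of floats, and real generators would be the trained image-to-image models of Section 3.2.

```python
from typing import Callable, Dict, List, Tuple

RACES = ("African", "Asian", "Caucasian", "Indian")

Image = List[float]                      # hypothetical stand-in for an image
Generator = Callable[[Image], Image]     # hypothetical trained race-transfer model


def augment_subject_image(
    image: Image,
    source_race: str,
    generators: Dict[Tuple[str, str], Generator],
) -> List[Tuple[str, Image]]:
    """Return the original image plus one synthesised image per other race domain."""
    augmented = [(source_race, image)]  # the original image is always kept
    for target_race in RACES:
        if target_race == source_race:
            continue
        generator = generators[(source_race, target_race)]
        augmented.append((target_race, generator(image)))
    return augmented


# Toy generators that only shift pixel values; they are NOT real race-transfer models.
toy_generators = {
    (src, dst): (lambda img, k=i: [p + 0.01 * k for p in img])
    for i, (src, dst) in enumerate(
        (s, t) for s in RACES for t in RACES if s != t
    )
}

views = augment_subject_image([0.2, 0.5, 0.7], "Asian", toy_generators)
print([race for race, _ in views])
```

Each input image therefore yields four training views of the same subject, one per race domain, which is what makes the balance per-subject rather than per-dataset.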

3.2 Adversarial Image-to-Image Transfer

Our solution transforms these sensitive attributes using a cyclic adversarial domain transfer approach, CycleGAN [47]. We assume that learning a mapping function between the domains of two different race groups reduces the dependency on sensitive features.

For example, given an African face image $a \in A$ and a Caucasian image $c \in C$, we assume that the two data distributions of these race groups, $a \sim p_{data}(a)$ and $c \sim p_{data}(c)$, are transferable between each other. To map these two distributions between domains $A$ and $C$, we introduce two mapping functions $G: A \rightarrow C$ and $F: C \rightarrow A$, respectively from the African to the Caucasian domain and from the Caucasian to the African domain, using CycleGAN [47]. Within a GAN framework, these two directional transformations need two discriminators, $D_A$ and $D_C$, to distinguish between $a$ and $F(c)$, and between $c$ and $G(a)$, respectively. Moreover, as an additional control on adversarial training, a cycle-consistency loss is introduced to ensure that the mapping function can transfer an individual input $a$ to the desired output $c$.

$$\mathcal{L}_{GAN}(G, D_C, A, C) = \mathbb{E}_{c \sim p_{data}(c)}\left[\log D_C(c)\right] + \mathbb{E}_{a \sim p_{data}(a)}\left[\log\left(1 - D_C(G(a))\right)\right] \tag{1}$$

For the first part of race transformation, an adversarial loss is used as defined in Equation 1, where $A$ and $C$ are the African and Caucasian group domains, respectively. While the generator $G$ synthesises images $G(a)$ from the source domain $A$ to associate them with the target domain $C$, the discriminator $D_C$ distinguishes between the real image $c$ and the synthesised image $G(a)$. The same process is applied with generator $F$ and discriminator $D_A$ to transform domains from $C$ to $A$.

The key premise of CycleGAN [47] is a controlled mechanism of adversarial training which allows us to synthesise more accurate images from the desired images in the domain. To achieve this, a cycle-consistency loss is introduced as defined in Equation 2, where $F(G(a))$ is reconstructed from the synthesised new image $G(a)$. In this case, generators $G$ and $F$ are able to reconstruct the original images. The $\ell_1$ norm in this loss measures the difference between the original image and the reconstructed image as follows:

$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{a \sim p_{data}(a)}\left[\lVert F(G(a)) - a \rVert_1\right] + \mathbb{E}_{c \sim p_{data}(c)}\left[\lVert G(F(c)) - c \rVert_1\right] \tag{2}$$

The overall loss function, as defined in Equation 3, consists of the two adversarial losses together with the cycle-consistency loss, where $\lambda$ is a term to control the relative importance of the cycle-consistency loss.

$$\mathcal{L}(G, F, D_A, D_C) = \mathcal{L}_{GAN}(G, D_C, A, C) + \mathcal{L}_{GAN}(F, D_A, C, A) + \lambda \, \mathcal{L}_{cyc}(G, F) \tag{3}$$

Subsequently, overall adversarial training of this objective function aims to solve the following equation:

$$G^{*}, F^{*} = \arg\min_{G, F} \max_{D_A, D_C} \mathcal{L}(G, F, D_A, D_C) \tag{4}$$

In the intermediate steps $G(a)$ and $F(c)$, the generators encode features of the inputs $a$ and $c$, and then $F$ and $G$ decode them back to obtain the original images again. With reference to this set of transform Equations 1-4, we can transform both domain $A$ into domain $C$ and $C$ into $A$, and similarly for other domain pairings.
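To make the structure of the CycleGAN [47] objective concrete, the following is a minimal sketch of its three terms (two adversarial losses and one cycle-consistency loss) evaluated on toy data. It is not the training code used in this study: images are plain lists of floats, the generators and discriminators are hand-written stand-ins, the function names are introduced here for illustration, and the default weight `lam=10.0` is an assumed placeholder because the value used in the experiments is not stated.

```python
import math
from typing import Callable, List, Sequence

Image = List[float]
Mapping = Callable[[Image], Image]   # a generator, e.g. African -> Caucasian
Critic = Callable[[Image], float]    # a discriminator; must return a value in (0, 1)


def adversarial_loss(d_target: Critic, generator: Mapping,
                     real_target: Sequence[Image],
                     real_source: Sequence[Image]) -> float:
    """Empirical mean of log D(c) plus empirical mean of log(1 - D(G(a)))."""
    real_term = sum(math.log(d_target(c)) for c in real_target) / len(real_target)
    fake_term = sum(
        math.log(1.0 - d_target(generator(a))) for a in real_source
    ) / len(real_source)
    return real_term + fake_term


def l1_distance(x: Image, y: Image) -> float:
    return sum(abs(p - q) for p, q in zip(x, y))


def cycle_consistency_loss(g: Mapping, f: Mapping,
                           batch_a: Sequence[Image],
                           batch_c: Sequence[Image]) -> float:
    """L1 distance between each image and its reconstruction, in both directions."""
    forward = sum(l1_distance(f(g(a)), a) for a in batch_a) / len(batch_a)
    backward = sum(l1_distance(g(f(c)), c) for c in batch_c) / len(batch_c)
    return forward + backward


def full_objective(g: Mapping, f: Mapping, d_a: Critic, d_c: Critic,
                   batch_a: Sequence[Image], batch_c: Sequence[Image],
                   lam: float = 10.0) -> float:
    """Two adversarial terms plus the weighted cycle-consistency term."""
    return (adversarial_loss(d_c, g, batch_c, batch_a)
            + adversarial_loss(d_a, f, batch_a, batch_c)
            + lam * cycle_consistency_loss(g, f, batch_a, batch_c))


# Toy stand-ins: g and f are exact inverses, and the discriminator is undecided.
def g(image: Image) -> Image:
    return [p + 0.5 for p in image]


def f(image: Image) -> Image:
    return [p - 0.5 for p in image]


def undecided(image: Image) -> float:
    return 0.5


batch_a = [[0.25, 0.5], [0.0, 0.25]]
batch_c = [[0.75, 1.0], [0.5, 0.75]]
cycle = cycle_consistency_loss(g, f, batch_a, batch_c)
total = full_objective(g, f, undecided, undecided, batch_a, batch_c)
print(cycle, total)
```

Because the toy generators invert each other exactly, the cycle term vanishes and only the adversarial terms contribute; in real training the generators minimise this objective while the discriminators maximise it.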

3.3 Face Recognition

Recent state-of-the-art face recognition algorithms [26, 39, 9, 7] achieve outstanding results for both face verification and identification tasks on public datasets. However, they are not as reliable for real-world racial diversity as their performance is lower for under-represented racial groups [40].

In Section 3.2, we presented our proposed approach to address racial bias within face recognition using an adversarial image-to-image transformation technique. To assess this proposed approach, we first present current face recognition loss functions namely Softmax, CosFace, ArcFace that underpin leading state-of-the-art face recognition algorithms [26, 39, 9], then we utilise each of these three methods in conjunction with our cyclic adversarial domain transfer approach.

The Softmax [26], CosFace [39] and ArcFace [9] methods are based on loss functions that operate on the outputs of the last fully connected layer of the selected backbone Deep Convolutional Neural Network [13] (DCNN). In essence, after feeding an image forward through a DCNN, we obtain the feature space representation of the image. These loss functions enforce different representations of features to predict if they belong to a given subject. First, Softmax loss is formulated as follows,

$$L_{Softmax} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{W_{y_i}^{T} f_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_j^{T} f_i + b_j}} \tag{5}$$

where $f_i \in \mathbb{R}^{d}$ is the feature representation of the $i$-th image in the dataset belonging to the $y_i$-th subject class. The $N$ samples are labelled with $n$ classes. $W_j$ is the $j$-th column of the weights and $b_j$ is the $j$-th element of the bias term in the last fully-connected layer. Weights and bias term dimensions are $d \times n$ and $n$, respectively.
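As a worked illustration of the Softmax loss, the sketch below evaluates the mean cross-entropy for a handful of feature vectors using only the Python standard library. The function name `softmax_loss`, the list-based layout of the weights (a $d \times n$ nested list) and the toy values are assumptions made for this example, not part of the original implementation.

```python
import math
from typing import List, Sequence


def softmax_loss(features: Sequence[Sequence[float]],
                 labels: Sequence[int],
                 weights: Sequence[Sequence[float]],
                 bias: Sequence[float]) -> float:
    """Mean cross-entropy over N samples.

    weights has shape d x n (column j holds the weight vector of class j),
    bias has length n.
    """
    n_classes = len(bias)
    total = 0.0
    for feature, label in zip(features, labels):
        logits: List[float] = [
            sum(weights[k][j] * feature[k] for k in range(len(feature))) + bias[j]
            for j in range(n_classes)
        ]
        max_logit = max(logits)  # subtract the maximum for numerical stability
        log_norm = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
        total += log_norm - logits[label]  # equals -log softmax(logits)[label]
    return total / len(features)


# Toy example with d = 2 feature dimensions and n = 2 subject classes.
identity_weights = [[1.0, 0.0], [0.0, 1.0]]
zero_weights = [[0.0, 0.0], [0.0, 0.0]]
zero_bias = [0.0, 0.0]

confident = softmax_loss([[2.0, 0.0]], [0], identity_weights, zero_bias)
uninformed = softmax_loss([[2.0, 0.0]], [0], zero_weights, zero_bias)
print(confident, uninformed)
```

With all-zero weights every class is equally likely, so the loss equals $\log n$; a weight vector aligned with the feature lowers the loss for the correct class.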

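Softmax loss [26] is one of the most widely used objective functions to learn optimal feature representations from images. It discriminates deep representations from different classes by maximizing the posterior probability of the ground-truth class. When large-scale datasets have high similarity on intra-class samples and diversity on inter-class samples, Softmax loss entangles features [14]. To address this problem, CosFace [39] observes that both the norm and the angle of the feature representation contribute to the posterior probability, and normalizes the weights and features so that the loss depends only on the angle, such that:

$$L_{CosFace} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{s(\cos\theta_{y_i} - m)}}{e^{s(\cos\theta_{y_i} - m)} + \sum_{j=1, j \neq y_i}^{n} e^{s \cos\theta_j}} \tag{6}$$

where $\theta_j$ is the angle between the weight vector $W_j$ and the $i$-th feature representation $f_i$, so that $\cos\theta_j = W_j^{T} f_i$ once $\lVert W_j \rVert = \lVert f_i \rVert = 1$, and $s$ is a scaling factor, with all other definitions as previously defined. For CosFace loss, the bias term is removed, and the weights and embeddings are normalized using $\ell_2$ normalization. To cope with incorrectly classified samples, a cosine margin $m$ is applied to the classification boundary.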

An alternative loss function, ArcFace [9], differs from CosFace [39] in its distinct margin. ArcFace corresponds more closely to geodesic distance because it applies a constant linear angular margin penalty throughout the interval, whereas CosFace has a nonlinear angular margin. It also normalizes the weights and embeddings and fixes the bias term to zero. In Equation 7, the ArcFace loss function is formulated as follows:

$$L_{ArcFace} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{s \cos(\theta_{y_i} + m)}}{e^{s \cos(\theta_{y_i} + m)} + \sum_{j=1, j \neq y_i}^{n} e^{s \cos\theta_j}} \tag{7}$$

where all definitions are as per Equation 6. Overall, the key differences between Softmax, CosFace [39] and ArcFace [9] lie in their use of deep feature representation, weight vectors and approach to their margin penalty. Within the scope of this study, we only use these methods as experimental vehicles to illustrate our per-subject data augmentation methodology to address face recognition race bias within such state-of-the-art face recognition algorithms.
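The difference between the two margin formulations can be seen by computing both losses for a single sample. The sketch below is illustrative only: `margin_loss` is a name introduced here, the scale `s` and margin `m` values are placeholders rather than the settings used in the experiments, and class weights are given as a list of per-class vectors.

```python
import math
from typing import List, Sequence


def _normalise(vector: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def margin_loss(feature: Sequence[float], label: int,
                class_weights: Sequence[Sequence[float]],
                s: float, m: float, kind: str) -> float:
    """Single-sample CosFace or ArcFace loss with L2-normalised weights and feature."""
    f = _normalise(feature)
    cosines = [sum(a * b for a, b in zip(_normalise(w), f)) for w in class_weights]
    logits = []
    for j, cos_theta in enumerate(cosines):
        if j != label:
            logits.append(s * cos_theta)            # no margin on other classes
        elif kind == "cosface":
            logits.append(s * (cos_theta - m))      # margin subtracted from the cosine
        elif kind == "arcface":
            theta = math.acos(max(-1.0, min(1.0, cos_theta)))
            logits.append(s * math.cos(theta + m))  # margin added to the angle
        else:
            raise ValueError(f"unknown loss kind: {kind}")
    max_logit = max(logits)  # subtract the maximum for numerical stability
    log_norm = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_norm - logits[label]


class_weights = [[1.0, 0.0], [0.0, 1.0]]   # two toy subject classes
feature = [1.0, 0.5]                        # closer to class 0

no_margin = margin_loss(feature, 0, class_weights, s=10.0, m=0.0, kind="cosface")
cosface = margin_loss(feature, 0, class_weights, s=10.0, m=0.35, kind="cosface")
arcface = margin_loss(feature, 0, class_weights, s=10.0, m=0.5, kind="arcface")
print(no_margin, cosface, arcface)
```

Both margins raise the loss of a correctly classified sample relative to the margin-free case, which is what pushes features of the same subject closer together on the hypersphere.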

An overview of our approach is shown in Figure 2. Figure 2 (a) describes the imbalanced real-world dataset distribution for different racial groups. As an initial experimental exercise, we downsample this imbalanced distribution to understand the relationship between bias and data. In Figure 2 (b), we explain the image transformation process for one exemplar Asian subject. We feed the original image to three different CycleGAN models and obtain three different synthesised images. Subsequently, the training dataset has changed, and we then use our newly augmented dataset for face recognition using algorithms with Softmax, CosFace [39] and ArcFace [9] in Figure 2 (c).

4 Experimental Setup

This section provides an overview of our experimental evaluation in terms of the face recognition datasets used, the race classification used for racial annotation and the implementation details of our proposed approach.

4.1 Datasets

To validate our approach, we utilise BUPT-Transferface [40], VGGFace2 [7] and RFW [40].


BUPT-Transferface [40] provides 50K African, Asian and Indian face images and over 460K Caucasian face images. We use the BUPT-Transferface dataset for two different purposes: (i) race transfer, (ii) race classification.

VGGFace2 [7] contains 3.3M+ images for over 9K subjects (8631 subjects for training, 500 for testing). We train the face recognition methods which we introduced in Section 3.3 on VGGFace2.

VGGFace2 1200 is a subsampled version of VGGFace2 which is racially balanced and contains 300 subjects per race. We evaluate our approach on both VGGFace2 1200 and VGGFace2.

Racial Faces in-the-Wild (RFW) [40] is a face verification test set which provides 6K pairs of images for each race. We compare the verification accuracy of our proposed approach on different races using the same protocol as in [18].

4.2 Race Annotations

We obtain racial annotation labels for VGGFace2 [7] dataset using fine-grained classification to solely support our development of a technique to mitigate bias.

The work of [16] proposes attention-guided data augmentation to improve the spatial representation of discriminative image parts using its cropping and dropping mechanism. We adopt this solution for a race classification problem where the discriminative image parts are the facial attributes of eyes, nose, mouth, and forehead. Via this approach [16], we obtain racial annotations of VGGFace2 [7], manually check the least certain subjects according to the majority of image labels for each subject, and additionally exclude some subjects who are not in the four-race set {African, Asian, Caucasian, Indian}. After this semi-automatic process, the subject distribution for training and testing sets is shown in Figure 3, whereby the inherent racial and gender imbalance is clearly illustrated.

Figure 3: VGGFace2 dataset gender and race distribution for train and test.

4.3 Race Transfer

Our proposed image-to-image transformation approach creates a new dataset, $\hat{X}$, by transferring race attributes from one race group to another. To achieve that, we define separate mappings for each ordered pair of the four different race groups. The set of 12 mappings is: {African → Asian, African → Caucasian, African → Indian, Asian → African, Asian → Caucasian, Asian → Indian, Caucasian → African, Caucasian → Asian, Caucasian → Indian, Indian → African, Indian → Asian, Indian → Caucasian}. As our CycleGAN based approach provides two-way transformations between source and target domains, we train six models to find these two directional mappings following the approach outlined in Section 3.2.
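The counts above follow directly from the four race groups: 12 ordered (source, target) pairs, covered by 6 unordered pairs because each CycleGAN model learns both directions. A short enumeration, with variable names chosen here for illustration:

```python
from itertools import combinations, permutations

RACES = ["African", "Asian", "Caucasian", "Indian"]

# Every ordered (source, target) pair is one directional mapping.
directed_mappings = list(permutations(RACES, 2))

# Each CycleGAN model covers both directions of one unordered pair.
cyclegan_models = list(combinations(RACES, 2))

print(len(directed_mappings), len(cyclegan_models))
for first, second in cyclegan_models:
    print(f"{first} <-> {second}")
```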

For training, we generate 25K image pairs using the BUPT-Transferface [40] dataset. All face images are aligned and share a fixed size. To avoid gender domain differences, we only match images of the same gender as pairs. Using these six CycleGAN models, we synthesise new images and denote the extended dataset as VGGFace2 1200 Races [7], which contains the original VGGFace2 1200 images and the synthesised race images. Each image has three different transformed images that belong to the other race domains in addition to the original. As a result, we partially absorb the downsampling effect on VGGFace2 1200. Subsequently, we synthesise images for all non-Caucasian images in the original VGGFace2 and call the new dataset VGGFace2 8631 Races. We do not transform Caucasian images to other racial domains, as they are already dominant in the original dataset.

4.4 Face Recognition

We train Softmax [26], CosFace [39] and ArcFace [9] loss over a common convolutional neural network backbone, ResNet [13], using the proposed augmented datasets: VGGFace2 1200 and VGGFace2 8631. We utilise ResNet100 from [9] to get the final 512-D feature space representation after the last convolutional layer. For ArcFace margin on VGGFace2 we set to .

5 Results

Figure 4: A selection of successful (top) and failure (bottom) examples of the CycleGAN racial domain transformation of the VGGFace2 dataset. Each column contains an original image and synthesised face images of the same subject, where the green (top) and red (bottom) borders indicate the original image and the corresponding race labels are laid out on the y-axis.

To evaluate the performance of the proposed approach, we use LFW face verification protocol [18], which measures whether two images belong to the same subject or not.
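The core of such a verification test can be sketched as a thresholded similarity decision over labelled image pairs. The example below is a simplified illustration, not the evaluation code used here: it assumes cosine similarity between embeddings and a fixed threshold, whereas the full LFW protocol [18] selects the decision threshold through cross-validation over folds. The function names and toy embeddings are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float], bool]  # (embedding, embedding, same subject?)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def verification_accuracy(pairs: List[Pair], threshold: float) -> float:
    """Fraction of pairs whose same/different decision matches the label."""
    correct = 0
    for first, second, is_same in pairs:
        predicted_same = cosine_similarity(first, second) >= threshold
        correct += int(predicted_same == is_same)
    return correct / len(pairs)


# Toy 2-D embeddings; real embeddings would be 512-D network outputs.
toy_pairs: List[Pair] = [
    ([1.0, 0.0], [0.9, 0.1], True),
    ([1.0, 1.0], [1.0, 0.8], True),
    ([1.0, 0.0], [0.6, 0.8], True),    # same subject, but only moderately similar
    ([1.0, 0.0], [0.0, 1.0], False),
    ([1.0, 0.0], [-1.0, 0.2], False),
]

loose = verification_accuracy(toy_pairs, threshold=0.5)
strict = verification_accuracy(toy_pairs, threshold=0.7)
print(loose, strict)
```

Running the same procedure separately on the pairs of each racial group gives the per-race accuracies reported in the tables below.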

We assess synthesised image quality by feeding the images through the race classifier introduced in Section 4.2. We show examples of the correctly classified images and the misclassified images in Figure 4 (top and bottom parts, respectively). Each column of Figure 4 shows an image transformation example where the original image is marked with a green or red border, and the synthesised images are laid out against the corresponding racial domain label on the y-axis. As can be seen in the bottom part of Figure 4, image transformation is prone to fail under poor illumination and pose variations.

| Loss | Training Dataset | LFW | RFW African | RFW Asian | RFW Caucasian | RFW Indian | RFW AVG | RFW STDV |
|---|---|---|---|---|---|---|---|---|
| Softmax | VGGFace2 1200 | 96.13 | 69.10 | 73.70 | 79.25 | 76.78 | 74.71 | 4.37 |
| Softmax | VGGFace2 1200 Races | 96.27 | 70.65 | 75.68 | 80.27 | 78.28 | 76.22 | 4.16 |
| CosFace | VGGFace2 1200 | 98.16 | 82.78 | 82.68 | 87.53 | 85.41 | 84.60 | 2.33 |
| CosFace | VGGFace2 1200 Races | 98.65 | 83.22 | 83.23 | 87.95 | 85.77 | 85.04 | 2.28 |
| ArcFace | VGGFace2 1200 | 98.16 | 80.91 | 81.78 | 86.86 | 83.70 | 83.31 | 2.64 |
| ArcFace | VGGFace2 1200 Races | 98.63 | 81.28 | 82.83 | 85.95 | 84.72 | 83.69 | 2.06 |
Table 1: Verification performance (%) of Softmax, CosFace, and ArcFace with ResNet-101 [13] on LFW [18] and RFW [40] when trained on VGGFace2 1200 and proposed VGGFace2 1200 Races datasets.

For face recognition, we first test our performance on the balanced datasets VGGFace2 1200 and VGGFace2 1200 Races. We compare our results on RFW [40] using three different loss functions, Softmax, CosFace [39] and ArcFace [9], as shown in Table 1. The proposed facial image augmentation approach improves the average RFW performance of all three methods by 0.38-1.51%. As non-Caucasian results are improved, the standard deviation among groups is decreased. We also share LFW results in Table 1 to show the improvement of our solution on the imbalanced dataset. Second, we use the imbalanced dataset with ArcFace as shown in Table 2. While LFW verification performance remains the same, RFW African and Asian performances are improved, and the standard deviation declines from 2.91 to 2.45.
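The AVG and STDV columns summarise the four per-race RFW accuracies of each row. As a check on how they can be derived, the sketch below recomputes them for the two Softmax rows of Table 1, assuming STDV is the sample standard deviation (the `n - 1` form), which is the form that agrees with the reported Table 1 values to within rounding. Variable names are introduced here for illustration.

```python
from statistics import mean, stdev

# Per-race RFW accuracies (African, Asian, Caucasian, Indian) from the Softmax rows of Table 1.
rfw_softmax = {
    "VGGFace2 1200": [69.10, 73.70, 79.25, 76.78],
    "VGGFace2 1200 Races": [70.65, 75.68, 80.27, 78.28],
}

# stdev() is the sample standard deviation; pstdev() would give the population form.
summary = {name: (mean(scores), stdev(scores)) for name, scores in rfw_softmax.items()}

for name, (average, deviation) in summary.items():
    print(f"{name}: AVG={average:.2f} STDV={deviation:.2f}")

improvement = summary["VGGFace2 1200 Races"][0] - summary["VGGFace2 1200"][0]
print(f"Average improvement: {improvement:.2f}")
```

A lower STDV at an equal or higher AVG indicates that accuracy has become more even across racial groups without being traded away overall.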

| Training Dataset | LFW | RFW African | RFW Asian | RFW Caucasian | RFW Indian | RFW Average | RFW STDV |
|---|---|---|---|---|---|---|---|
| VGGFace2 | 99.51 | 89.45 | 87.61 | 94.71 | 91.21 | 90.75 | 2.91 |
| VGGFace2 8631 Races | 99.51 | 90.10 | 87.73 | 93.72 | 90.50 | 90.51 | 2.45 |

Table 2: Verification performance (%) of ArcFace using ResNet-101 [13] trained on VGGFace2 [7] and VGGFace2 8631 Races with synthesised images of non-Caucasian subjects on VGGFace2, tested on LFW [18] and RFW [40].

5.1 Ablation Study

Q: This study provides experiments on both balanced and imbalanced training datasets. Why do you not use only the imbalanced datasets? Does balancing datasets help to decrease bias?
A: Imbalanced data may seem to be the main reason for face recognition bias. However, when we train algorithms on completely equally distributed data, the results still appear to exhibit performance bias. To show this, we downsample VGGFace2 and obtain 1000 subjects with 100 images for each subject. We also keep the race and gender groups balanced. As shown in Table 3, there is still a gap of about eight per cent between African and Caucasian on average. Another study experiments on a large and nearly balanced dataset and again finds a performance difference between Caucasians and non-Caucasians [41]. Subsequently, we focus on a novel per-subject racial data balancing approach to understand its impact on face recognition bias.
Q: How does the training of CycleGAN affect overall accuracy?
A: We assess the quality of our synthesised images by testing them using a race classifier (Section 4.2). We would expect the race classifier to recognise them as the correct transformed racial label. Our overall accuracy is 49% across all transformations, but when we increase this accuracy using more pairs, and longer training, this results in an overall reduction in face recognition performance. The trade-off is complex because after transforming the main racial attributes of the face such as skin colour, eye structure and hair colour, CycleGAN proceeds to translate all facial features including those which implicitly encode unique subject identity. Other notable negatives are variations in pose and illumination on the synthesised images which could alternatively be addressed via [22] in future work.

| Method | African | Asian | Caucasian | Indian | AVG | STDV |
|---|---|---|---|---|---|---|
| Softmax | 67.95 | 73.5 | 77.77 | 75.78 | 73.75 | 4.24 |
| CosFace | 77.15 | 78 | 82.8 | 80.42 | 79.59 | 2.55 |
| ArcFace | 74.75 | 77.63 | 83.18 | 80.97 | 79.13 | 3.71 |

Table 3: RFW dataset verification performance using the LFW protocol [18] for state-of-the-art algorithms trained on per-subject, per-race and per-gender balanced data.
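The gap of about eight per cent between African and Caucasian accuracy quoted in the ablation study can be recovered from Table 3 by averaging the per-method differences. A short sketch of that arithmetic, with names chosen here for illustration:

```python
# African and Caucasian RFW accuracies (%) per method, copied from Table 3.
table3 = {
    "Softmax": {"African": 67.95, "Caucasian": 77.77},
    "CosFace": {"African": 77.15, "Caucasian": 82.80},
    "ArcFace": {"African": 74.75, "Caucasian": 83.18},
}

gaps = {method: row["Caucasian"] - row["African"] for method, row in table3.items()}
average_gap = sum(gaps.values()) / len(gaps)

for method, gap in gaps.items():
    print(f"{method}: {gap:.2f} percentage points")
print(f"Average gap: {average_gap:.2f} percentage points")
```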

6 Conclusion

Although the usage of face recognition applications is increasing every day, state-of-the-art methods still suffer from racial bias in terms of performance. To address this issue, in this study, we explore racial bias in face recognition and present a novel adversarially derived data augmentation methodology. Transferring racial attributes of a human face whilst preserving identity features in the face recognition datasets makes face recognition algorithms more robust and less race-dependent. We demonstrate that our proposed technique improves face recognition accuracy on minority groups by 1% using imbalanced and balanced training datasets. On our manually balanced dataset, we also compare three significant face recognition variants: Softmax [26], CosFace [39] and ArcFace [9] loss functions with a common convolutional neural network backbone ResNet-101 [13]. Although illumination and pose variations challenge the quality of the image transformation, our technique not only improves the overall face recognition accuracy but also suppresses inter-group performance variation.


  • [1] A. Ali-Gombe and E. Elyan (2019) MFC-gan: class-imbalanced dataset classification using multiple fake class generative adversarial network. Neurocomputing. Cited by: §1, §1, §2.
  • [2] B. K. Baloch, S. Kumar, S. Haresh, A. Rehman, and T. Syed (2019) Focused anchors loss: cost-sensitive learning of discriminative features for imbalanced classification. In Asian Conference on Machine Learning, Cited by: §2, §2.
  • [3] S. Barocas, M. Hardt, and A. Narayanan (2017) Fairness in machine learning. Conference on Neural Information Processing Systems. Cited by: §1.
  • [4] S. Bashbaghi, E. Granger, R. Sabourin, and M. Parchami (2019) Deep learning architectures for face recognition in video surveillance. In Deep Learning in Object Detection and Recognition, Cited by: §1.
  • [5] R. Bellamy, K. Dey, et al. (2019) AI fairness 360: an extensible toolkit for detecting and mitigating algorithmic bias. IBM Journal of Research and Development 63 (4/5), pp. 4:1–4:15. Cited by: §2.
  • [6] J. Buolamwini and T. Gebru (2018) Gender shades: intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability and Transparency, Cited by: §1, §1, §1, §2, §3.1.
  • [7] Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman (2018) Vggface2: a dataset for recognising faces across pose and age. In IEEE International Conference on Automatic Face & Gesture Recognition, Cited by: Figure 2, §3.3, §4.1, §4.1, §4.2, §4.2, §4.3, Table 2.
  • [8] J. G. Cavazos, P. J. Phillips, C. D. Castillo, and A. J. O’Toole (2019) Accuracy comparison across face recognition algorithms: where are we on measuring race bias?. arXiv preprint arXiv:1912.07398. Cited by: §2, §2.
  • [9] J. Deng, J. Guo, N. Xue, and S. Zafeiriou (2019) Arcface: additive angular margin loss for deep face recognition. In IEEE International Conference on Computer Vision and Pattern Recognition, Cited by: §2, Figure 2, §3.3, §3.3, §3.3, §3.3, §3.3, §3, §4.4, §5, §6.
  • [10] R. V. Garcia, L. Wandzik, L. Grabner, and J. Krueger (2019) The harms of demographic bias in deep face recognition research. In International Conference on Biometrics, Cited by: §1, §2.
  • [11] S. Gong, X. Liu, and A. K. Jain (2019) DebFace: de-biasing face recognition. arXiv preprint arXiv:1911.08080. Cited by: §1, §1, §2.
  • [12] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, Cited by: §1.
  • [13] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §3.3, §4.4, Table 1, Table 2, §6.
  • [14] L. He, Z. Wang, Y. Li, and S. Wang (2019) Softmax dissection: towards understanding intra- and inter-class objective for embedding learning. arXiv preprint arXiv:1908.01281. Cited by: §3.3.
  • [15] L. Hemamou, G. Felhi, V. Vandenbussche, J. Martin, and C. Clavel (2019) HireNet: a hierarchical attention model for the automatic analysis of asynchronous video job interviews. AAAI Conference on Artificial Intelligence. Cited by: §1.
  • [16] T. Hu and H. Qi (2019) See better before looking closer: weakly supervised data augmentation network for fine-grained visual classification. arXiv preprint arXiv:1901.09891. Cited by: §4.2.
  • [17] C. Huang, Y. Li, C. L. Chen, and X. Tang (2019) Deep imbalanced learning for face recognition and attribute prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence. Cited by: §1, §2.
  • [18] G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller (2008) Labeled faces in the wild: a database for studying face recognition in unconstrained environments. Cited by: §4.1, Table 1, Table 2, Table 3, §5.
  • [19] I. Hupont and C. Fernández (2019) DemogPairs: quantifying the impact of demographic imbalance in deep face recognition. In IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), Cited by: §1, §2.
  • [20] N. Jain, A. Olmo, S. Sengupta, L. Manikonda, and S. Kambhampati (2020) Imperfect imaganation: implications of gans exacerbating biases on facial data augmentation and snapchat selfie lenses. arXiv preprint arXiv:2001.09528. Cited by: §2.
  • [21] F. Kamiran and T. Calders (2010) Classification with no discrimination by preferential sampling. In Machine Learning Conference Belgium and The Netherlands, Cited by: §2.
  • [22] T. Karras, S. Laine, and T. Aila (2019) A style-based generator architecture for generative adversarial networks. In IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §1, §5.1.
  • [23] S. H. Khan, M. Hayat, M. Bennamoun, F. A. Sohel, and R. Togneri (2017) Cost-sensitive learning of deep feature representations from imbalanced data. IEEE Transactions on Neural Networks and Learning Systems 29 (8). Cited by: §2, §2.
  • [24] B. Kim, H. Kim, K. Kim, S. Kim, and J. Kim (2019) Learning not to learn: training deep neural networks with biased data. In IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §1, §1, §2, §2, §3.1.
  • [25] M. P. Kim, A. Ghorbani, and J. Zou (2019) Multiaccuracy: black-box post-processing for fairness in classification. In Conference on AI, Ethics, and Society, Cited by: §2.
  • [26] W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, and L. Song (2017) Sphereface: deep hypersphere embedding for face recognition. In IEEE Conference on Computer Vision and Pattern Recognition, Cited by: Figure 2, §3.3, §3.3, §3.3, §3.3, §4.4, §6.
  • [27] I. Masi, Y. Wu, T. Hassner, and P. Natarajan (2018) Deep face recognition: a survey. In Conference on Graphics, Patterns and Images, Cited by: §1.
  • [28] D. McDuff, S. Ma, Y. Song, and A. Kapoor (2019) Characterizing bias in classifiers using generative models. In Advances in Neural Information Processing Systems, Cited by: §1, §2, §3.1.
  • [29] N. Mehrabi, F. Morstatter, N. Saxena, K. Lerman, and A. Galstyan (2019) A survey on bias and fairness in machine learning. arXiv preprint arXiv:1908.09635. Cited by: §2.
  • [30] M. Merler, N. Ratha, R. S. Feris, and J. R. Smith (2019) Diversity in faces. arXiv preprint arXiv:1901.10436. Cited by: §2.
  • [31] S. Nagpal, M. Singh, R. Singh, M. Vatsa, and N. Ratha (2019) Deep learning for face recognition: pride or prejudiced?. arXiv preprint arXiv:1904.01219. Cited by: §2.
  • [32] D. Pedreshi, S. Ruggieri, and F. Turini (2008) Discrimination-aware data mining. In International Conference on Knowledge Discovery and Data Mining, Cited by: §1.
  • [33] I. D. Raji and J. Buolamwini (2019) Actionable auditing: investigating the impact of publicly naming biased performance results of commercial ai products. In Conference on AI, Ethics, and Society, Cited by: §1.
  • [34] P. Sadhukhan (2019) Learning minority class prior to minority oversampling. In International Joint Conference on Neural Networks, Cited by: §1, §2.
  • [35] N. Srinivas, K. Ricanek, D. Michalski, D. S. Bolme, and M. King (2019) Face recognition algorithm bias: performance differences on images of children and adults. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, Cited by: §2.
  • [36] I. D. Stephen, V. Hiew, V. Coetzee, B. P. Tiddeman, and D. I. Perrett (2017) Facial shape analysis identifies valid cues to aspects of physiological health in caucasian, asian, and african populations. Frontiers in Psychology. Cited by: §2.
  • [37] H. Suresh and J. V. Guttag (2019) A framework for understanding unintended consequences of machine learning. arXiv preprint arXiv:1901.10002. Cited by: §2.
  • [38] M. A. Uddin, J. B. Joolee, and Y. Lee (2020) Depression level prediction using deep spatiotemporal features and multilayer bi-ltsm. IEEE Transactions on Affective Computing. Cited by: §1.
  • [39] H. Wang, Y. Wang, Z. Zhou, X. Ji, D. Gong, J. Zhou, Z. Li, and W. Liu (2018) Cosface: large margin cosine loss for deep face recognition. In IEEE International Conference on Computer Vision and Pattern Recognition, Cited by: Figure 2, §3.3, §3.3, §3.3, §3.3, §3.3, §3.3, §3, §4.4, §5, §6.
  • [40] M. Wang, W. Deng, J. Hu, X. Tao, and Y. Huang (2019) Racial faces in the wild: reducing racial bias by information maximization adaptation network. In IEEE International Conference on Computer Vision, Cited by: §1, §2, §2, §2, §3.3, §4.1, §4.1, §4.1, §4.3, Table 1, Table 2, §5.
  • [41] M. Wang and W. Deng (2019) Mitigate bias in face recognition using skewness-aware reinforcement learning. arXiv preprint arXiv:1911.10692. Cited by: §1, §2, §5.1.
  • [42] X. Wang and H. Huang (2019) Approaching machine learning fairness through adversarial network. arXiv preprint arXiv:1909.03013. Cited by: §1, §1, §2, §2.
  • [43] Y. Wang, T. Bao, C. Ding, and M. Zhu (2017) Face recognition in real-world surveillance videos with deep learning method. In International Conference on Image, Vision and Computing, Cited by: §2.
  • [44] K. Yang, K. Qinami, L. Fei-Fei, J. Deng, and O. Russakovsky (2020) Towards fairer datasets: filtering and balancing the distribution of the people subtree in the imagenet hierarchy. In Conference on Fairness, Accountability, and Transparency, Cited by: §1.
  • [45] X. Yin, X. Yu, K. Sohn, X. Liu, and M. Chandraker (2019) Feature transfer learning for face recognition with under-represented data. In IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §2.
  • [46] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork (2013) Learning fair representations. In International Conference on Machine Learning, Cited by: §2.
  • [47] J. Zhu, T. Park, P. Isola, and A. A. Efros (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In IEEE International Conference on Computer Vision, Cited by: Figure 1, 1st item, §1, §1, §2, §3.2, §3.2, §3.2, §3.