Feature and Label Embedding Spaces Matter in Addressing Image Classifier Bias

10/27/2021
by   William Thong, et al.

This paper strives to address image classifier bias, with a focus on both feature and label embedding spaces. Previous works have shown that spurious correlations with protected attributes, such as age, gender, or skin tone, can cause adverse decisions. To balance potential harms, there is a growing need to identify and mitigate image classifier bias. First, we identify a bias direction in the feature space. We compute class prototypes of each protected attribute value for every class, and reveal an existing subspace that captures the maximum variance of the bias. Second, we mitigate biases by mapping image inputs to label embedding spaces. Each value of the protected attribute has its own projection head, where classes are embedded through a latent vector representation rather than a common one-hot encoding. Once trained, we further reduce the bias effect in the feature space by removing its direction. Evaluation on biased image datasets, for multi-class, multi-label and binary classifications, shows the effectiveness of tackling both feature and label embedding spaces in improving the fairness of the classifier predictions, while preserving classification performance.



1 Introduction

This paper strives to identify and mitigate biases present in image classifiers, with a focus on their feature and label embedding spaces. Adverse decisions from image classifiers can create discrimination against members of certain classes of a protected attribute, such as age, gender, or skin tone. Buolamwini and Gebru [Buolamwini and Gebru(2018)] importantly show that face recognition systems misclassify subgroups with darker skin tones. This also applies to object recognition, where performance is higher for high-income communities [de Vries et al.(2019)de Vries, Misra, Wang, and van der Maaten] mainly located in Western countries [Shankar et al.(2017)Shankar, Halpern, Breck, Atwood, Wilson, and Sculley]. Similarly problematic, current classifiers perpetuate and amplify discrimination already present in society [Caliskan et al.(2017)Caliskan, Bryson, and Narayanan, Garg et al.(2018)Garg, Schiebinger, Jurafsky, and Zou]. For example, Kay et al [Kay et al.(2015)Kay, Matuszek, and Munson] highlight the exaggeration of gender bias in occupations by image search systems. These adverse decisions notably arise because image classifiers are prone to biases present in the dataset [Geirhos et al.(2020)Geirhos, Jacobsen, Michaelis, Zemel, Brendel, Bethge, and Wichmann]. It is therefore essential to identify harmful biases in image representations and assess their effects on the classification predictions, as we do in this paper.

Addressing dataset biases is not enough; classifier biases should also be addressed. Zhao et al [Zhao et al.(2017)Zhao, Wang, Yatskar, Ordonez, and Chang] importantly show that biases can actually be amplified during image classifier training. Even when balancing a dataset for the protected attribute gender, image classifiers can still surprisingly amplify biases when making a prediction [Wang et al.(2019b)Wang, Zhao, Yatskar, Chang, and Ordonez]. This outcome emphasizes the importance of considering protected attributes during training to avoid biased and adverse decisions. A first approach is to perform fairness through blindness, where the objective is to make the feature space blind to the protected attribute [Zhang et al.(2018)Zhang, Lemoine, and Mitchell, Alvi et al.(2018)Alvi, Zisserman, and Nellåker, Hendricks et al.(2018)Hendricks, Burns, Saenko, Darrell, and Rohrbach]. An alternative is to perform fairness through awareness, where the classifier label space is explicitly aware of the protected attribute label [Dwork et al.(2012)Dwork, Hardt, Pitassi, Reingold, and Zemel]. To better understand the effectiveness of these methods, Wang et al [Wang et al.(2020b)Wang, Qinami, Karakozis, Genova, Nair, Hata, and Russakovsky] propose crucial benchmarks in biased image classification. They notably expose the shortcomings of these methods and show that a simple method with separate classifiers is more effective at mitigating biases. Building on this line of work, this paper first identifies a bias direction in the feature space, and second addresses bias mitigation in both label and feature spaces. Another important aspect concerns how to measure the fairness of image classifiers. We borrow from the general fairness literature [Beutel et al.(2017)Beutel, Chen, Zhao, and Chi, Dwork et al.(2012)Dwork, Hardt, Pitassi, Reingold, and Zemel, Hardt et al.(2016)Hardt, Price, and Srebro] to ensure that predictions are similar for all members of a protected attribute, which complements the benchmarks introduced by Wang et al [Wang et al.(2020b)Wang, Qinami, Karakozis, Genova, Nair, Hata, and Russakovsky] on image classification bias.

Contributions.

Our main contribution is to demonstrate the importance of feature and label spaces for addressing image classifier bias. First, we identify a bias direction in the feature space of common classifiers. We aggregate class prototypes to represent every class of each protected attribute value, and show a main direction that explains the maximum variance of the bias. Second, we mitigate biases at both classification and feature levels. We introduce protected classification heads, where each head projects the features to a label embedding space specific to each protected attribute value. This differs from common classification, which usually considers a one-hot encoding for the label space [Wang et al.(2020b)Wang, Qinami, Karakozis, Genova, Nair, Hata, and Russakovsky, Saito et al.(2018)Saito, Watanabe, Ushiku, and Harada, Luo et al.(2019)Luo, Zheng, Guan, Yu, and Yang]. For training, we derive a cosine softmax cross-entropy loss for multi-class, multi-label and binary classifications. Once trained, we apply a bias removal operation in the feature space to further reduce the bias effect. Experiments show the benefits of addressing classifier bias in both feature and label embedding spaces to improve fairness scores, while preserving the classification performance. The source code is available at: https://github.com/twuilliam/bias-classifiers.

2 Related Work

Biases in word embeddings.

Assessing the presence of biases in word embeddings, especially gender bias, has received considerable attention given their wide range of applications within and beyond natural language processing. The seminal work of Bolukbasi et al [Bolukbasi et al.(2016)Bolukbasi, Chang, Zou, Saligrama, and Kalai] reveals that the difference between female and male entities in word2vec [Mikolov et al.(2013)Mikolov, Chen, Corrado, and Dean] contains a gender bias direction. This shows that word2vec implicitly captures gender biases, which in turn reproduces sexist associations with professional activities. Caliskan et al [Caliskan et al.(2017)Caliskan, Bryson, and Narayanan] further reveal that multiple human-like biases are actually present in word embeddings. Even contextualized word embeddings [Peters et al.(2018)Peters, Neumann, Iyyer, Gardner, Clark, Lee, and Zettlemoyer] are affected by a gender bias direction [Zhao et al.(2019)Zhao, Wang, Yatskar, Cotterell, Ordonez, and Chang], which creates harmful risks [Bender et al.(2021)Bender, Gebru, McMillan-Major, and Shmitchell]. To mitigate such gender bias, Bolukbasi et al [Bolukbasi et al.(2016)Bolukbasi, Chang, Zou, Saligrama, and Kalai] propose a post-processing removal operation, while Zhao et al [Zhao et al.(2018)Zhao, Zhou, Li, Wang, and Chang] derive regularizers to control the distance between relevant words during training. It is important to note that biases cannot be removed entirely, as they can still be recovered to some extent [Gonen and Goldberg(2019)]. As such, methods mainly mitigate biases in models rather than producing debiased models. Inspired by the literature on gender bias identification and mitigation in word embeddings, we pursue an analogous reasoning to show that biases are implicitly encoded in image classification models as well.

Biases in image datasets.

As computer vision research relies heavily on datasets, they constitute a main source of biases. Torralba and Efros [Torralba and Efros(2011)] identify that datasets have a strong built-in bias as they only represent a narrow view of the visual world, leading models to rely on spurious correlations and produce detrimental predictions. For fairness and transparency purposes, it becomes necessary to document the dataset creation [Gebru et al.(2018)Gebru, Morgenstern, Vecchione, Vaughan, Wallach, Daumé III, and Crawford, Hutchinson et al.(2021)Hutchinson, Smart, Hanna, Denton, Greer, Kjartansson, Barnes, and Mitchell], as well as to detect the presence of potential biases and harms due to an unfair and unequal label sampling [Shankar et al.(2017)Shankar, Halpern, Breck, Atwood, Wilson, and Sculley, Yang et al.(2020)Yang, Qinami, Fei-Fei, Deng, and Russakovsky, Birhane and Prabhu(2021), Dixon et al.(2018)Dixon, Li, Sorensen, Thain, and Vasserman]. Towards this end, Bellamy et al [Bellamy et al.(2018)Bellamy, Dey, Hind, Hoffman, Houde, Kannan, Lohia, Martino, Mehta, Mojsilovic, Nagar, Natesan Ramamurthy, Richards, Saha, Sattigeri, Singh, Varshney, and Zhang] and Wang et al [Wang et al.(2020a)Wang, Narayanan, and Russakovsky] propose metrics to measure biases, and actionable insights to mitigate them in a dataset. Even though addressing biases when collecting a dataset is highly recommended, models can still produce unfair decisions [Wang et al.(2019b)Wang, Zhao, Yatskar, Chang, and Ordonez]. In this paper, we focus on addressing image classifier bias.

Biases in image classifiers.

Searching for a representative subset of image examples provides visual explanations of biases [Kim et al.(2016)Kim, Koyejo, and Khanna, Stock and Cisse(2018)]. In this paper, we rather identify that such a bias exists in the feature space of image classifiers. To mitigate image classification bias, training with adversarial learning [Goodfellow et al.(2014)Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, and Bengio] makes the classifier blind to the protected attribute. Reducing the gender bias can be achieved by forcing a model to avoid looking at people to produce a prediction [Hendricks et al.(2018)Hendricks, Burns, Saenko, Darrell, and Rohrbach, Wang et al.(2019b)Wang, Zhao, Yatskar, Chang, and Ordonez]. Blindness can also be achieved in the feature space by removing the variation of the protected attribute [Alvi et al.(2018)Alvi, Zisserman, and Nellåker, Zhang et al.(2018)Zhang, Lemoine, and Mitchell]. However, Wang et al [Wang et al.(2020b)Wang, Qinami, Karakozis, Genova, Nair, Hata, and Russakovsky] illustrate that adversarial approaches tend to be detrimental, as they decrease performance by making image classifiers less discriminative. At the same time, non-adversarial approaches tend to amplify biases less, while performing well on image classification. Wang et al [Wang et al.(2020b)Wang, Qinami, Karakozis, Genova, Nair, Hata, and Russakovsky] notably show that encoding the protected attribute into separate heads better mitigates biases. We build on this literature and propose to mitigate biases at both classification and feature levels.

Biases benchmarking.

No consensus exists (yet) on how to mitigate image classifier bias, which makes apples-to-apples comparisons complicated: (a) benchmarks become invalid when datasets are taken down for ethical reasons [Peng et al.(2021)Peng, Mathur, and Narayanan] (e.g., Racial Faces in-the-Wild [Wang et al.(2019a)Wang, Deng, Hu, Tao, and Huang] derives from the problematic MS-Celeb-1M [Guo et al.(2016)Guo, Zhang, Hu, He, and Gao], and Diversity in Faces [Merler et al.(2019)Merler, Ratha, Feris, and Smith] has received complaints); (b) datasets are introduced without benchmarks of debiasing methods (e.g., FairFace [Karkkainen and Joo(2021)] mainly evaluates commercial facial classification systems); (c) related works come with differing evaluation settings (e.g., Wang et al [Wang et al.(2019b)Wang, Zhao, Yatskar, Chang, and Ordonez] train MLP probes to measure model leakage). While addressing algorithmic bias in face verification [Gong et al.(2020)Gong, Liu, and Jain, Singh et al.(2020)Singh, Agarwal, Singh, Nagpal, and Vatsa, Yin et al.(2019)Yin, Yu, Sohn, Liu, and Chandraker] is crucial, we focus on image classification [Wang et al.(2019b)Wang, Zhao, Yatskar, Chang, and Ordonez, Wang et al.(2020b)Wang, Qinami, Karakozis, Genova, Nair, Hata, and Russakovsky, Kim et al.(2019)Kim, Kim, Kim, Kim, and Kim, Hwang et al.(2020)Hwang, Park, Lee, Jeon, Kim, and Byun]. Therefore, we adopt in this paper the benchmarks introduced by Wang et al [Wang et al.(2020b)Wang, Qinami, Karakozis, Genova, Nair, Hata, and Russakovsky] and Kim et al [Kim et al.(2019)Kim, Kim, Kim, Kim, and Kim] in multi-class, multi-label and binary classifications for their comprehensiveness and reproducibility.

3 Identifying a Bias Direction

Problem formulation.

We consider the task of image classification, where every image $x$ is assigned a label $y$. For every image, there also exists a protected attribute value $a$, on which the classifier should not base its decision. In other words, classifiers should not discriminate against specific members of a protected attribute. In this paper, we consider discrete variables for protected attribute values, and limit the problem to binary values with $a \in \{0, 1\}$. For example, we only consider the values female and male to describe the protected attribute gender. It is important to note that this formulation is a simplification of the real world, where protected attributes go beyond binary values and are non-discrete.

Image classifiers are typically composed of a base encoder and a projection head. First, a base encoder $f$ extracts the feature representation of an image $x$. In our case, this corresponds to a convolutional network and results in features $z = f(x) \in \mathbb{R}^{d_z}$. Second, a projection head $g$ maps the features $z$ to a discriminative space where a class is assigned. In our case, this corresponds to a linear projection, or a multilayer perceptron, and results in $e = g(z)$ with $e \in \mathbb{R}^{d_e}$. For example, in a one-hot encoding, $d_e$ equals the number of classes.

During training, we are given access to the protected attribute labels and can incorporate them in model formulations. We denote the triplet $(x_i, y_i, a_i)$ as the $i$-th sample in the training set. During the evaluation, models only have access to the images. In this section, we show that common image classifiers – which do not leverage protected attribute labels during training – still implicitly encode their information in the feature space.

Figure 1: 2D toy visualization of the feature space, where class prototypes represent three categories with a color bias (0 ➝ color, 1 ➝ gray). A bias vector is computed for every class.
Figure 2: Bias direction in the feature space. (a) The PCA of the prototype differences $B$ shows the maximum variance along the bias direction. (b) On a random $B$, the direction disappears and the explained variance is no longer skewed.

Protected class prototypes.

Once a model has been trained, we extract the features from the training set. We then aggregate prototypes for every class $y$, specific to each protected attribute value $a$, coined protected class prototypes. For example in Figure 1, a class has two prototypes in the feature space, one for color images and one for gray images. For any class $y$ with any protected attribute value $a$, we compute the protected class prototype as the average representation in the feature space over the training set:

$$\mu_y^a = \frac{1}{N_y^a} \sum_{i} \mathbb{1}[y_i = y \wedge a_i = a] \, f(x_i), \qquad (1)$$

where $N_y^a$ is the number of training images of class $y$ with protected attribute $a$, and $\mathbb{1}[\cdot]$ is the indicator function. Once all protected class prototypes are computed, we extract a subspace that captures the variance of the bias related to the protected attribute.
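As a concrete illustration, a minimal NumPy sketch of this prototype computation could look as follows; the array names (features, labels, attrs) are illustrative and not taken from the released code.

```python
import numpy as np

def protected_class_prototypes(features, labels, attrs):
    """Average feature vector of every (class, protected attribute value) pair.

    features: (n, d_z) array of encoder outputs z_i = f(x_i) on the training set.
    labels:   (n,) array of class labels y_i.
    attrs:    (n,) array of binary protected attribute values a_i in {0, 1}.
    """
    prototypes = {}
    for y in np.unique(labels):
        for a in (0, 1):
            mask = (labels == y) & (attrs == a)
            prototypes[(y, a)] = features[mask].mean(axis=0)
    return prototypes
```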

Bias direction.

To identify a bias direction, we experiment with a standard convolutional network trained with a softmax cross-entropy loss on CIFAR-10S [Wang et al.(2020b)Wang, Qinami, Karakozis, Genova, Nair, Hata, and Russakovsky]. This dataset provides a simple testbed to measure biases in images, as certain classes are skewed towards gray images, while others are skewed towards color images. Once the model is trained, we aggregate the difference between the class prototypes of the two protected attribute values for every class:

$$\mathbf{b}_y = \mu_y^{a=1} - \mu_y^{a=0}, \qquad (2)$$

and stack all difference vectors into a matrix $B$. Note that for multi-label classification, we consider all binary labels to define $B$. Figure 2(a) shows the principal component analysis (PCA) of $B$. When computing the ratio of explained variance of every principal component (PC), a main direction of variance appears. The first PC is far more important than the others, which yields a high skewness. Figure 2(b) depicts the same analysis on a random $B$, where no main direction appears. Hence, there exists a subspace in the feature space where the bias information is maximized.
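The bias-direction analysis can be sketched with a standard PCA, assuming the prototypes come from a helper like the one above; a strongly skewed explained-variance ratio of the first component reproduces the behaviour reported in Figure 2(a).

```python
import numpy as np
from sklearn.decomposition import PCA

def bias_direction(prototypes, classes):
    """Stack the per-class prototype differences (Eq. 2) and return the first PC."""
    B = np.stack([prototypes[(y, 1)] - prototypes[(y, 0)] for y in classes])
    pca = PCA(n_components=len(classes)).fit(B)
    print("explained variance ratio:", pca.explained_variance_ratio_)
    return pca.components_[0]  # unit-norm direction v_b of maximum bias variance
```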

4 Mitigating Biases

Figure 3: Mitigating biases in classification predictions. (a) For classification, we mitigate biases with protected label embeddings, where each protected attribute value has its own space. (b) In the feature space, we apply a removal operation along the bias direction once the model has been trained, where the direction is computed from the training set.

Figure 3 illustrates our approach to mitigate biases in class predictions at both classification and feature levels. For the classification level, we create two protected label embedding spaces, one for each value of the binary protected attribute. For the feature level, we propose a bias removal operation once the model has been trained. The proposed method works for multi-class, multi-label and binary settings.

Protected label embeddings.

We project the features $z$ into two embedding spaces, one for each protected attribute value. This results in the embedding representations $e^a = g^a(z)$, where classification occurs. During training, each projection head only sees samples from its assigned attribute value, which creates a protected embedding. By only seeing samples of one protected value, class boundaries are better separated [Saito et al.(2018)Saito, Watanabe, Ushiku, and Harada].

We further push these properties by relying on a cosine softmax cross-entropy loss for classification. The embedding $e^a$ constitutes a discriminative representation with semantic information about classes. This differs from related approaches in domain adaptation [Saito et al.(2018)Saito, Watanabe, Ushiku, and Harada, Luo et al.(2019)Luo, Zheng, Guan, Yu, and Yang] or bias mitigation [Wang et al.(2020b)Wang, Qinami, Karakozis, Genova, Nair, Hata, and Russakovsky], which also show the benefits of separate projection heads, but with a standard softmax and a one-hot encoding label space. Below we derive a cosine softmax with protected embeddings for multi-class, multi-label and binary classifications.
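A minimal PyTorch sketch of such protected projection heads is given below; it assumes a binary protected attribute and a backbone encoder defined elsewhere, and the class and method names are illustrative rather than those of the released code.

```python
import torch.nn as nn

class ProtectedHeads(nn.Module):
    """Shared base encoder f followed by one projection head g^a per attribute value."""

    def __init__(self, encoder, feat_dim, embed_dim, num_attr_values=2):
        super().__init__()
        self.encoder = encoder  # base encoder f: image -> features z
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, embed_dim) for _ in range(num_attr_values)]
        )

    def forward(self, x):
        z = self.encoder(x)
        # Each head produces its own embedding e^a; the loss routes every training
        # sample to the head of its protected attribute value, inference uses both.
        return z, [head(z) for head in self.heads]
```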

Multi-class classification

assigns a label $y$ among $N$ classes to an image $x$. We introduce a protected weight matrix $W^a \in \mathbb{R}^{N \times d_e}$, where $d_e$ is the size of the embedding space and $a$ is the protected attribute value. Every row $w_c^a$ acts as a latent real-valued semantic representation for class $c$ of each protected attribute value $a$. The objective is then to maximize the cosine similarity, denoted as $\mathrm{sim}$, between an embedding representation $e^a$ and its corresponding weight representation. This results in the probabilistic model:

$$p(y = c \mid x, a) = \frac{\exp\big(\mathrm{sim}(w_c^a, e^a)/\tau\big)}{\sum_{c'=1}^{N} \exp\big(\mathrm{sim}(w_{c'}^a, e^a)/\tau\big)}, \qquad (3)$$

where $\tau$ is a temperature scaling hyper-parameter. For training, we minimize the cross-entropy loss over the training set of size $M$: $\mathcal{L} = -\frac{1}{M} \sum_{i=1}^{M} \log p(y_i \mid x_i, a_i)$. During inference, the protected attribute label is not available. Thus, we perform an ensemble prediction over both heads to predict $\hat{y} = \arg\max_c \sum_{a \in \{0,1\}} p(y = c \mid x, a)$.
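A hedged PyTorch sketch of this cosine softmax cross-entropy, with per-head routing during training and a simple probability ensemble at inference, could read as follows (names and the exact ensembling rule are illustrative).

```python
import torch
import torch.nn.functional as F

def cosine_logits(embed, weight, tau=0.1):
    """Cosine similarity between embeddings (B, d_e) and class weights (C, d_e), scaled by 1/tau."""
    return F.normalize(embed, dim=1) @ F.normalize(weight, dim=1).t() / tau

def protected_cosine_ce(embeds, weights, labels, attrs, tau=0.1):
    """embeds[a]: embeddings e^a for the whole batch; weights[a]: weight matrix W^a."""
    loss = torch.zeros((), device=labels.device, dtype=torch.float)
    for a in (0, 1):
        mask = attrs == a  # route every sample to the head of its attribute value
        if mask.any():
            logits = cosine_logits(embeds[a][mask], weights[a], tau)
            loss = loss + F.cross_entropy(logits, labels[mask], reduction="sum")
    return loss / labels.numel()

def ensemble_predict(embeds, weights, tau=0.1):
    """At test time the attribute is unknown: average class probabilities over both heads."""
    probs = sum(F.softmax(cosine_logits(embeds[a], weights[a], tau), dim=1) for a in (0, 1))
    return probs.argmax(dim=1)
```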

Multi-label classification

assigns multiple binary labels to an image $x$. This typically occurs when we want to predict the presence of multiple binary attributes in an image. We denote $y_t \in \{0, 1\}$ as the label of attribute $t$. Similar to multi-class classification, we introduce a protected weight matrix $W_t^a \in \mathbb{R}^{2 \times d_e}$, where the two rows correspond to the absence and presence of attribute $t$ for protected attribute value $a$. The resulting probabilistic model is:

$$p(y_t = k \mid x, a) = \frac{\exp\big(\mathrm{sim}(w_{t,k}^a, e^a)/\tau\big)}{\sum_{k' \in \{0,1\}} \exp\big(\mathrm{sim}(w_{t,k'}^a, e^a)/\tau\big)}, \qquad (4)$$

which corresponds to a classifier over two classes. Compared with a binary classifier with a sigmoid function, the softmax function offers more flexibility for the model to represent the negatives. We minimize the cross-entropy loss over all $T$ attributes of the training set of size $M$: $\mathcal{L} = -\frac{1}{MT} \sum_{i=1}^{M} \sum_{t=1}^{T} \log p(y_{i,t} \mid x_i, a_i)$. During inference, we also perform an ensemble prediction over both heads to compute the probability score for the presence of every attribute $t$. Binary classification is a special case where $T = 1$.
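For one binary attribute $t$, a sketch of the two-way cosine softmax of Eq. (4) reduces to a two-row weight matrix; stacking $T$ such matrices gives the multi-label case, and $T = 1$ recovers binary classification (row layout and names are illustrative).

```python
import torch.nn.functional as F

def attribute_presence_prob(embed, weight_t, tau=0.05):
    """embed: (B, d_e) head embeddings e^a; weight_t: (2, d_e) rows for absence/presence."""
    logits = F.normalize(embed, dim=1) @ F.normalize(weight_t, dim=1).t() / tau  # (B, 2)
    return F.softmax(logits, dim=1)[:, 1]  # probability that attribute t is present
```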

Bias removal in the feature space.

Once trained, we perform the same analysis as in Section 3: we collect protected class prototypes in the feature space from the training set and apply a principal component analysis on their differences $B$. We refer to the direction of the first principal component of $B$ as $\mathbf{v}_b$. Following Bolukbasi et al [Bolukbasi et al.(2016)Bolukbasi, Chang, Zou, Saligrama, and Kalai], we first project the features $z$ on the bias direction to obtain $z_b = (z^\top \mathbf{v}_b)\,\mathbf{v}_b$. Then, we neutralize the bias effect by removing $z_b$ from the features $z$, resulting in the mitigated features $\hat{z}$. Mathematically, this bias removal operation corresponds to $\hat{z} = z - (z^\top \mathbf{v}_b)\,\mathbf{v}_b$. Once $\hat{z}$ is computed, we can further feed it to each head to obtain the mitigated protected embeddings $\hat{e}^a = g^a(\hat{z})$.
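The removal step itself is a one-line projection; a sketch, assuming $\mathbf{v}_b$ is the unit-norm first principal component estimated on the training set, is:

```python
import torch

def remove_bias_direction(z, v_b):
    """Subtract from the features z (B, d_z) their projection onto the unit vector v_b (d_z,)."""
    z_b = (z @ v_b)[:, None] * v_b  # component of every feature along the bias direction
    return z - z_b                  # mitigated features z_hat, fed to the protected heads
```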

Relation with Domain Independent [Wang et al.(2020b)Wang, Qinami, Karakozis, Genova, Nair, Hata, and Russakovsky].

Our proposed method builds on the observation from Wang et al [Wang et al.(2020b)Wang, Qinami, Karakozis, Genova, Nair, Hata, and Russakovsky] that separate classification heads improve the fairness of the predictions. We differ by demonstrating how feature and label spaces also matter for addressing biases. We find the feature space implicitly encodes a bias direction (Section 3) and we derive a bias removal operation to reduce its influence. As distances matter in the feature space, this motivates us to switch from a one-hot encoding to a real-valued vector representation for the label space, where classification now occurs through a cosine embedding softmax.

5 Experiments

5.1 Fairness Metrics

Bias amplification

measures whether spurious correlations present in the dataset have been amplified by the model during training [Zhao et al.(2017)Zhao, Wang, Yatskar, Ordonez, and Chang]. Following Zhao et al [Zhao et al.(2017)Zhao, Wang, Yatskar, Ordonez, and Chang], the bias amplification score contrasts, for every class $c$, the attribute ratio among the model predictions with the attribute ratio in the training set, and averages the difference over classes: it relies on $\tilde{N}_c^a$, the number of images predicted positive for class $c$ with protected attribute $a$, and on $r_c^a$, the ratio of training images of class $c$ with protected attribute $a$. Intuitively, the score should be as low as possible: a positive value indicates a bias amplification while a negative value indicates a bias reduction. When training and testing sets are not i.i.d., we follow the adapted formulation of Wang et al [Wang et al.(2020b)Wang, Qinami, Karakozis, Genova, Nair, Hata, and Russakovsky].
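A hedged sketch of this score, under the assumption that amplification is measured toward the attribute value favoured in the training set for each class, is given below; the exact form follows Zhao et al [Zhao et al.(2017)Zhao, Wang, Yatskar, Ordonez, and Chang].

```python
import numpy as np

def bias_amplification(pred_pos, train_ratio):
    """pred_pos[c][a]: images predicted positive for class c with attribute a.
    train_ratio[c][a]: ratio of training images of class c with attribute a."""
    scores = []
    for c in pred_pos:
        total = sum(pred_pos[c].values())
        if total == 0:
            continue
        a_star = max(train_ratio[c], key=train_ratio[c].get)  # attribute favoured in training
        scores.append(pred_pos[c][a_star] / total - train_ratio[c][a_star])
    return float(np.mean(scores))
```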

Demographic parity

assesses the independence between a prediction and a protected attribute, such that $p(\hat{y} \mid a = 0) = p(\hat{y} \mid a = 1)$ [Hardt et al.(2016)Hardt, Price, and Srebro, Dwork et al.(2012)Dwork, Hardt, Pitassi, Reingold, and Zemel]. Following Beutel et al [Beutel et al.(2017)Beutel, Chen, Zhao, and Chi], a statistical parity difference score is derived:

$$\frac{1}{|\mathcal{C}|} \sum_{c} \left| \frac{TP_c^{0} + FP_c^{0}}{N^{0}} - \frac{TP_c^{1} + FP_c^{1}}{N^{1}} \right|,$$

where $TP_c^a$ and $FP_c^a$ are the number of true positives and false positives of class $c$ with protected attribute $a$, and $N^a$ is the number of images with protected attribute $a$ in the evaluation set. When the score tends to zero, the model makes the same rate of predictions for class $c$ regardless of the protected attribute value.

Equality of opportunity

assesses the conditional independence, on a particular class $c$, between a prediction and a protected attribute, such that $p(\hat{y} = c \mid y = c, a = 0) = p(\hat{y} = c \mid y = c, a = 1)$ [Hardt et al.(2016)Hardt, Price, and Srebro]. Following Beutel et al [Beutel et al.(2017)Beutel, Chen, Zhao, and Chi], a difference of equality of opportunity score is derived:

$$\frac{1}{|\mathcal{C}|} \sum_{c} \left| \frac{TP_c^{0}}{TP_c^{0} + FN_c^{0}} - \frac{TP_c^{1}}{TP_c^{1} + FN_c^{1}} \right|,$$

where $FN_c^a$ is the number of false negatives of class $c$ with protected attribute $a$. When the score tends to zero, the model classifies images of class $c$ correctly at the same rate regardless of the protected attribute value.

Equalized odds

assesses the conditional independence, on any class, between a prediction and a protected attribute, such that $p(\hat{y} = c \mid y, a = 0) = p(\hat{y} = c \mid y, a = 1)$ [Hardt et al.(2016)Hardt, Price, and Srebro]. Following Bellamy et al [Bellamy et al.(2018)Bellamy, Dey, Hind, Hoffman, Houde, Kannan, Lohia, Martino, Mehta, Mojsilovic, Nagar, Natesan Ramamurthy, Richards, Saha, Sattigeri, Singh, Varshney, and Zhang], a difference of equalized odds score is derived:

$$\frac{1}{|\mathcal{C}|} \sum_{c} \frac{1}{2} \Big( \big| FPR_c^{0} - FPR_c^{1} \big| + \big| TPR_c^{0} - TPR_c^{1} \big| \Big),$$

where $FPR_c^a$ is the false positive rate of class $c$ with protected attribute $a$ and $TPR_c^a$ is the true positive rate. When the score tends to zero, the model exhibits similar true positive and false positive rates for both protected attribute values.
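The three scores above can be sketched from per-class confusion counts split by attribute value; the sketch below assumes absolute per-class differences averaged over classes, which may differ in minor details from the evaluation code.

```python
import numpy as np

def fairness_scores(tp, fp, fn, n_eval):
    """tp, fp, fn: (num_classes, 2) counts per attribute value; n_eval: (2,) image counts."""
    pred_rate = (tp + fp) / n_eval                 # positive prediction rate per attribute
    tpr = tp / np.maximum(tp + fn, 1)              # true positive rate
    fpr = fp / np.maximum(n_eval - (tp + fn), 1)   # false positive rate
    parity = np.abs(pred_rate[:, 0] - pred_rate[:, 1]).mean()
    opportunity = np.abs(tpr[:, 0] - tpr[:, 1]).mean()
    odds = (0.5 * (np.abs(fpr[:, 0] - fpr[:, 1]) + np.abs(tpr[:, 0] - tpr[:, 1]))).mean()
    return parity, opportunity, odds
```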

5.2 Multi-class Classification

Setup.

We evaluate multi-class classification on the CIFAR-10S dataset [Wang et al.(2020b)Wang, Qinami, Karakozis, Genova, Nair, Hata, and Russakovsky], which is a biased version of the original CIFAR-10 dataset [Krizhevsky and Hinton(2009)]. A color bias is introduced in the training set, where 5 classes contain 95% gray images and 5% color images, and conversely for the 5 other classes. Visual examples for every class in their dominant color bias are presented in the appendix. This creates simple spurious correlations that still affect common classifiers. Two versions of the testing set are considered: one with only gray images and another one with only color images. Although this breaks the i.i.d. assumption between training and testing sets, it allows the assessment of the color bias in a controlled manner. We report the per-class accuracy over 5 runs. We rely on ResNet18 [He et al.(2016)He, Zhang, Ren, and Sun] as the encoding function and set each projection function as a fully-connected layer followed by a linear activation. Training is done from scratch with stochastic gradient descent with momentum [Sutskever et al.(2013)Sutskever, Martens, Dahl, and Hinton] for 200 epochs, and the following hyper-parameters: learning rate of 0.1 with a momentum of 0.9, batch size of 128, weight decay of 5e-4, and temperature of 0.1. The learning rate is reduced by a factor of 10 every 50 epochs. Note that this setup is identical for all models we compare with, as benchmarked by Wang et al [Wang et al.(2020b)Wang, Qinami, Karakozis, Genova, Nair, Hata, and Russakovsky].
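For reference, a sketch of this optimisation schedule in PyTorch, with the model, data loader and loss function passed in as assumptions, would look like:

```python
import torch

def train_cifar10s(model, train_loader, loss_fn, epochs=200):
    """SGD with momentum 0.9, lr 0.1, weight decay 5e-4, lr divided by 10 every 50 epochs."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)
    for _ in range(epochs):
        for images, labels, attrs in train_loader:        # batches of 128 images
            optimizer.zero_grad()
            loss = loss_fn(model, images, labels, attrs)  # cosine softmax CE, temperature 0.1
            loss.backward()
            optimizer.step()
        scheduler.step()
```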

Results.

Model Loss Acc. (%, ↑) Bias (↓) Parity (%, ↓) Opp. (%, ↓) Odds (%, ↓)
Baseline N-way softmax 88.5±0.3 0.074±0.003 2.90±0.11 13.07±0.37 7.19±0.21
Oversampling N-way softmax 89.1±0.4 0.066±0.002 2.77±0.67 12.58±0.19 6.91±0.11
Adversarial w/ confusion [Alvi et al.(2018)Alvi, Zisserman, and Nellåker, Tzeng et al.(2015)Tzeng, Hoffman, Darrell, and Saenko] 83.8±1.1 0.101±0.007 4.14±0.28 16.71±1.37 9.28±0.73
Adversarial w/ rev. proj. [Ganin et al.(2016)Ganin, Ustinova, Ajakan, Germain, Larochelle, Laviolette, Marchand, and Lempitsky] 84.1±1.0 0.094±0.011 3.60±0.46 14.13±1.43 7.89±0.81
Domain Discriminative joint N×D-way softmax 90.3±0.5 0.040±0.002 1.65±0.06 7.27±0.32 4.02±0.17
Domain Independent N-way softmax ×D 92.0±0.1 0.004±0.001 0.20±0.04 1.07±0.22 0.59±0.12
This paper N-way cos softmax ×D 91.5±0.2 0.004±0.000 0.15±0.01 0.83±0.12 0.46±0.07
Table 1: Multi-class classification comparison on the 10 classes of CIFAR-10S. Despite a small loss in the accuracy score, our proposed approach with a cosine softmax, rather than a common softmax as in Domain independent, improves the fairness of the model in multi-class classification.

Table 1 compares our method with four other approaches. Baseline is a standard model trained with an N-way softmax while Oversampling balances out the training by sampling more often underrepresented values of the protected attribute. Adversarial blinds the feature space to the protected attribute. This is achieved either with a uniform confusion loss [Alvi et al.(2018)Alvi, Zisserman, and Nellåker, Tzeng et al.(2015)Tzeng, Hoffman, Darrell, and Saenko] or a gradient reversal layer [Ganin et al.(2016)Ganin, Ustinova, Ajakan, Germain, Larochelle, Laviolette, Marchand, and Lempitsky]. Domain discriminative makes the classification aware of the protected attribute label by assigning a class for every category and protected attribute pair [Dwork et al.(2012)Dwork, Hardt, Pitassi, Reingold, and Zemel]. Domain independent creates two classification heads, one head for each value of the protected attribute [Wang et al.(2020b)Wang, Qinami, Karakozis, Genova, Nair, Hata, and Russakovsky]. Reported accuracy and bias amplification scores correspond to Wang et al [Wang et al.(2020b)Wang, Qinami, Karakozis, Genova, Nair, Hata, and Russakovsky], while we reproduce their experiments from the source code for the demographic parity, equality of opportunity, and equalized odds scores.

Our proposed approach improves upon the other alternatives in the fairness scores. Only on the accuracy metric does our model yield slightly lower results compared with Domain independent. This shows that there might exist a trade-off between the downstream task and the fairness of the classifier, as improving both remains challenging. It is interesting that Adversarial produces worse results than simple methods such as Baseline or Oversampling. As Adversarial blurs the distinction between both protected attribute values, it also alters the class boundaries, which makes the model less discriminative. Domain discriminative achieves a lower performance than our model and Domain independent. This highlights the importance of separating the classification heads for each protected attribute value. Overall, our proposed approach with a cosine softmax, rather than a common softmax as in Domain independent, reduces the bias direction in the feature space (see appendix) and improves the fairness in multi-class classification.

5.3 Multi-label Classification

Setup.

We evaluate multi-label classification on the Align and Cropped split of the CelebA dataset [Liu et al.(2015)Liu, Luo, Wang, and Tang], which contains 202,599 face images labeled with 40 binary attributes. Following Wang et al [Wang et al.(2020b)Wang, Qinami, Karakozis, Genova, Nair, Hata, and Russakovsky], we consider gender as the protected attribute and train models to predict the other 39 attributes. Visual examples of attributes with a high gender skewness are presented in the appendix. During the testing phase, only 34 attributes are considered, as the other 5 do not contain images of both genders. We report the weighted mean average precision (mAP) across the selected attributes. Every positive man image is weighted inversely to the man image count, and every positive woman image inversely to the woman image count, in the test set. This weighting ensures a balanced representation of both genders in the evaluation of every attribute.

We rely on ResNet50 [He et al.(2016)He, Zhang, Ren, and Sun] pre-trained on ImageNet [Russakovsky et al.(2015)Russakovsky, Deng, Su, Krause, Satheesh, Ma, Huang, Karpathy, Khosla, Bernstein, Berg, and Fei-Fei] as the encoding function. We remove the final classification layer and replace it with two fully-connected layers (one for each protected attribute value) followed by a linear activation as the projection functions. Training is done with stochastic gradient descent with momentum [Sutskever et al.(2013)Sutskever, Martens, Dahl, and Hinton], and the following hyper-parameters: learning rate of 0.1 with a momentum of 0.9, batch size of 32, and temperature of 0.05. The best model is selected according to the weighted mAP score on the validation set. Compared with the benchmarks introduced by Wang et al [Wang et al.(2020b)Wang, Qinami, Karakozis, Genova, Nair, Hata, and Russakovsky], our model training only differs by the optimizer, as we noticed some overfitting issues when using Adam [Kingma and Ba(2015)]. The backbone and the rest of the hyper-parameters are the same.

Loss mAP Bias Parity Opp. Odds
N sigmoids ×D 75.4 -0.039 17.74 14.87 9.19
N cos sigmoids ×D 75.5 0.001 11.63 10.29 5.79
 + bias removal 74.7 -0.020 7.43 7.00 4.00
N cos softmax ×D 76.3 -0.006 11.97 10.18 6.06
 + bias removal 75.3 -0.041 6.71 6.73 4.10
Table 2: Label space comparison on CelebA. An embedding learned with a cosine similarity improves the fairness upon common sigmoids. A softmax with bias removal in the feature space further improves fairness.
Embedding Cos softmax mAP Bias Parity Opp. Odds
Single N 74.5 -0.039 10.65 14.02 7.77
Single N ×D 67.7 -0.070 19.26 21.02 13.54
Protected N ×D 75.3 -0.041 6.71 6.73 4.10
Table 3: Single vs protected embedding comparison on CelebA. Separating the gender information into protected heads results in an increased classification and fairness performance over a single head.
Model Loss mAP (%, ↑) Bias (↓) Parity (%, ↓) Opp. (%, ↓) Odds (%, ↓)
Baseline N sigmoids 74.7 0.010 23.32 24.34 14.28
Adversarial w/ confusion [Alvi et al.(2018)Alvi, Zisserman, and Nellåker, Tzeng et al.(2015)Tzeng, Hoffman, Darrell, and Saenko] 71.9 0.019 23.73 28.66 16.69
Domain Discriminative N×D sigmoids 73.8 0.007 22.34 25.35 14.69
Domain Independent N sigmoids ×D 75.4 -0.039 17.74 14.87 9.19
This paper N cos softmax ×D 75.3 -0.041 6.71 6.73 4.10
Table 4: Multi-label classification comparison of the 34 attributes in CelebA. Despite a small loss in the mAP score, our proposed embedding – learned with a cosine softmax rather than a common softmax with one-hot encoding as in Domain independent – improves the fairness of the model in multi-label classification.

Label space.

Table 2 compares the different formulations of the label embedding space. Relying on a real-valued embedding space learned with a cosine similarity function improves the fairness of the predictions compared with the common one-hot representation. Labels now correspond to a real-valued vector instead of a binary value, which enables a distributed class representation. Switching to a softmax function instead of a sigmoid provides a weight representation for the negatives, which in turn helps the classification performance. The benefit of negative representations is further highlighted when applying the bias removal operation in the feature space, even though a small drop in the classification score occurs. Overall, learning an embedding with a cosine softmax cross-entropy, plus the bias removal, preserves the performance of the downstream task while improving the fairness of the predictions.

Single vs protected embeddings.

Table 3 assesses the importance of having protected embeddings, with one projection function for each value of the protected attribute gender. We evaluate the single-head setting with and without the protected attribute label in the loss function. When the protected attribute information is available, we essentially have two cosine softmax losses, one for each value. Mixing the two losses in one single head is detrimental to the performance, as the model gets confused about where to project the inputs in the embedding space. Protected embeddings better separate the gender information for the classification of every attribute, as illustrated by the improved classification performance and fairness scores overall.

Results.

Table 4 compares our model with four other approaches, similarly to the comparison in Table 1. Reported mAP and bias amplification scores correspond to Wang et al [Wang et al.(2020b)Wang, Qinami, Karakozis, Genova, Nair, Hata, and Russakovsky], while we reproduce their experiments to measure the demographic parity, equality of opportunity, and equalized odds scores. Our proposed approach yields the fairest scores across all evaluated models. Similar to multi-class classification, we also notice a small drop in the downstream task when measuring the mAP. Adversarial again produces the worst results across all metrics. This indicates that current methods applying an adversarial training remove more information than the bias, which is detrimental for both the downstream task and the fairness of the model. Domain discriminative and Baseline result in a similar performance. Interestingly, a trade-off between the mAP and fairness scores is also present in Domain independent. Our proposed approach improves over Domain independent in the fairness scores by a large margin. Mitigating the bias in both feature and label embedding spaces is then preferred over methods that only address one of the two.

(a) Gender prediction (age protected)
Method Trained on EB1 (EB2 / Test) Trained on EB2 (EB1 / Test)
Baseline 59.86 84.42 57.84 69.75
Alvi et al [Alvi et al.(2018)Alvi, Zisserman, and Nellåker] 63.74 85.56 57.33 69.90
Kim et al [Kim et al.(2019)Kim, Kim, Kim, Kim, and Kim] 68.00 86.66 64.18 74.50
This paper 70.85 88.73 80.59 83.65
(b) Age prediction (gender protected)
Method Trained on EB1 (EB2 / Test) Trained on EB2 (EB1 / Test)
Baseline 54.30 77.17 48.91 61.97
Alvi et al [Alvi et al.(2018)Alvi, Zisserman, and Nellåker] 66.80 75.13 64.16 62.40
Kim et al [Kim et al.(2019)Kim, Kim, Kim, Kim, and Kim] 54.27 77.43 62.18 63.04
This paper 35.93 77.67 65.90 73.08
Table 5: Binary classification comparison on the IMDB face dataset. Our formulation of the label embedding space improves the binary classification accuracy (%) under an extreme bias over methods that impose an invariance to the protected attribute in the feature space.

Binary classification.

We evaluate binary classification on the cropped split of the IMDB face dataset [Rothe et al.(2018)Rothe, Timofte, and Gool]. Following Kim et al [Kim et al.(2019)Kim, Kim, Kim, Kim, and Kim], we create three sets with an extreme bias: EB1 comprises women aged 29 or younger and men aged 40 or older; EB2 has women aged 40 or older and men aged 29 or younger; and Test has women and men from both age groups. They contain 36,004, 16,800 and 13,129 face images of celebrities, respectively. Similar to Kim et al [Kim et al.(2019)Kim, Kim, Kim, Kim, and Kim], we learn to predict the gender with age as a protected attribute (and conversely), and rely on ResNet18 [He et al.(2016)He, Zhang, Ren, and Sun] pre-trained on ImageNet [Russakovsky et al.(2015)Russakovsky, Deng, Su, Krause, Satheesh, Ma, Huang, Karpathy, Khosla, Bernstein, Berg, and Fei-Fei] as the encoding function. We add a fully-connected layer with a linear activation for each projection function. Training is done with stochastic gradient descent with momentum [Sutskever et al.(2013)Sutskever, Martens, Dahl, and Hinton], a learning rate of 0.1 with a momentum of 0.9 and an exponential decay of 0.999, a batch size of 128, and a temperature of 0.1. Given the extreme bias, we update both protected heads instead of only one as done previously.

Table 5 compares our model with three other approaches. Baseline is also a standard model trained with binary cross-entropy. Both Alvi et al [Alvi et al.(2018)Alvi, Zisserman, and Nellåker] and Kim et al [Kim et al.(2019)Kim, Kim, Kim, Kim, and Kim] mitigate the extreme bias by making the feature space invariant to the protected attribute. Kim et al [Kim et al.(2019)Kim, Kim, Kim, Kim, and Kim] rely on an adversarial formulation [Ganin et al.(2016)Ganin, Ustinova, Ajakan, Germain, Larochelle, Laviolette, Marchand, and Lempitsky, Chen et al.(2016)Chen, Duan, Houthooft, Schulman, Sutskever, and Abbeel], improving over Alvi et al [Alvi et al.(2018)Alvi, Zisserman, and Nellåker]. Given the binary classification setting, we did not apply a bias removal operation, as a PCA on two samples is not pertinent. Still, our formulation of the label space improves the performance in both the gender and age settings. Only when predicting age and training on EB1, our model struggles a bit as it tends to overfit quickly. This binary classification comparison further confirms that simpler alternatives to adversarial losses can better mitigate biases in image classifiers.

6 Conclusion

Reducing the effect of adverse decisions involves the identification and mitigation of biases within model representations. In this paper, we focus on biases coming from binary protected attributes. First, we identify a direction in the feature space of common image classifiers, where the first principal component of the difference of protected class prototypes captures bias variation. Second, building on this observation, we mitigate bias with protected projection heads that learn a label embedding space for each protected attribute value. This formulation trained with a cosine softmax cross-entropy loss improves the fairness in multi-class, multi-label and binary classifications compared with a common one-hot encoding. Removing the bias direction in the feature space reduces even further the bias effect on the classifier predictions. Overall, addressing image classifier bias on both feature and label spaces improves the fairness of predictions, while preserving the classification performance.

References

  • [Alvi et al.(2018)Alvi, Zisserman, and Nellåker] Mohsan Alvi, Andrew Zisserman, and Christoffer Nellåker. Turning a blind eye: Explicit removal of biases and variation from deep neural network embeddings. In ECCVw, 2018.
  • [Bellamy et al.(2018)Bellamy, Dey, Hind, Hoffman, Houde, Kannan, Lohia, Martino, Mehta, Mojsilovic, Nagar, Natesan Ramamurthy, Richards, Saha, Sattigeri, Singh, Varshney, and Zhang] Rachel KE Bellamy, Kuntal Dey, Michael Hind, Samuel C Hoffman, Stephanie Houde, Kalapriya Kannan, Pranay Lohia, Jacquelyn Martino, Sameep Mehta, Aleksandra Mojsilovic, Seema Nagar, Karthikeyan Natesan Ramamurthy, John Richards, Diptikalyan Saha, Prasanna Sattigeri, Moninder Singh, Kush R. Varshney, and Yunfeng Zhang. Ai fairness 360: An extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias. In arXiv:1810.01943, 2018.
  • [Bender et al.(2021)Bender, Gebru, McMillan-Major, and Shmitchell] Emily M Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. On the dangers of stochastic parrots: Can language models be too big? In FAccT, 2021.
  • [Beutel et al.(2017)Beutel, Chen, Zhao, and Chi] Alex Beutel, Jilin Chen, Zhe Zhao, and Ed H Chi. Data decisions and theoretical implications when adversarially learning fair representations. In FAT/ML, 2017.
  • [Birhane and Prabhu(2021)] Abeba Birhane and Vinay Uday Prabhu. Large image datasets: A pyrrhic win for computer vision? In WACV, 2021.
  • [Bolukbasi et al.(2016)Bolukbasi, Chang, Zou, Saligrama, and Kalai] Tolga Bolukbasi, Kai-Wei Chang, James Y Zou, Venkatesh Saligrama, and Adam T Kalai. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In NeurIPS, 2016.
  • [Buolamwini and Gebru(2018)] Joy Buolamwini and Timnit Gebru. Gender shades: Intersectional accuracy disparities in commercial gender classification. In FAccT, 2018.
  • [Caliskan et al.(2017)Caliskan, Bryson, and Narayanan] Aylin Caliskan, Joanna J Bryson, and Arvind Narayanan. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 2017.
  • [Chen et al.(2016)Chen, Duan, Houthooft, Schulman, Sutskever, and Abbeel] Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. In NeurIPS, 2016.
  • [de Vries et al.(2019)de Vries, Misra, Wang, and van der Maaten] Terrance de Vries, Ishan Misra, Changhan Wang, and Laurens van der Maaten. Does object recognition work for everyone? In CVPRw, 2019.
  • [Dixon et al.(2018)Dixon, Li, Sorensen, Thain, and Vasserman] Lucas Dixon, John Li, Jeffrey Sorensen, Nithum Thain, and Lucy Vasserman. Measuring and mitigating unintended bias in text classification. In AIES, 2018.
  • [Dwork et al.(2012)Dwork, Hardt, Pitassi, Reingold, and Zemel] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In ITCSC, 2012.
  • [Ganin et al.(2016)Ganin, Ustinova, Ajakan, Germain, Larochelle, Laviolette, Marchand, and Lempitsky] Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. JMLR, 17(1), 2016.
  • [Garg et al.(2018)Garg, Schiebinger, Jurafsky, and Zou] Nikhil Garg, Londa Schiebinger, Dan Jurafsky, and James Zou. Word embeddings quantify 100 years of gender and ethnic stereotypes. PNAS, 115(16), 2018.
  • [Gebru et al.(2018)Gebru, Morgenstern, Vecchione, Vaughan, Wallach, Daumé III, and Crawford] Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. Datasheets for datasets. In FAT/ML, 2018.
  • [Geirhos et al.(2020)Geirhos, Jacobsen, Michaelis, Zemel, Brendel, Bethge, and Wichmann] Robert Geirhos, Jörn-Henrik Jacobsen, Claudio Michaelis, Richard Zemel, Wieland Brendel, Matthias Bethge, and Felix A Wichmann. Shortcut learning in deep neural networks. Nature Machine Intelligence, 2(11), 2020.
  • [Gonen and Goldberg(2019)] Hila Gonen and Yoav Goldberg. Lipstick on a pig: Debiasing methods cover up systematic gender biases in word embeddings but do not remove them. In NAACL, 2019.
  • [Gong et al.(2020)Gong, Liu, and Jain] Sixue Gong, Xiaoming Liu, and Anil K Jain. Jointly de-biasing face recognition and demographic attribute estimation. In ECCV, 2020.
  • [Goodfellow et al.(2014)Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, and Bengio] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NeurIPS, 2014.
  • [Guo et al.(2016)Guo, Zhang, Hu, He, and Gao] Yandong Guo, Lei Zhang, Yuxiao Hu, X. He, and Jianfeng Gao. MS-Celeb-1M: A dataset and benchmark for large-scale face recognition. In ECCV, 2016.
  • [Hardt et al.(2016)Hardt, Price, and Srebro] Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning. In NeurIPS, 2016.
  • [He et al.(2016)He, Zhang, Ren, and Sun] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • [Hendricks et al.(2018)Hendricks, Burns, Saenko, Darrell, and Rohrbach] Lisa Anne Hendricks, Kaylee Burns, Kate Saenko, Trevor Darrell, and Anna Rohrbach. Women also snowboard: Overcoming bias in captioning models. In ECCV, 2018.
  • [Hutchinson et al.(2021)Hutchinson, Smart, Hanna, Denton, Greer, Kjartansson, Barnes, and Mitchell] Ben Hutchinson, Andrew Smart, Alex Hanna, Emily Denton, Christina Greer, Oddur Kjartansson, Parker Barnes, and Margaret Mitchell. Towards accountability for machine learning datasets: Practices from software engineering and infrastructure. In FAccT, 2021.
  • [Hwang et al.(2020)Hwang, Park, Lee, Jeon, Kim, and Byun] Sunhee Hwang, Sungho Park, Pilhyeon Lee, Seogkyu Jeon, Dohyung Kim, and Hyeran Byun. Exploiting transferable knowledge for fairness-aware image classification. In ACCV, 2020.
  • [Karkkainen and Joo(2021)] Kimmo Karkkainen and Jungseock Joo. Fairface: Face attribute dataset for balanced race, gender, and age for bias measurement and mitigation. In WACV, 2021.
  • [Kay et al.(2015)Kay, Matuszek, and Munson] Matthew Kay, Cynthia Matuszek, and Sean A Munson. Unequal representation and gender stereotypes in image search results for occupations. In CHI, 2015.
  • [Kim et al.(2016)Kim, Koyejo, and Khanna] Been Kim, Oluwasanmi Koyejo, and Rajiv Khanna. Examples are not enough, learn to criticize! Criticism for interpretability. In NeurIPS, 2016.
  • [Kim et al.(2019)Kim, Kim, Kim, Kim, and Kim] Byungju Kim, Hyunwoo Kim, Kyungsu Kim, Sungjin Kim, and Junmo Kim. Learning not to learn: Training deep neural networks with biased data. In CVPR, 2019.
  • [Kingma and Ba(2015)] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
  • [Krizhevsky and Hinton(2009)] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. 2009.
  • [Liu et al.(2015)Liu, Luo, Wang, and Tang] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In ICCV, 2015.
  • [Luo et al.(2019)Luo, Zheng, Guan, Yu, and Yang] Yawei Luo, Liang Zheng, Tao Guan, Junqing Yu, and Yi Yang. Taking a closer look at domain shift: Category-level adversaries for semantics consistent domain adaptation. In CVPR, 2019.
  • [Merler et al.(2019)Merler, Ratha, Feris, and Smith] Michele Merler, Nalini Ratha, Rogerio S Feris, and John R Smith. Diversity in faces. arXiv:1901.10436, 2019.
  • [Mikolov et al.(2013)Mikolov, Chen, Corrado, and Dean] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. In ICLR, 2013.
  • [Peng et al.(2021)Peng, Mathur, and Narayanan] Kenny Peng, Arunesh Mathur, and Arvind Narayanan. Mitigating dataset harms requires stewardship: Lessons from 1000 papers. arXiv:2108.02922, 2021.
  • [Peters et al.(2018)Peters, Neumann, Iyyer, Gardner, Clark, Lee, and Zettlemoyer] Matthew E Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations. In NAACL, 2018.
  • [Rothe et al.(2018)Rothe, Timofte, and Gool] Rasmus Rothe, Radu Timofte, and Luc Van Gool. Deep expectation of real and apparent age from a single image without facial landmarks. IJCV, 126(2-4):144–157, 2018.
  • [Russakovsky et al.(2015)Russakovsky, Deng, Su, Krause, Satheesh, Ma, Huang, Karpathy, Khosla, Bernstein, Berg, and Fei-Fei] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. IJCV, 115(3), 2015.
  • [Saito et al.(2018)Saito, Watanabe, Ushiku, and Harada] Kuniaki Saito, Kohei Watanabe, Yoshitaka Ushiku, and Tatsuya Harada. Maximum classifier discrepancy for unsupervised domain adaptation. In CVPR, 2018.
  • [Shankar et al.(2017)Shankar, Halpern, Breck, Atwood, Wilson, and Sculley] Shreya Shankar, Yoni Halpern, Eric Breck, James Atwood, Jimbo Wilson, and D Sculley. No classification without representation: Assessing geodiversity issues in open data sets for the developing world. In NeurIPSw, 2017.
  • [Singh et al.(2020)Singh, Agarwal, Singh, Nagpal, and Vatsa] Richa Singh, Akshay Agarwal, Maneet Singh, Shruti Nagpal, and Mayank Vatsa. On the robustness of face recognition algorithms against attacks and bias. In AAAI, 2020.
  • [Stock and Cisse(2018)] Pierre Stock and Moustapha Cisse. Convnets and imagenet beyond accuracy: Understanding mistakes and uncovering biases. In ECCV, 2018.
  • [Sutskever et al.(2013)Sutskever, Martens, Dahl, and Hinton] Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton. On the importance of initialization and momentum in deep learning. In ICML, 2013.
  • [Torralba and Efros(2011)] Antonio Torralba and Alexei A Efros. Unbiased look at dataset bias. In CVPR, 2011.
  • [Tzeng et al.(2015)Tzeng, Hoffman, Darrell, and Saenko] Eric Tzeng, Judy Hoffman, Trevor Darrell, and Kate Saenko. Simultaneous deep transfer across domains and tasks. In ICCV, 2015.
  • [Wang et al.(2020a)Wang, Narayanan, and Russakovsky] Angelina Wang, Arvind Narayanan, and Olga Russakovsky. REVISE: A tool for measuring and mitigating bias in visual datasets. In ECCV, 2020a.
  • [Wang et al.(2019a)Wang, Deng, Hu, Tao, and Huang] Mei Wang, Weihong Deng, Jiani Hu, Xunqiang Tao, and Yaohai Huang. Racial faces in the wild: Reducing racial bias by information maximization adaptation network. In ICCV, 2019a.
  • [Wang et al.(2019b)Wang, Zhao, Yatskar, Chang, and Ordonez] Tianlu Wang, Jieyu Zhao, Mark Yatskar, Kai-Wei Chang, and Vicente Ordonez. Balanced datasets are not enough: Estimating and mitigating gender bias in deep image representations. In ICCV, 2019b.
  • [Wang et al.(2020b)Wang, Qinami, Karakozis, Genova, Nair, Hata, and Russakovsky] Zeyu Wang, Klint Qinami, Ioannis Christos Karakozis, Kyle Genova, Prem Nair, Kenji Hata, and Olga Russakovsky. Towards fairness in visual recognition: Effective strategies for bias mitigation. In CVPR, 2020b.
  • [Yang et al.(2020)Yang, Qinami, Fei-Fei, Deng, and Russakovsky] Kaiyu Yang, Klint Qinami, Li Fei-Fei, Jia Deng, and Olga Russakovsky. Towards fairer datasets: Filtering and balancing the distribution of the people subtree in the imagenet hierarchy. In FAccT, 2020.
  • [Yin et al.(2019)Yin, Yu, Sohn, Liu, and Chandraker] Xi Yin, Xiang Yu, Kihyuk Sohn, Xiaoming Liu, and Manmohan Chandraker. Feature transfer learning for face recognition with under-represented data. In CVPR, 2019.
  • [Zhang et al.(2018)Zhang, Lemoine, and Mitchell] Brian Hu Zhang, Blake Lemoine, and Margaret Mitchell. Mitigating unwanted biases with adversarial learning. In AIES, 2018.
  • [Zhao et al.(2017)Zhao, Wang, Yatskar, Ordonez, and Chang] Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. Men also like shopping: Reducing gender bias amplification using corpus-level constraints. In EMNLP, 2017.
  • [Zhao et al.(2018)Zhao, Zhou, Li, Wang, and Chang] Jieyu Zhao, Yichao Zhou, Zeyu Li, Wei Wang, and Kai-Wei Chang. Learning gender-neutral word embeddings. In EMNLP, 2018.
  • [Zhao et al.(2019)Zhao, Wang, Yatskar, Cotterell, Ordonez, and Chang] Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cotterell, Vicente Ordonez, and Kai-Wei Chang. Gender bias in contextualized word embeddings. In NAACL, 2019.