Learning De-biased Representations with Biased Representations

10/07/2019 ∙ by Hyojin Bahng, et al. ∙ Korea University, NAVER Corp.

Many machine learning algorithms are trained and evaluated by splitting data from a single source into training and test sets. While such a focus on in-distribution learning scenarios has led to interesting advances, it cannot tell whether models are relying on dataset biases as shortcuts for successful prediction (e.g., using snow cues for recognising snowmobiles). Such biased models fail to generalise when the bias shifts to a different class. The cross-bias generalisation problem has been addressed by de-biasing training data through augmentation or re-sampling, which are often prohibitive due to data collection costs (e.g., collecting images of a snowmobile in a desert) and the difficulty of quantifying or expressing biases in the first place. In this work, we propose a novel framework to train a de-biased representation by encouraging it to be different from a set of representations that are biased by design. This tactic is feasible in many scenarios where it is much easier to define a set of biased representations than to define and quantify bias. Our experiments and analyses show that our method discourages models from taking bias shortcuts, resulting in improved performance on de-biased test data.




1 Introduction

Most machine learning algorithms are trained and evaluated by randomly splitting a single source of data into training and test sets. Although this is a standard protocol, it is blind to a critical problem: the existence of dataset bias (Torralba and Efros, 2011). For instance, many frog images are taken in swamp scenes, but a swamp itself is not a frog. Nonetheless, a neural network will exploit this bias (i.e., take "shortcuts") if it yields correct predictions for the majority of training examples. If bias is sufficient to achieve high accuracy, there is little motivation for a model to learn the complexity of the intended task, despite its full capacity to do so. Consequently, a model that relies on bias will achieve high in-distribution accuracy, yet fail to generalise when the bias shifts.

We tackle this "cross-bias generalisation" problem, where a model does not exploit its full capacity because bias cues are "sufficient" to predict the target label in the training data. For example, language models make predictions based on the presence of certain words (e.g., "not" for "contradiction") (Gururangan et al., 2018) without much reasoning about the actual meaning of sentences, even though they are in principle capable of sophisticated reasoning. Similarly, convolutional neural networks (CNNs) achieve high accuracies on image classification by using local texture cues as shortcuts, as opposed to more reliable global shape cues (Geirhos et al., 2019; Brendel and Bethge, 2019).

Existing methods attempt to remove a model's dependency on bias by de-biasing the training data through data augmentation (Geirhos et al., 2019) or re-sampling tactics (Li and Vasconcelos, 2019). Others have introduced a pre-defined set of biases against which a model is trained to be independent (Wang et al., 2019a). These prior works assume that bias can easily be defined or quantified, but real-world biases often cannot (e.g., the texture bias above).

To address this limitation, we propose a novel framework to train a de-biased representation by encouraging it to be “different” from a set of representations that are biased by design. Our insight is that biased representations can easily be obtained by utilising models of smaller capacity (e.g., bag of words for word bias and CNNs of small receptive fields for texture bias). Experiments show that our method is effective in reducing a model’s dependency on “shortcuts” in training data, as evidenced by improved accuracies in test data where the bias is either shifted or removed.

2 Problem Definition

We provide a rigorous definition of our over-arching goal: overcoming the bias in models trained on biased data. We show that the problem we tackle is novel and realistic.

2.1 Cross-bias generalisation

We first define random variables: the signal S and the bias B, as cues for the recognition of an input X as a certain target variable Y. Signals are the cues essential for the recognition of X as Y; examples include the shape and skin patterns of frogs for frog image classification. Biases B, on the other hand, are cues that are not essential for the recognition but are correlated with the target Y; many frog images are taken in swamp scenes, so swamp scenes can be considered a bias B. A key property of B is that intervening on B should not change Y: moving a frog from a swamp to a desert scene does not change the "frogness". We assume that the true predictive distribution factorises as p(Y | X) = p(Y | S), signifying the sufficiency of S for recognition.

Figure 1: Learning scenarios. Different distributional gaps may arise between training and test distributions. Our work addresses the cross-bias generalisation problem. Background colours in the right three figures indicate the decision boundaries of models trained on the given training data.

Under this framework, three learning scenarios are identified depending on how the relationship among (S, B, Y) changes across the training and test distributions, p_tr and p_te, respectively: in-distribution, cross-domain, and cross-bias generalisation. See Figure 1 for a summary.


In-distribution generalisation: p_tr(S, B, Y) = p_te(S, B, Y). This is the standard learning setup utilised in many benchmarks, splitting data from a single source into training and test data at random.


Cross-domain generalisation: Y ⊥ B, and furthermore p_tr(B) ≠ p_te(B). B in this case is often referred to as the "domain". For example, training data consist of images with (Y=frog, B=wilderness) and (Y=bird, B=wilderness), while test data contain (Y=frog, B=indoors) and (Y=bird, B=indoors). This scenario is typically simulated by training and testing on different datasets (Ben-David et al., 2007).


Cross-bias generalisation: Y ⊥̸ B (⊥ and ⊥̸ denote independence and dependence, respectively), and the dependency changes across training and test distributions: p_tr(Y, B) ≠ p_te(Y, B). We further assume that p_tr(B) = p_te(B), to clearly distinguish the scenario from cross-domain generalisation. For example, training data only contain images of the two types (Y=frog, B=swamp) and (Y=bird, B=sky), but test data contain unusual class-bias combinations (Y=frog, B=sky) and (Y=bird, B=swamp). Our work addresses this scenario.

2.2 Existing cross-bias generalisation methods and their assumptions

Under cross-bias generalisation scenarios, the dependency Y ⊥̸ B makes the bias B a viable cue for recognition. A model trained on such data becomes susceptible to interventions on B, limiting its generalisability when the bias is changed or removed in the test data. There exist prior approaches to this problem, each with different types and amounts of assumptions about B. We briefly recap these approaches based on the assumptions they require. In the next part, §2.3, we define our novel problem setting, which requires an assumption distinct from those in prior approaches.

When an algorithm to disentangle bias and signal exists.

Being able to disentangle B and S lets one collapse the feature space corresponding to B in both training and test data. A model trained on such normalised data then becomes free of biases. Ideal as it is, building a model that perfectly disentangles B and S is often unrealistic.

When a generative algorithm or data collection procedure for exists.

When additional examples can be supplied through such a procedure, the training dataset itself can be de-biased, i.e., made to satisfy Y ⊥ B. For example, one can either collect or synthesise unusual images, like frogs in the sky and birds in swamps, to balance out the bias. Such a data augmentation strategy is indeed a valid solution adopted by many prior studies (Panda et al., 2018; Geirhos et al., 2019; Shetty et al., 2019). However, collecting unusual inputs can be expensive (Peyre et al., 2017), and building a generative model with pre-defined bias types (Geirhos et al., 2019) may suffer from bias mis-specification and a lack of realism.

When a predictive algorithm or ground truth for exists.

Conversely, when one can tell the bias B for every input X, two approaches are feasible. (1) The first is a data re-weighting solution: we give greater weights to frogs in the sky than to frogs in swamps to even out the correlation in the training data (Li et al., 2018; Li and Vasconcelos, 2019). (2) The second approach removes the dependency between the model prediction f(X) and the bias B. Many existing approaches for fairness in machine learning have proposed independence-based regularisers to encourage f(X) ⊥ B (Zemel et al., 2013) or the conditional independence f(X) ⊥ B | Y (called the "separation" constraint, Hardt et al. (2016)). Other approaches have proposed to remove the predictability of B from f(X) through domain adversarial losses (Li and Vasconcelos, 2019; Wang et al., 2019b) or projection (Wang et al., 2019b; Quadrianto et al., 2019).

Knowledge of B is provided in many realistic scenarios. For example, when the aim is to remove gender biases in a job application process, applicants' genders are supplied as ground truths. However, there exist cases where B is difficult even to define or quantify and can only be indirectly specified. We tackle such a scenario in the next part.

2.3 Our scenario: Capturing bias with a particular set of models

Under the cross-bias generalisation scenario, certain types of biases are not easily addressed by the above methods. Take the texture bias as an example (§1, Geirhos et al. (2019)): (1) texture and shape cannot easily be disentangled, (2) building a generative model or collecting unusual images is expensive, and (3) building a predictive model for texture requires an enumeration (classifier) or embedding (regression) of all possible textures, which is not feasible.

However, slightly modifying the third assumption results in a problem setting that allows interesting application scenarios. Instead of assuming knowledge of B, we characterise B by defining a set G of models that are biased towards B by design. For texture biases, for example, we define G to be the set of convolutional neural network (CNN) architectures with small overall receptive fields. Then, any learned model g ∈ G can by design make predictions based only on the patterns that can be captured with small receptive fields (i.e., textures).

More precisely, we define G to be a bias-characterising model class for the bias-signal pair (B, S) if, for every possible joint distribution over (B, S, Y), there exists a g ∈ G that captures the bias cue B (recall condition), and every g ∈ G is not predictive of Y through the signal S (precision condition). In other words, G contains all bias extractors, except for ones that can also recognise signals. The above-mentioned family of CNN architectures with limited receptive fields exemplifies such a G.

There exist many scenarios where such a G is known. For example, 3D CNNs have been developed for action recognition to exploit not only static but also temporal cues across video frames. Those models, however, may still be biased towards static features: a model may lazily use basketball-court features to recognise "playing basketball". By defining G to be the set of 2D frame-wise CNNs, one can fully characterise such biases. Generally, this scenario exemplifies situations where added architectural capacity is not fully utilised because simpler cues suffice to solve the task on the given training set.

3 Proposed Method

We present a solution for cross-bias generalisation when the bias-characterising model class G is known (see §2.3); the method is referred to as REBI'S (an abbreviation of "removing bias"; pronounced like "Levi's"). The solution consists of training a model f for the task with a regularisation term encouraging statistical independence between the prediction f(X) and the set of all possible biased predictions {g(X) : g ∈ G}. We will introduce the precise definition of the regularisation term and discuss why and how it leads to an unbiased model.

3.1 Rebi’s: Removing bias with bias

If B is fully known, we can directly encourage f(X) ⊥ B. Since we only have access to the set of biased models G (§2.3), we instead seek to promote f(X) ⊥ g(X) for every g ∈ G. Simply put, we de-bias a representation f by designing a set of biased models G and letting f run away from G. This leads to independence from bias cues (due to the recall condition of G) while leaving signal cues as valid recognition cues (due to the precision condition of G); see §2.3. We will specify the REBI'S learning objective after introducing our independence criterion, HSIC.

Hilbert-Schmidt Independence Criterion (HSIC).

Since we need to measure the degree of independence between continuous random variables f(X) and g(X) in high-dimensional spaces, it is infeasible to resort to histogram-based measures; we use HSIC (Gretton et al., 2005). For two random variables U and V and kernels k and l, HSIC is defined as HSIC^{k,l}(U, V) := ||C_{UV}||²_HS, where C_{UV} is the cross-covariance operator in the Reproducing Kernel Hilbert Spaces (RKHS) of k and l (Gretton et al., 2005), an RKHS analogue of covariance matrices, and ||·||_HS is the Hilbert-Schmidt norm, a Hilbert-space analogue of the Frobenius norm. It is known that, for two random variables U and V and radial basis function (RBF) kernels k and l, HSIC^{k,l}(U, V) = 0 if and only if U ⊥ V. A finite-sample estimate HSIC_m has been used in practice for statistical testing (Gretton et al., 2005, 2008), feature similarity measurement (Kornblith et al., 2019), and model regularisation (Quadrianto et al., 2019; Zhang et al., 2018). HSIC_m with m samples is defined as HSIC_m(U, V) := (m − 1)^{-2} tr(K̃L̃), where K̃ = HKH is the mean-subtracted matrix of pairwise kernel similarities among the m samples of U (K_ij = k(u_i, u_j), with the centring matrix H = I − (1/m)11ᵀ); L̃ is defined similarly for V.
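As a concrete illustration (our own sketch, not the authors' released code), the finite-sample estimate HSIC_m with RBF kernels can be computed as follows; the bandwidth sigma=1 and the sample sizes are arbitrary choices:

```python
import numpy as np

def rbf_kernel(x, sigma=1.0):
    # pairwise squared distances -> RBF kernel matrix K_ij = k(x_i, x_j)
    sq = np.sum(x ** 2, axis=1, keepdims=True)
    d2 = sq + sq.T - 2 * x @ x.T
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(u, v, sigma=1.0):
    """Finite-sample HSIC estimate tr(K~ L~) / (m-1)^2 with RBF kernels."""
    m = u.shape[0]
    h = np.eye(m) - np.ones((m, m)) / m          # centring matrix H
    k_c = h @ rbf_kernel(u, sigma) @ h           # K~ = HKH
    l_c = h @ rbf_kernel(v, sigma) @ h           # L~ = HLH
    return np.trace(k_c @ l_c) / (m - 1) ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
indep = hsic(x, rng.normal(size=(200, 1)))           # independent pair: near zero
dep = hsic(x, x + 0.05 * rng.normal(size=(200, 1)))  # strongly dependent pair: larger
```

The estimate is non-negative by construction (a trace of a product of two positive semi-definite centred matrices) and grows with the statistical dependence between the two sets of features.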

Minimax optimisation for bias removal.

In our case, we compute HSIC(f(X), g(X)) with an RBF kernel as the degree of independence between the representation f(X) and the biased representations g(X). We write f and g as shorthands for f(X) and g(X). Since the problem allows trivial solutions (e.g., a near-constant f makes HSIC vanish), we use the canonical kernel alignment (CKA) (Shawe-Taylor et al., 2004; Kornblith et al., 2019) criterion defined by CKA(f, g) := HSIC(f, g) / √(HSIC(f, f) · HSIC(g, g)). The learning objective for f is then defined as

    min_f max_{g ∈ G} L(f(X), Y) + λ · CKA(f(X), g(X)),    (2)

where L is the loss for the main task and λ > 0. We consider replacing the inner CKA maximisation over g with the task loss minimisation min_g L(g(X), Y), while retaining the CKA regularisation for the outer optimisation over f. Intuitively, loss minimisation poses a stronger similarity condition between g(X) and the target Y (in fact identity) than CKA does, leading to better de-biasing performances (§4.2.3).
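The normalisation in CKA can be sketched directly on top of a finite-sample HSIC estimate; the snippet below is our illustration (arbitrary RBF bandwidth, random features standing in for f(X) and g(X)):

```python
import numpy as np

def rbf_kernel(x, sigma=1.0):
    sq = np.sum(x ** 2, axis=1, keepdims=True)
    return np.exp(-(sq + sq.T - 2 * x @ x.T) / (2 * sigma ** 2))

def centred(k):
    m = k.shape[0]
    h = np.eye(m) - np.ones((m, m)) / m
    return h @ k @ h

def hsic(u, v, sigma=1.0):
    m = u.shape[0]
    return np.trace(centred(rbf_kernel(u, sigma)) @ centred(rbf_kernel(v, sigma))) / (m - 1) ** 2

def cka(u, v, sigma=1.0):
    # HSIC normalised by self-similarities; 1 iff the centred kernels coincide
    return hsic(u, v, sigma) / np.sqrt(hsic(u, u, sigma) * hsic(v, v, sigma))

rng = np.random.default_rng(1)
f_feat = rng.normal(size=(128, 4))   # stand-in for f(X)
g_feat = rng.normal(size=(128, 4))   # stand-in for g(X)
self_sim = cka(f_feat, f_feat)       # equals 1 by construction
cross_sim = cka(f_feat, g_feat)      # small for independent features
```

The normalisation removes the trivial scale dependence of raw HSIC: shrinking a representation shrinks HSIC but leaves CKA unchanged.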

Independence versus separation.

The CKA regularisation in equation 2 encourages f(X) ⊥ g(X). This may lead to less stable optimisation, as removing bias often increases the main task loss L(f(X), Y). To avoid contradicting objectives, we consider encouraging the conditional independence f(X) ⊥ g(X) | Y for discrete Y (e.g., classification) by computing the separation CKA, SCKA(f, g) := (1/|𝒴|) Σ_y CKA(f(X), g(X) | Y = y), where the term "separation" comes from the fairness literature describing this type of conditional independence (Hardt et al., 2016). The final learning objective for REBI'S is then

    min_f L(f(X), Y) + λ · SCKA(f(X), g(X)),  with g trained by min_{g ∈ G} L(g(X), Y).    (3)


3.2 Why and how does it work?

Independence describes relationships between random variables, but we use it for function pairs. Which functional relationship does statistical independence translate to? In this part, we argue with proofs and observations that the answer to the above question is the dissimilarity of invariance types learned by a pair of models.

Linear case: Equivalence between independence and orthogonality.

We study the set of function pairs (f, g) satisfying f(X) ⊥ g(X) for a suitable random variable X. Assuming linearity of the involved functions and normality of X, we obtain an equivalence between statistical independence and functional orthogonality.

Lemma 1.

Assume that f and g are affine mappings, f(x) = Ax + a and g(x) = Bx + b. Assume further that X follows a normal distribution with mean μ and covariance matrix Σ. Then, f(X) ⊥ g(X) if and only if A ⊥_Σ B, i.e., every row u of A and every row v of B satisfy ⟨u, v⟩_Σ = 0. For a positive semi-definite matrix Σ, we define ⟨u, v⟩_Σ := uᵀΣv, and the set orthogonality likewise. Proof in §A.

In particular, when f and g have 1-dimensional outputs, the independence condition translates into the orthogonality of their weight vectors and decision boundaries. From a machine learning point of view, f and g are models with orthogonal invariance types.
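A quick numerical sanity check of Lemma 1 (our illustration, taking Σ = I so that Σ-orthogonality reduces to ordinary row orthogonality; for jointly Gaussian outputs, zero covariance is equivalent to independence):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 200_000
sigma = np.eye(d)  # covariance of X; identity for simplicity
x = rng.multivariate_normal(np.zeros(d), sigma, size=n)

# rows of A and B chosen so that A @ sigma @ B.T == 0 (Sigma-orthogonal)
A = np.array([[1.0, 0.0, 0.0]])
B = np.array([[0.0, 1.0, 0.0]])
f_x = x @ A.T + 0.5   # affine f(X) = A x + a
g_x = x @ B.T - 1.0   # affine g(X) = B x + b

# sample cross-covariance of f(X) and g(X); should vanish by Lemma 1
sample_cov = np.mean((f_x - f_x.mean()) * (g_x - g_x.mean()))
```

With 200k samples the sample cross-covariance concentrates around its population value AΣBᵀ = 0.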

Non-linear case: HSIC as a metric learning objective.

We lack theories to fully characterise general, possibly non-linear, function pairs achieving f(X) ⊥ g(X); it is an interesting open question. For now, we make a set of observations in this general case, using the finite-sample independence criterion HSIC_m(f, g) = (m − 1)^{-2} tr(K̃L̃), where K̃ is the mean-subtracted kernel matrix for f and L̃ likewise for g (see §3.1).

Note that tr(K̃L̃) is an inner product between the flattened matrices K̃ and L̃: since both matrices are symmetric, tr(K̃L̃) = Σ_ij K̃_ij L̃_ij. We consider the inner-product-minimising solution for f on an input pair (x_i, x_j), given a fixed g. The problem can be written as min_f tr(K̃L̃), which is equivalent to min_f Σ_ij K̃_ij L̃_ij.

Suppose K̃-entry L̃_ij > 0. This indicates a relative invariance of g on the pair (x_i, x_j), since their kernel similarity under g is above average. The above problem then boils down to minimising K̃_ij, signifying the relative variance of f on the pair. Following a similar argument, we obtain the converse statement: if g is relatively variant on a pair of inputs, invariance of f on the pair minimises the objective.

We conclude that minimising HSIC_m(f, g) against a fixed g is a metric-learning objective for the embedding f, where the ground-truth pairwise matches and mismatches are the relative mismatches and matches for g, respectively. As a result, f and g learn different sorts of invariances.
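The trace/inner-product identity used in the argument above (valid because both centred kernel matrices are symmetric) can be checked numerically; this is our illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 16
h = np.eye(m) - np.ones((m, m)) / m  # centring matrix

def centred_rbf(x):
    # centred RBF kernel matrix HKH for a batch of features
    sq = np.sum(x ** 2, axis=1, keepdims=True)
    k = np.exp(-(sq + sq.T - 2 * x @ x.T) / 2.0)
    return h @ k @ h

k_t = centred_rbf(rng.normal(size=(m, 3)))  # K~ for f's features
l_t = centred_rbf(rng.normal(size=(m, 3)))  # L~ for g's features

trace_form = np.trace(k_t @ l_t)   # tr(K~ L~)
inner_form = np.sum(k_t * l_t)     # sum_ij K~_ij L~_ij
```

Both forms agree to machine precision, so the HSIC regulariser can be read entry-wise as the metric-learning objective described above.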

Effect of HSIC regularisation on toy data.

We have established that HSIC regularisation encourages a difference in model invariances. To see how it helps to de-bias a model, we have prepared synthetic two-dimensional training data following the cross-bias generalisation case in Figure 1, where one input dimension carries the signal and the other a bias that is perfectly correlated with the label in training. Since the training data is perfectly biased, a multi-layer perceptron (MLP) trained on the data only shows 55% accuracy on de-biased test data (see the decision-boundary figure in Appendix §B). To overcome the bias, we have trained another MLP with equation 3, where the bias-characterising class G is defined as the set of MLPs that take only the bias dimension as input. This model exhibits de-biased decision boundaries (Appendix §B) with an improved accuracy of 89% on the de-biased test data.

4 Experiments

In the previous section, REBI'S was introduced and theoretically justified. In this section, we present experimental results for REBI'S. We first introduce the setup, including the biases tackled in the experiments, difficulties inherent to cross-bias evaluation, and the implementation details (§4.1). Results on Biased MNIST (§4.2) and ImageNet (§4.3) are shown afterwards.

4.1 Experimental setup

Which biases do we tackle?

There is a broad spectrum of bias types to be addressed under the cross-bias generalisation setting. Our work targets biases that arise because shortcut cues sufficient for recognition exist in the training data. In the experiments, we tackle a representative bias of this type: "local pattern" biases for image classification. Even if a CNN image classifier has wide receptive fields, empirical evidence indicates that it resorts to local patterns as opposed to global shape cues (Geirhos et al., 2019).

It is difficult to define and quantify all possible local pattern biases, but it is easy to capture them through a class of CNN architectures: those with smaller receptive fields. This is precisely the setting where we benefit from REBI'S.

Evaluating cross-bias generalisation is difficult.

To measure the performance of a model across real-world biases, one requires an unbiased dataset, or one where the types and degrees of biases can be controlled. Unfortunately, real-world data arise with biases. To de-bias a frog and bird image dataset with swamp and sky backgrounds (see §2.1), either rare data samples must be collected (search for photos of a frog in the sky) or one must intervene in the data-generation process (throw a frog into the sky and take a photo). Either way, it is an expensive procedure (Peyre et al., 2017).

Preparing an unbiased dataset is feasible in some cases when the bias type is simple (e.g., collecting a natural language corpus with unbiased gender pronouns, Webster et al. (2018)). However, we are addressing biases that are expressed in terms of a class of representations but are perhaps difficult to express precisely in language, such as the texture bias of image classifiers (§2.3).

We thus evaluate our method along two axes: (1) Biased MNIST and (2) ImageNet. Biased MNIST contains synthetic biases (colour and texture) which we freely control in training and test data for in-depth analysis of REBI’S. In particular, we can measure its performance on perfectly unbiased test data. On ImageNet, we evaluate our method against realistic biases. Due to the difficulty of defining and obtaining bias labels on real images, we use proxy ground truths for the local pattern bias to measure the cross-bias generalisability. MNIST and ImageNet experiments complement each other in terms of experimental control and realism.

Implementation of Rebi’s.

We describe the specific design choices in REBI’S implementation (equation 3) in our experiments. We will open source the code and data.

To train a model that overcomes local pattern biases, we first define biased model architecture families G such that they precisely and sufficiently encode biased representations: CNNs with relatively small receptive fields (RFs). The biased models in G will by design learn to predict the target class of an image through local cues only. On the other hand, we define a larger search space with larger RFs for our unbiased representation f.

In our work, all networks f and g are fully convolutional networks. f(X) and g(X) denote the final convolutional-layer outputs (feature maps), on which we compute the independence measures HSIC, CKA, and SCKA (§3.1). We apply global average pooling and a learnable linear classifier on f(X), trained along with the outer optimisation, to compute the cross-entropy loss in equation 3.

For Biased MNIST, f is the LeNet (LeCun et al., 1998) architecture with an RF of 28. It has two 5-convs (we refer to convolutional layers with k×k kernels as k-convs), each followed by a max-pooling layer with 2×2 kernels, and then three linear layers. G has the same number of layers, but either with all convolutional layers using 1×1 kernels and no max-pooling operations, called BlindNet1, or with one 3-conv and one 1-conv, called BlindNet3, with RFs of 1 and 3, respectively. On ImageNet, we use ResNet (He et al., 2016) architectures for f (either ResNet18 with an RF of 435 or ResNet50 with an RF of 427). G is defined as BagNet (Brendel and Bethge, 2019) architectures with the same depth as the ResNets (either BagNet18 with an RF of 43 or BagNet50 with an RF of 91). More implementation details are provided in §C.
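As a sanity check on the small receptive fields of the biased family, the overall RF of a plain stack of convolution/pooling layers follows the standard recurrence rf += (k − 1) · jump, jump *= stride; the helper below is our illustration, not code from the paper:

```python
def receptive_field(layers):
    """Overall receptive field of a stack of (kernel, stride) layers.

    Standard recurrence for stacked convolutions/poolings:
    rf grows by (kernel - 1) times the cumulative stride ("jump").
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# BlindNet1: two 1x1 convs, no pooling -> each output sees a single pixel
blindnet1_rf = receptive_field([(1, 1), (1, 1)])
# BlindNet3: one 3-conv and one 1-conv -> each output sees a 3x3 patch
blindnet3_rf = receptive_field([(3, 1), (1, 1)])
```

This confirms the quoted RFs of 1 and 3 for BlindNet1 and BlindNet3: by construction, these models can only exploit pixel-level and 3×3-level patterns, respectively.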

4.2 Biased MNIST

We first verify our model on a dataset where we have full control over the type and amount of bias during training and evaluation. We describe the dataset and present the experimental results.

Figure 2: Biased MNIST. We construct a synthetic dataset with two types of biases – colour and texture – which highly correlate with the label during training. Upper row: colour bias. Lower row: colour and texture biases.

4.2.1 Dataset and evaluation

We construct a new dataset called Biased MNIST, designed to measure the extent to which models generalise to bias shift. We modify MNIST (LeCun et al., 1998) by introducing two types of bias, colour and texture, that highly correlate with the label during training. With the bias alone, a CNN can achieve high accuracy without having to learn the signals inherent to digit recognition, such as shape, providing little motivation for the model to learn beyond these superficial cues.

We inject colour and texture biases by adding colour or texture patterns to the training image backgrounds (see Figure 2). We pre-select 10 distinct colour or texture patterns, one for each digit y. Then, for each image of digit y, we assign the pre-defined pattern b(y) with probability ρ and any other pattern (pre-defined for the other digits) with probability 1 − ρ. ρ thus controls the bias-target correlation in the training data: ρ = 1 leads to complete bias and ρ = 0.1 leads to an unbiased dataset. We consider two datasets: Single-bias MNIST with only the colour bias and Multi-bias MNIST with both colour and texture biases (the same high value of ρ is used in all the experiments).
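The bias-assignment rule can be sketched as follows (our illustration; the pattern ids standing in for colours/textures are hypothetical):

```python
import random

NUM_CLASSES = 10
# hypothetical pattern ids: pattern y is the one pre-defined for digit y
PATTERNS = list(range(NUM_CLASSES))

def assign_bias(label, rho, rng):
    """Return the background pattern id for an image of digit `label`.

    With probability rho, the digit's own pre-defined pattern is used;
    otherwise one of the other nine patterns is drawn uniformly.
    """
    if rng.random() < rho:
        return PATTERNS[label]
    others = [p for p in PATTERNS if p != PATTERNS[label]]
    return rng.choice(others)

rng = random.Random(0)
fully_biased = [assign_bias(7, rho=1.0, rng=rng) for _ in range(100)]
mostly_random = [assign_bias(7, rho=0.1, rng=rng) for _ in range(100)]
```

Note that ρ = 0.1 yields each of the 10 patterns with probability 1/10, which is why it corresponds to the unbiased dataset.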

We evaluate the model's generalisability to bias shift under the following criteria:

Biased: p_tr(Y | B) = p_te(Y | B) (an in-distribution case in §2.1). Whatever bias the training set contains is replicated in the test set. This measures the ability of de-biased models to maintain high in-distribution performance while generalising to unbiased settings.

Unbiased: we assign biases to test images independently of the labels. The bias is no longer predictive of Y, and a model needs to utilise actual signals to yield correct predictions.

We have additional fine-grained measures on Multi-bias MNIST: the removed-colour-bias case (denoted colour; the texture bias remains) and the removed-texture-bias case (denoted texture; the colour bias remains). The colour and texture biases are marginalised out in the respective test sets to factorise generalisability across the two bias types.

                      Single-bias MNIST       Multi-bias MNIST
Model                 biased   unbiased    biased   colour   texture   unbiased
Vanilla               100.     27.5        100.     45.1     85.8      29.6
Biased                100.     10.8        100.     10.0     58.5      4.7
Wang et al. (2019a)   97.2     14.7        95.0     31.1     91.1      18.1
REBI'S (ours)         96.1     74.9        94.7     91.9     90.9      88.6

Table 1: Biased MNIST results (accuracies, %). f is LeNet; the biased architecture family G is set to BlindNet1 for Single-bias and BlindNet3 for Multi-bias MNIST.

4.2.2 Results

Results on Single- and Multi-bias MNIST are shown in Table 1.

Rebi’s lets a model overcome bias.

We observe that vanilla LeNet achieves 100% accuracy under the "biased" metric (the same bias between training and test data) on both Single- and Multi-bias MNIST. This is how most machine learning tasks are evaluated, yet it does not show the extent to which the model depends on bias for prediction. When the bias cues are randomly assigned to the labels at evaluation, the vanilla LeNet accuracy collapses to 27.5% and 29.6% under the "unbiased" metric on Single- and Multi-bias MNIST, respectively. The intentionally biased BlindNet models result in an even lower accuracy of 10.8% on Single-bias MNIST, close to the random-chance level of 10%. This reveals that the seemingly high-performing model has in fact overfitted to bias and has not learned beyond this fallible strategy.

REBI'S, on the other hand, achieves robust generalisation across all settings by learning to be different from the BlindNet representations. REBI'S achieves +47.4 pp (Single) and +59.0 pp (Multi) higher performances than the vanilla model under the cross-bias generalisation setup (the unbiased metric), with only slight degradations in the original accuracies (-3.9 pp and -5.3 pp, respectively).

Comparison against Hex.

Previously, HEX (Wang et al., 2019a) has attempted to reduce the dependency of a model on "superficial statistics", i.e., high-frequency textural information. HEX measures texture via neural grey-level co-occurrence matrices (NGLCM) and projects the NGLCM feature out of the representation of the model of interest. We observe that HEX, applied on LeNet, is effective in removing texture biases (85.8% to 91.1% cross-texture accuracy), but is still vulnerable to colour biases (the accuracy drops from 45.1% to 31.1%). Hand-crafting texture features, as done by HEX, limits its applicability beyond the hand-crafted bias type. By designing the model-family architecture, instead of a specific feature extractor, REBI'S achieves a representation free of broader types of biases.

4.2.3 Factor analysis

Several design choices lead to our final model (§3.1). We examine how these factors contribute to the final performance. See Table 2 for ablative studies on Single-bias MNIST.

Indep. criterion     biased   unbiased
HSIC                 83.5     51.1
CKA                  97.4     22.0
SCKA                 96.1     74.9

Inner opt.           biased   unbiased
SCKA                 95.8     62.4
task loss L(g)       96.1     74.9

Updating G           biased   unbiased
Fixed                94.6     44.7
Multiple             95.9     58.1
Updated              96.1     74.9

Table 2: Factor analysis on Single-bias MNIST. Default REBI'S parameters correspond to the last row of each sub-table.
Impact of the independence criterion.

Three independence measures have been considered: HSIC, CKA, and SCKA. SCKA, the separation CKA used in REBI'S, shows superior de-biasing performance (74.9%) against the baseline choices. It improves upon HSIC by avoiding the trivial solution (a constant function), and upon CKA via more stable optimisation, thanks to the milder conditional-independence constraint that does not contradict the classification objective.

Impact of the objective in the inner optimisation.

We then study the effect of our choice to replace the inner SCKA optimisation with the task-loss objective min_g L(g). The loss objective is considered because it poses a stronger convergence condition on g than SCKA does. We confirm that the loss objective indeed results in better de-biasing performance (74.9% versus 62.4% unbiased accuracy).

Impact of updating .

The advantage of specifying a class of models G, instead of a single fixed model, is that the representation-to-set independence can be computed more precisely (§3.1). We quantify this benefit. Fixing the biased representation to a single pre-trained g results in sub-optimal de-biasing performance, 44.7%. Including multiple fixed biased models improves de-biasing to 58.1%, but not as much as updating g during training, 74.9%. It is thus important to precisely compute the representation-to-set independence. A more detailed analysis of the receptive-field sizes of models in G is in Appendix §E.

4.3 ImageNet

In the ImageNet experiments, we further validate the applicability of REBI'S to the local pattern bias in realistic images (i.e., objects in natural scenes). The local pattern bias often lets a model achieve good in-distribution performance by exploiting local cue shortcuts (e.g., determining the turtle class not by seeing its shape but by the background texture).

4.3.1 Dataset and evaluation

We construct 9-Class ImageNet, a subset of ImageNet (Russakovsky et al., 2015) containing 9 super-classes, as done in Ilyas et al. (2019), since a full-scale analysis on the entire ImageNet is computationally prohibitive. We additionally balance the ratios of sub-class images for each super-class to focus solely on the effect of the local pattern bias.

Since it is difficult to evaluate cross-bias generalisability on realistic unbiased data (§4.1), we settle for alternative evaluations:


Biased accuracy: accuracy is measured on the in-distribution validation set. Though widely used, this metric is blind to a model's generalisability to unseen bias-target combinations.


Unbiased accuracy: as a proxy for the perfectly de-biased test data, which is difficult to collect (§4.1), we use texture cluster IDs t as ground-truth labels for the local pattern bias. For full details of the texture clustering algorithm, see Appendix §F. For an unbiased accuracy measurement, we compute an accuracy for every set D_{y,t} of images corresponding to a target-texture combination (y, t). The combination-wise accuracy is computed as A_{y,t} = C_{y,t} / N_{y,t}, where C_{y,t} is the number of correctly predicted samples in D_{y,t} and N_{y,t} is the total number of samples in D_{y,t}, called the population at (y, t). The unbiased accuracy is then the mean of A_{y,t} over all (y, t) where the population is non-zero. This measure gives more weight to samples of unusual texture-class combinations (smaller N_{y,t}) that are under-represented in the usual biased accuracies. Under this unbiased metric, a biased model basing its recognition on textures is likely to show sub-optimal results on unusual combinations, leading to a drop in the unbiased accuracy.
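The unbiased accuracy above can be sketched in a few lines (our illustration with toy class/texture names, not the authors' evaluation code):

```python
from collections import defaultdict

def unbiased_accuracy(labels, textures, predictions):
    """Mean per-(class, texture) accuracy over non-empty combinations."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, t, p in zip(labels, textures, predictions):
        total[(y, t)] += 1
        correct[(y, t)] += int(p == y)
    # each non-empty cell contributes equally, however few samples it has
    cell_accs = [correct[c] / total[c] for c in total]
    return sum(cell_accs) / len(cell_accs)

# toy example: the rare cell (0, 'sky') is weighted equally with common cells
labels      = [0, 0, 0, 0, 1, 1]
textures    = ['swamp', 'swamp', 'swamp', 'sky', 'sky', 'sky']
predictions = [0, 0, 0, 1, 1, 1]
acc = unbiased_accuracy(labels, textures, predictions)
```

On this toy example the plain accuracy is 5/6, but the unbiased accuracy is 2/3: the single misclassified sample in the rare (0, 'sky') cell zeroes out that whole cell, illustrating how the metric penalises texture-dependent predictions.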


ImageNet-A: ImageNet-A (Hendrycks et al., 2019) contains failure cases of an ImageNet-pre-trained ResNet50 among web images. The images represent many failure modes of networks where "frequently appearing background elements" become erroneous cues for recognition (e.g., a bee feeding on a hummingbird feeder is recognised as a hummingbird). Improved performance on ImageNet-A is an indirect signal that the model learns beyond the bias shortcuts.

4.3.2 Results

We measure the performances of ResNet18 and ResNet50, each trained with REBI'S to be different from BagNet18 and BagNet50 respectively, under the metrics described in the previous part. Results are shown in Table 3.

Figure 3: Qualitative results. Common and uncommon images are shown according to class-texture relationship. Predictions of ResNet18 and REBI’S are shown as well.
Vanilla models are biased.

Both ResNet18 and ResNet50 show good performances on the biased accuracy (93.3% and 91.7%, respectively), but degraded performances on the texture-unbiased accuracy (85.8% and 78.3%, respectively). The drop signifies the bias of vanilla models towards texture cues: by basing their predictions on texture cues, they obtain generally better accuracies on the texture-class pairs that are more represented. The drop also shows the limitation of current evaluation schemes, where cross-bias generalisation is not measured.

Rebi’s leads to less biased models.

When REBI’S is applied to ResNet18 and ResNet50 to encourage them to unlearn cues learnable by BagNet18 and BagNet50, respectively, we observe a general boost in unbiased accuracies: ResNet18 improves from 85.8% to 88.4%, and ResNet50 from 78.3% to 89.2%. Our method thus generalises more robustly to less represented texture-target combinations at test time. Our method also shows improvements on the challenging ImageNet-A subset (e.g. from 8.9% to 13.6% for ResNet50), which further demonstrates improved generalisation to unusual texture-class combinations. More detailed analyses of per-texture and per-class accuracies are included in Appendix §G; learning curves for the baseline and REBI’S are in Appendix §H.

Model Biased Unbiased IN-A
ResNet18 (vanilla) 93.3 85.8 8.1
ResNet18 + REBI’S 93.7 88.4 11.5
ResNet50 (vanilla) 91.7 78.3 8.9
ResNet50 + REBI’S 88.7 89.2 13.6
Table 3: ImageNet results. We show results for the pairs (ResNet18, BagNet18) and (ResNet50, BagNet50), where the first model is trained with REBI’S against the second, biased model. IN-A indicates ImageNet-A.
Qualitative analysis.

We qualitatively present cases where our method successfully de-biases a texture-target dependency. Figure 3 shows examples of common and uncommon texture-target combinations for the “grass” and “close-up” texture clusters. The uncommon instances shown are ones for which ResNet18 predicted an incorrect class. For example, a crab on grass is predicted as a turtle, presumably because turtles frequently co-occur with grass backgrounds in the training data. REBI’S (ours), on the other hand, robustly generalises to such unusual texture-class combinations.

5 Conclusion

We have identified a practical problem faced by many machine learning algorithms: learned models exploit bias shortcuts to recognise the target (the cross-bias generalisation problem, §2). In particular, models tend to under-utilise their capacity to extract non-bias signals (e.g. global shapes for object recognition) when bias shortcuts provide sufficient cues for recognition in the training data (e.g. local patterns and background cues for object recognition) (Geirhos et al., 2019). We have addressed this problem with the REBI’S framework, which does not rely on expensive, if not infeasible, training-data de-biasing schemes. Given an identified set of models that encodes the bias to be removed, REBI’S encourages a model to be statistically independent of them (§3). We have provided theoretical justifications for the use of statistical independence in §3.2, and have validated the effectiveness of REBI’S in removing biases from models through experiments on modified MNIST and ImageNet (§4).


  • S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira (2007) Analysis of representations for domain adaptation. In Advances in neural information processing systems, pp. 137–144. Cited by: §2.1.
  • W. Brendel and M. Bethge (2019) Approximating CNNs with bag-of-local-features models works surprisingly well on imagenet. In International Conference on Learning Representations, External Links: Link Cited by: Appendix C, §1, §4.1.
  • L. Gatys, A. S. Ecker, and M. Bethge (2015) Texture synthesis using convolutional neural networks. In Advances in neural information processing systems, pp. 262–270. Cited by: Appendix F.
  • R. Geirhos, P. Rubisch, C. Michaelis, M. Bethge, F. A. Wichmann, and W. Brendel (2019) ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness.. In International Conference on Learning Representations, External Links: Link Cited by: §1, §1, §2.2, §2.3, §4.1, §5.
  • A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf (2005) Measuring statistical dependence with hilbert-schmidt norms. In International conference on algorithmic learning theory, pp. 63–77. Cited by: §3.1.
  • A. Gretton, K. Fukumizu, C. H. Teo, L. Song, B. Schölkopf, and A. J. Smola (2008) A kernel statistical test of independence. In Advances in neural information processing systems, pp. 585–592. Cited by: §3.1.
  • S. Gururangan, S. Swayamdipta, O. Levy, R. Schwartz, S. Bowman, and N. A. Smith (2018) Annotation artifacts in natural language inference data. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), New Orleans, Louisiana, pp. 107–112. External Links: Link, Document Cited by: §1.
  • M. Hardt, E. Price, N. Srebro, et al. (2016) Equality of opportunity in supervised learning. In Advances in neural information processing systems, pp. 3315–3323. Cited by: §2.2, §3.1.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §4.1.
  • D. Hendrycks, K. Zhao, S. Basart, J. Steinhardt, and D. Song (2019) Natural adversarial examples. External Links: 1907.07174 Cited by: §4.3.1.
  • A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran, and A. Madry (2019) Adversarial examples are not bugs, they are features. arXiv preprint arXiv:1905.02175. Cited by: §4.3.1.
  • J. Johnson, A. Alahi, and L. Fei-Fei (2016) Perceptual losses for real-time style transfer and super-resolution. In European conference on computer vision, pp. 694–711. Cited by: Appendix F.
  • D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: Appendix C.
  • S. Kornblith, M. Norouzi, H. Lee, and G. Hinton (2019) Similarity of neural network representations revisited. In International Conference on Machine Learning, Cited by: §3.1, §3.1.
  • Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, et al. (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. Cited by: §4.1, §4.2.1.
  • Y. Li and N. Vasconcelos (2019) REPAIR: removing representation bias by dataset resampling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9572–9581. Cited by: §1, §2.2.
  • Y. Li, Y. Li, and N. Vasconcelos (2018) RESOUND: towards action recognition without representation bias. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 513–528. Cited by: §2.2.
  • R. Panda, J. Zhang, H. Li, J. Lee, X. Lu, and A. K. Roy-Chowdhury (2018) Contemplating visual emotions: understanding and overcoming dataset bias. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 579–595. Cited by: §2.2.
  • J. Peyre, J. Sivic, I. Laptev, and C. Schmid (2017) Weakly-supervised learning of visual relations. In Proceedings of the IEEE International Conference on Computer Vision, pp. 5179–5188. Cited by: §2.2, §4.1.
  • N. Quadrianto, V. Sharmanska, and O. Thomas (2019) Discovering fair representations in the data domain. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8227–8236. Cited by: §2.2, §3.1.
  • O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. (2015) Imagenet large scale visual recognition challenge. International journal of computer vision 115 (3), pp. 211–252. Cited by: §4.3.1.
  • J. Shawe-Taylor, N. Cristianini, et al. (2004) Kernel methods for pattern analysis. Cambridge university press. Cited by: §3.1.
  • R. Shetty, B. Schiele, and M. Fritz (2019) Not using the car to see the sidewalk–quantifying and controlling the effects of context in classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8218–8226. Cited by: §2.2.
  • K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: Appendix F.
  • A. Torralba and A. A. Efros (2011) Unbiased look at dataset bias.. In CVPR, Vol. 1, pp. 7. Cited by: §1.
  • H. Wang, Z. He, Z. C. Lipton, and E. P. Xing (2019a) Learning robust representations by projecting superficial statistics out. ICLR. Cited by: §1, §4.2.2, Table 1.
  • H. Wang, Z. He, and E. P. Xing (2019b) Learning robust representations by projecting superficial statistics out. In International Conference on Learning Representations, External Links: Link Cited by: §2.2.
  • K. Webster, M. Recasens, V. Axelrod, and J. Baldridge (2018) Mind the gap: a balanced corpus of gendered ambiguous pronouns. Transactions of the Association for Computational Linguistics 6, pp. 605–617. Cited by: §4.1.
  • R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork (2013) Learning fair representations. In International Conference on Machine Learning, pp. 325–333. Cited by: §2.2.
  • C. Zhang, Y. Liu, Y. Liu, Q. Hu, X. Liu, and P. Zhu (2018) FISH-mml: fisher-hsic multi-view metric learning.. In IJCAI, pp. 3054–3060. Cited by: §3.1.

Appendix A Statistical Independence is Equivalent to Functional Orthogonality for Linear Maps.

We provide a proof for the following lemma in §3.2.

Lemma 1.

Assume that f and g are affine mappings f(x) = Ax + a and g(x) = Wx + w, where A ∈ R^{m×d} and W ∈ R^{n×d}. Assume further that X follows a normal distribution with mean μ and covariance matrix Σ. Then f(X) is independent of g(X) if and only if A Σ Wᵀ = 0. For a positive semi-definite matrix Σ, we define A ⊥_Σ W to mean A Σ Wᵀ = 0, and the set orthogonality likewise.


Due to linearity and normality, the independence of f(X) and g(X) is equivalent to the covariance condition Cov(f(X), g(X)) = 0. The covariance is computed as

Cov(f(X), g(X)) = E[(A(X − μ))(W(X − μ))ᵀ] = A E[(X − μ)(X − μ)ᵀ] Wᵀ = A Σ Wᵀ.

Note that A Σ Wᵀ = 0 is exactly the condition A ⊥_Σ W, which completes the proof. ∎

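As a quick numerical sanity check of the lemma (an illustrative sketch with assumed shapes, not part of the paper): sampling X from a Gaussian and choosing A and W with A Σ Wᵀ = 0 yields an empirical covariance between f(X) and g(X) that vanishes up to sampling noise.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
Sigma = np.eye(d)                      # covariance of X (identity for simplicity)
A = np.array([[1.0, 0.0, 0.0, 0.0]])   # f(x) = A x reads coordinate 0
W = np.array([[0.0, 0.0, 1.0, 0.0]])   # g(x) = W x reads coordinate 2
assert np.allclose(A @ Sigma @ W.T, 0.0)   # A is Sigma-orthogonal to W

X = rng.multivariate_normal(np.zeros(d), Sigma, size=100_000)
fX, gX = X @ A.T, X @ W.T
# empirical covariance; should vanish up to sampling noise (~1/sqrt(n))
cov = np.mean((fX - fX.mean(0)) * (gX - gX.mean(0)))
```

Conversely, choosing W to read the same coordinate as A would give A Σ Wᵀ ≠ 0 and a clearly non-zero empirical covariance.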
Appendix B Decision Boundary Visualisation for Toy Experiment

We show the decision boundaries of the toy experiment in §3.2 in Figure 4. GIF animations of the decision-boundary changes over training can be found at an anonymous link.

Figure 4: Decision boundaries. Left to right: training data, baseline model, and our model.

Appendix C Implementation Details

We solve the minimax problem in equation 3 through alternating stochastic gradient descent (Adam; Kingma and Ba (2014)), where we alternate between 5 epochs for the outer problem and 5 epochs for the inner one. The regularisation parameter λ is set separately for the CKA- and HSIC-based criteria on MNIST and for ImageNet. Note that we use a large λ for MNIST, as the degree of synthetic bias (the bias-target correlation) is excessively large compared to the degree of bias in realistic settings. We train with linearly decaying learning rates; batch sizes differ between the ResNet18 and ResNet50 settings. We train all models for 80 epochs on Biased MNIST. For 9-class ImageNet, we train ResNet18 for 200 epochs and ResNet50 for 100 epochs.

BagNet18 and BagNet50 have 18 and 50 convolutional layers with the same structure (Basic or Bottleneck blocks) as ResNet18 and ResNet50, respectively. The internal kernel sizes of the BagNets are set following the original paper’s philosophy (Brendel and Bethge, 2019).
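For reference, a minimal numpy sketch of linear CKA (Kornblith et al., 2019), one of the similarity criteria penalised during the alternating optimisation; the paper’s exact kernel choices may differ, so treat this as an assumption-laden illustration rather than the actual implementation:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between feature matrices X, Y of shape (n_samples, dim)."""
    X = X - X.mean(axis=0)          # centre each feature dimension
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X) ** 2                       # ||Y^T X||_F^2
    den = np.linalg.norm(X.T @ X) * np.linalg.norm(Y.T @ Y)  # normalisers
    return num / (den + 1e-12)
```

CKA is invariant to isotropic scaling and orthogonal transformations of either representation, which is why it serves as a similarity (rather than raw covariance) penalty between the de-biased and biased representations.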

Appendix D Performance by bias and label on Biased MNIST.

In the Biased MNIST experiments (§4.2), we have shown either biased- or unbiased-set statistics. In this section, we provide more detailed results where model accuracies are computed per bias class and per target label. We visualise the case-wise accuracies of the baseline LeNet and the REBI’S-trained model in Figure 5. The diagonals in each matrix indicate the pre-defined bias-target pairs (§4.2.1). Thus, the biased accuracy can be computed by taking the mean of the diagonal entries in each matrix, and the unbiased accuracy by taking the mean of all entries.

The vanilla LeNet’s tendency to have higher accuracies on diagonal entries and near-zero performances on many off-diagonal entries indicates that LeNet relies heavily on the colour (bias) cues. REBI’S successfully resolves this tendency, exhibiting more uniform performances across all bias-target pairs. Note that accuracies below the main diagonal are relatively high, as those are the classes to which the pre-defined patterns are assigned with the remaining probability. Figure 6 demonstrates that our method is successful across different degrees of bias in the training data.
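Given such a bias-target accuracy matrix, the biased and unbiased accuracies can be read off as the diagonal mean and the full mean, respectively; a minimal sketch, assuming a square matrix `acc[b, y]`:

```python
import numpy as np

def biased_unbiased_from_matrix(acc):
    """acc[b, y]: accuracy for bias class b and target label y (square matrix).
    Biased accuracy = mean over pre-defined (diagonal) bias-target pairs;
    unbiased accuracy = mean over all pairs."""
    acc = np.asarray(acc, dtype=float)
    return float(np.mean(np.diag(acc))), float(np.mean(acc))
```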

(a) Vanilla trained LeNet.
(b) REBI’S trained.
Figure 5: Bias-target-wise accuracies. We show accuracies for each bias and target pair in Single-bias MNIST.
Figure 6: Impact of the degree of bias. REBI’S is effective in de-biasing across different degrees of bias in the data.

Appendix E Impact of Receptive Fields of for Biased MNIST

Figure 7: Receptive fields of the biased models. Biased and unbiased accuracies of REBI’S with LeNet and BlindNet with varying receptive fields.

It is conceptually important to design the set of biased models to encode the bias as precisely as possible (see the precision and recall conditions in §2.3). To verify this empirically, we have measured the performance of REBI’S with LeNet and BlindNet, where the BlindNet receptive fields are controlled by replacing its convolutional layers with smaller-kernel convolutions, resulting in a range of receptive fields; the largest receptive field corresponds to using LeNet itself as the biased model. We measure the biased- and unbiased-set performances on Single-bias MNIST (§4.2.1).

Results are shown in Figure 7. We observe relatively stable biased-set performances but decreasing unbiased-set performances as the receptive field grows. The decrease in de-biasing ability is attributed to the violation of the precision condition (§2.3): small receptive fields are already sufficient to capture any colour-bias variation, and de-biasing against models with larger receptive fields additionally prevents the model from seeing the meaningful signal (cues beyond the expressible colours). In the extreme case, when LeNet is trained to be independent of LeNet itself, the de-biasing performance drops significantly.

Appendix F Texture Clustering

In our ImageNet experiments (§4.3), we obtained proxy ground truths for the local pattern bias using texture clustering. We extract texture information from images by clustering the Gram matrices of low-layer feature maps, as done in standard texturisation methods (Gatys et al., 2015; Johnson et al., 2016); we use feature maps from layer relu1_2 of a pre-trained VGG16 (Simonyan and Zisserman, 2014). Since we intend to evaluate whether a given model is biased towards local pattern cues, we only utilise features from a lower layer, which encodes low-level features such as edges and colours rather than high-level semantics. Figure 8 shows that each cluster effectively captures similar local patterns across different classes. For each cluster, we visualise its top-3 correlated classes. We can see that certain classes share a common texture: cat, monkey, and dog share a “face”-like texture. If a class is biased towards a particular texture during training, a model can take the shortcut of utilising texture cues for recognising the target class, leading to sub-optimal cross-bias generalisation to unusual class-texture combinations (e.g. crab on grass).
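A minimal sketch of the Gram-matrix texture descriptor described above, assuming a (C, H, W) feature map already extracted from a low VGG16 layer such as relu1_2; the clustering itself (e.g. k-means) would then be applied to these descriptors, and the normalisation choice here is an assumption:

```python
import numpy as np

def gram_descriptor(feat):
    """feat: (C, H, W) feature map from a low VGG layer (e.g. relu1_2).
    Returns the flattened Gram matrix used as a texture descriptor."""
    C, H, W = feat.shape
    F = feat.reshape(C, H * W)
    G = F @ F.T / (C * H * W)        # channel-wise feature correlations
    return G[np.triu_indices(C)]     # Gram matrix is symmetric: keep upper triangle
```

Because the Gram matrix discards spatial arrangement and keeps only channel co-activation statistics, images with similar local patterns land close together regardless of their class.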

Figure 8: Texture-class correlation. We show samples from each texture cluster. For each cluster, we visualise its top-3 correlated classes in rows.
Figure 9: More clustering samples. Extended version of Figure 8.
Figure 10: Data and model bias. Rows correspond to image labels and columns correspond to texture clusters. Cells are colour-coded according to the population of samples of corresponding texture-class pair. In each cell, ResNet18 and REBI’S accuracies are shown in pairs.

Appendix G Bias in data and models

We study the bias in the 9-class ImageNet data and in the models trained on it (§4.3). Figure 10 shows the statistics of texture biases in the data and the model biases that result from them. To measure the dataset bias, we empirically observe correlations between textures and target classes by counting the number of samples for each texture-class pair (denoted “population” in the main paper). We observe that there indeed exists a strong correlation between texture clusters and class labels. We say that a class has a dominant texture cluster if the largest cluster for the class contains more than half of the class samples. Six of the nine classes considered have dominant texture clusters: (“Dog”, “Dog”), (“Cat”, “Face”), (“Monkey”, “Mammal”), (“Frog”, “Spotted”), (“Crab”, “Shell”) and (“Insect”, “Close-up”).
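The dominant-cluster criterion above can be sketched as follows, assuming an array of texture cluster IDs for the samples of one class:

```python
import numpy as np

def dominant_cluster(textures_of_class):
    """Return (cluster_id, is_dominant): a class has a dominant texture
    cluster if its largest cluster holds more than half of the samples."""
    ids, counts = np.unique(textures_of_class, return_counts=True)
    top = counts.argmax()
    return ids[top], counts[top] > len(textures_of_class) / 2
```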

Figure 10 further shows the accuracies of the baseline ResNet18 and REBI’S, indicating the presence of bias in the models and how REBI’S overcomes it despite the bias in the data itself. We measure the average accuracy over classes with dominant texture clusters (biased classes) and over the less biased classes. We observe that ResNet18 shows higher accuracy on biased classes than on less biased classes, signifying its bias towards texture. REBI’S, on the other hand, achieves similar accuracies on biased and less biased classes. We stress that REBI’S overcomes the bias even though the training data itself is biased.

Appendix H Learning Curves on ImageNet

We visualise the learning curves for the baseline vanilla ResNet50 and REBI’S trained ResNet50 against BagNet50 in Figure 11. We observe that REBI’S gradually de-biases a representation beyond the baseline model.

Figure 11: Learning curve. De-biased ImageNet accuracies of vanilla ResNet50 and REBI’S trained against BagNet50.