1 Introduction
Most machine learning algorithms are trained and evaluated by randomly splitting a single source of data into training and test sets. Although this is a standard protocol, it is blind to a critical problem: the existence of dataset bias (Torralba and Efros, 2011). For instance, many frog images are taken in swamp scenes, but a swamp itself is not a frog. Nonetheless, a neural network will exploit this bias (i.e., take “shortcuts”) if it yields correct predictions for the majority of training examples. If the bias is sufficient to achieve high accuracy, there is little motivation for a model to learn the complexity of the intended task, despite its full capacity to do so. Consequently, a model that relies on bias will achieve high in-distribution accuracy, yet fail to generalise when the bias shifts.
We tackle this “cross-bias generalisation” problem, where a model does not exploit its full capacity because bias cues are “sufficient” for predicting the target label in the training data. For example, language models make predictions based on the presence of certain words (e.g., “not” for “contradiction”) (Gururangan et al., 2018) without much reasoning on the actual meaning of sentences, even if they are in principle capable of sophisticated reasoning. Similarly, convolutional neural networks (CNNs) achieve high accuracies on image classification by using local texture cues as shortcuts, as opposed to more reliable global shape cues
(Geirhos et al., 2019; Brendel and Bethge, 2019).

Existing methods attempt to remove a model’s dependency on bias by debiasing the training data through data augmentation (Geirhos et al., 2019) or resampling tactics (Li and Vasconcelos, 2019). Others have introduced a pre-defined set of biases that a model is trained to be independent of (Wang et al., 2019a). These prior works assume that the bias can easily be defined or quantified, but real-world biases often cannot (e.g., the texture bias above).
To address this limitation, we propose a novel framework to train a debiased representation by encouraging it to be “different” from a set of representations that are biased by design. Our insight is that biased representations can easily be obtained by utilising models of smaller capacity (e.g., bag of words for word bias and CNNs of small receptive fields for texture bias). Experiments show that our method is effective in reducing a model’s dependency on “shortcuts” in training data, as evidenced by improved accuracies in test data where the bias is either shifted or removed.
2 Problem Definition
We provide a rigorous definition of our overarching goal: overcoming the bias in models trained on biased data. We show that the problem we tackle is novel and realistic.
2.1 Cross-bias generalisation
We first define random variables, signal $S$ and bias $B$, as cues for the recognition of an input $X$ as a certain target variable $Y$. Signals are the cues essential for the recognition of $X$ as $Y$; examples include the shape and the skin patterns of frogs for frog image classification. Biases $B$, on the other hand, are cues that are not essential for the recognition but are correlated with the target $Y$; many frog images are taken in swamp scenes, so swamp scenes can be considered a $B$. A key property of $B$ is that intervening on it should not change $Y$: moving a frog from a swamp to a desert scene does not change the “frogness”. We assume that the true predictive distribution factorises as $p(Y \mid X) = p(Y \mid S)$, signifying the sufficiency of $S$ for recognition.

Under this framework, three learning scenarios are identified depending on how the relationship among $S$, $B$, and $Y$ changes across the training and test distributions, $p_{\mathrm{tr}}$ and $p_{\mathrm{te}}$, respectively: in-distribution, cross-domain, and cross-bias generalisation. See Figure 1 for a summary.
In-distribution.
$p_{\mathrm{tr}}(S, B, Y) = p_{\mathrm{te}}(S, B, Y)$. This is the standard learning setup utilised in many benchmarks: data from a single source are split into training and test sets at random.
Cross-domain.
$B \perp Y$, and furthermore $p_{\mathrm{tr}}(B) \neq p_{\mathrm{te}}(B)$. $B$ in this case is often referred to as the “domain”. For example, training data consist of images with ($Y$=frog, $B$=wilderness) and ($Y$=bird, $B$=wilderness), while test data contain ($Y$=frog, $B$=indoors) and ($Y$=bird, $B$=indoors). This scenario is typically simulated by training and testing on different datasets (Ben-David et al., 2007).
Cross-bias.
$B \not\perp Y$,^1 and the dependency changes across training and test distributions: $p_{\mathrm{tr}}(B \mid Y) \neq p_{\mathrm{te}}(B \mid Y)$. We further assume that $p_{\mathrm{tr}}(B) = p_{\mathrm{te}}(B)$, to clearly distinguish this scenario from cross-domain generalisation. For example, training data only contain images of the two types ($Y$=frog, $B$=swamp) and ($Y$=bird, $B$=sky), but test data contain unusual class-bias combinations ($Y$=frog, $B$=sky) and ($Y$=bird, $B$=swamp). Our work addresses this scenario. (^1 $\perp$ and $\not\perp$ denote independence and dependence, respectively.)
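As a toy illustration of this cross-bias split (the class and bias labels here are hypothetical, echoing the frog/bird example above):

```python
import random

def make_split(pairs, n_per_pair, seed=0):
    """Sample (target, bias) observations from a fixed list of combinations."""
    rng = random.Random(seed)
    return [rng.choice(pairs) for _ in range(n_per_pair * len(pairs))]

# Training data: the bias perfectly predicts the target (B depends on Y).
train = make_split([("frog", "swamp"), ("bird", "sky")], 100)

# Test data: the dependency changes -- unusual combinations appear.
test = make_split([("frog", "sky"), ("bird", "swamp"),
                   ("frog", "swamp"), ("bird", "sky")], 50)

# In training, knowing the bias determines the target ...
assert all(y == "frog" for y, b in train if b == "swamp")
# ... but in the test split the same shortcut rule fails half of the time.
frog_given_swamp = [y == "frog" for y, b in test if b == "swamp"]
print(sum(frog_given_swamp) / len(frog_given_swamp))  # roughly 0.5
```

A model that learned the swamp-implies-frog shortcut would thus collapse to near-chance accuracy on the swamp-background test images.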
2.2 Existing cross-bias generalisation methods and their assumptions
Under cross-bias generalisation scenarios, the dependency $B \not\perp Y$ makes the bias a viable cue for recognition. A model trained on such data becomes susceptible to interventions on $B$, limiting its generalisability when the bias is changed or removed in the test data. There exist prior approaches to this problem, but with different types and amounts of assumptions on $B$. We briefly recap the approaches based on the assumptions they require. In the next part, §2.3, we define our novel problem setting, which requires an assumption distinct from the ones in prior approaches.
When an algorithm to disentangle bias and signal exists.
Being able to disentangle $S$ and $B$ lets one collapse the feature space corresponding to $B$ in both training and test data. A model trained on such normalised data then becomes free of biases. As ideal as this is, building a model that perfectly disentangles $S$ and $B$ is often unrealistic.
When a generative algorithm or data collection procedure for additional samples exists.
When additional examples can be supplied through such a procedure, the training dataset itself can be debiased, i.e., made to satisfy $B \perp Y$. For example, one can either collect or synthesise unusual images, like frogs in the sky and birds in swamps, to balance out the bias. Such a data augmentation strategy is indeed a valid solution adopted by many prior studies (Panda et al., 2018; Geirhos et al., 2019; Shetty et al., 2019). However, collecting unusual inputs can be expensive (Peyre et al., 2017), and building a generative model with pre-defined bias types (Geirhos et al., 2019) may suffer from bias mis-specification and a lack of realism.
When a predictive algorithm or ground truth for $B$ exists.
Conversely, when one can tell the bias $b$ for every input $x$, two approaches are feasible. (1) The first is a data reweighting solution: we give greater weights to frogs in the sky than to frogs in swamps to even out the correlation in the training data (Li et al., 2018; Li and Vasconcelos, 2019). (2) The second approach removes the dependency between the model predictions $f(X)$ and the bias $B$. Many existing approaches for fairness in machine learning have proposed independence-based regularisers to encourage $f(X) \perp B$ (Zemel et al., 2013) or the conditional independence $f(X) \perp B \mid Y$ (called the “separation” constraint; Hardt et al. (2016)). Other approaches have proposed to remove the predictability of $B$ from $f(X)$ through domain-adversarial losses (Li and Vasconcelos, 2019; Wang et al., 2019b) or projection (Wang et al., 2019b; Quadrianto et al., 2019).
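The reweighting idea in (1) can be sketched as follows, with hypothetical class-bias counts: each example is weighted by the inverse empirical frequency of its bias given its class, so rare class-bias combinations carry as much total weight as common ones.

```python
from collections import Counter

def reweight(examples):
    """Weight each (y, b) example by n(y) / n(y, b), i.e. proportional to
    1 / p_hat(b | y), so every bias value carries equal total weight per class."""
    joint = Counter(examples)                    # counts n(y, b)
    class_tot = Counter(y for y, _ in examples)  # counts n(y)
    return [class_tot[y] / joint[(y, b)] for y, b in examples]

# Hypothetical biased training set: frogs mostly on swamp, birds mostly on sky.
data = [("frog", "swamp")] * 90 + [("frog", "sky")] * 10 \
     + [("bird", "sky")] * 90 + [("bird", "swamp")] * 10

w = reweight(data)
# A rare frog-on-sky example weighs 9x a common frog-on-swamp example.
print(w[0], w[90])  # 100/90 and 100/10 = 10.0
```

After reweighting, the total weight of frog-on-swamp examples equals that of frog-on-sky examples, so the bias is no longer predictive of the class in the weighted empirical distribution.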
Knowledge of $B$ is provided in many realistic scenarios. For example, when the aim is to remove gender biases in a job application process, applicants’ genders are supplied as ground truths. However, there exist cases where $B$ is difficult even to define or quantify and can only be indirectly specified. We tackle such a scenario in the next part.
2.3 Our scenario: Capturing bias with a particular set of models
Under the cross-bias generalisation scenario, certain types of biases are not easily addressed by the above methods. Take the texture bias as an example (§1; Geirhos et al. (2019)): (1) texture and shape cannot easily be disentangled; (2) building a generative model or collecting unusual images is expensive; (3) building the predictive model for texture requires an enumeration (classification) or embedding (regression) of all possible textures, which is not feasible.
However, slightly modifying the third assumption results in a problem setting that allows interesting application scenarios. Instead of assuming knowledge of $B$, we characterise it by defining a set of models $\mathcal{G}$ that are biased towards $B$ by design. For texture biases, for example, we define $\mathcal{G}$ to be the set of convolutional neural network (CNN) architectures with small overall receptive fields. Then, any learned model $g \in \mathcal{G}$ can by design make predictions based only on patterns that can be captured within small receptive fields (i.e., textures).
More precisely, we define $\mathcal{G}$ to be a bias-characterising model class for the bias-signal pair $(B, S)$ if, for every possible joint distribution of $(X, Y, B, S)$, there exists a $g \in \mathcal{G}$ that extracts the bias, $g(X) = B$ (recall condition), and every $g \in \mathcal{G}$ satisfies $g(X) \perp S$ (precision condition). In other words, $\mathcal{G}$ contains all bias extractors except for ones that can also recognise signals. The above-mentioned family of CNN architectures with limited receptive fields exemplifies such a $\mathcal{G}$.

There exist many scenarios where such a $\mathcal{G}$ is known. For example, 3D CNNs are developed for action recognition on video frames to exploit not only static but also temporal filters. Those models, however, may still be biased towards static features: a model may lazily use basketball-court features to recognise “playing basketball”. By defining $\mathcal{G}$ to be the set of 2D frame-wise CNNs, one can fully characterise such biases. Generally, this scenario exemplifies situations where added architectural capacity is not fully utilised because simpler cues suffice to solve the task on the given training set.
3 Proposed Method
We present a solution for cross-bias generalisation when the bias-characterising model class $\mathcal{G}$ is known (see §2.3); the method is referred to as REBI’S.^2 The solution consists of training a model $f$ for the task with a regularisation term encouraging independence between the prediction $f(X)$ and the set of all possible biased predictions $\{g(X) \mid g \in \mathcal{G}\}$. We will introduce the precise definition of the regularisation term and discuss why and how it leads to an unbiased model. (^2 Abbreviation of “removing bias”; pronounced like “Levi’s”.)
3.1 Rebi’s: Removing bias with bias
If $B$ is fully known, we can directly encourage $f(X) \perp B$. Since we only have access to the set of biased models $\mathcal{G}$ (§2.3), we instead seek to promote $f(X) \perp g(X)$ for every $g \in \mathcal{G}$. Simply put, we debias a representation $f$ by designing a set of biased models $\mathcal{G}$ and letting $f$ run away from every $g \in \mathcal{G}$. This removes the dependence on bias cues (due to the recall condition on $\mathcal{G}$) while leaving signal cues as valid recognition cues (due to the precision condition on $\mathcal{G}$); see §2.3. We specify the REBI’S learning objective after introducing our independence criterion, HSIC.
Hilbert-Schmidt Independence Criterion (HSIC).
Since we need to measure the degree of independence between continuous random variables $U$ and $V$ in high-dimensional spaces, it is infeasible to resort to histogram-based measures; we use HSIC (Gretton et al., 2005). For two random variables $U$ and $V$ and kernels $k$ and $l$, HSIC is defined as $\mathrm{HSIC}(U, V) = \lVert C_{UV} \rVert^2_{\mathrm{HS}}$, where $C_{UV}$ is the cross-covariance operator in the Reproducing Kernel Hilbert Spaces (RKHS) of $k$ and $l$ (Gretton et al., 2005), an RKHS analogue of covariance matrices; $\lVert \cdot \rVert_{\mathrm{HS}}$ is the Hilbert-Schmidt norm, a Hilbert-space analogue of the Frobenius norm. It is known that, for two random variables $U$ and $V$ and radial basis function (RBF) kernels $k$ and $l$, $\mathrm{HSIC}(U, V) = 0$ if and only if $U \perp V$. A finite-sample estimate $\mathrm{HSIC}^1(U, V)$ has been used in practice for statistical testing (Gretton et al., 2005, 2008), feature similarity measurement (Kornblith et al., 2019), and model regularisation (Quadrianto et al., 2019; Zhang et al., 2018). With $m$ samples, it is defined as $\mathrm{HSIC}^1(U, V) = \frac{1}{(m-1)^2} \operatorname{tr}(\tilde{K}\tilde{L})$, where $\tilde{K}$ is the mean-subtracted matrix of pairwise kernel similarities among the samples of $U$, and $\tilde{L}$ is defined similarly for $V$.

Minimax optimisation for bias removal.
In our case, we compute

$\mathrm{HSIC}^1(f(X), g(X))$   (1)

with an RBF kernel for the degree of independence between the representation $f(X)$ and the biased representations $g(X)$. We write $f$ and $g$ as shorthands for $f(X)$ and $g(X)$. Since the problem allows trivial solutions (e.g., a constant $f$), we use the centred kernel alignment (CKA) (Shawe-Taylor et al., 2004; Kornblith et al., 2019) criterion defined by $\mathrm{CKA}(f, g) = \mathrm{HSIC}^1(f, g) / \sqrt{\mathrm{HSIC}^1(f, f)\, \mathrm{HSIC}^1(g, g)}$. The learning objective for $f$ is then defined as
$\min_f \big[\, \mathcal{L}_f + \lambda \max_{g \in \mathcal{G}} \mathrm{CKA}(f, g) \,\big]$   (2)

where $\mathcal{L}_f$ is the loss for the main task and $\lambda > 0$. We consider replacing the inner optimisation $\max_g \mathrm{CKA}(f, g)$ with MSE minimisation $\min_g \lVert f - g \rVert^2$, while retaining the CKA regularisation for the outer optimisation of $f$. Intuitively, MSE minimisation poses a stronger similarity condition between $g$ and $f$ (in fact identity) than CKA does, leading to better debiasing performances (§4.2.3).
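As an illustration, here is a minimal numpy sketch of the finite-sample HSIC and its CKA normalisation. The kernel choice, bandwidth, and centring variant are simplifying assumptions for exposition; the estimator actually used in the paper may differ in detail.

```python
import numpy as np

def rbf(X, sigma=1.0):
    """RBF kernel matrix over the rows of X."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def hsic(K, L):
    """Finite-sample HSIC: tr(K~ L~) / (m-1)^2 with mean-subtracted kernels."""
    m = K.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m  # centering matrix
    return np.trace(H @ K @ H @ L) / (m - 1) ** 2

def cka(K, L):
    """CKA: HSIC normalised to be scale-invariant, lying in [0, 1]."""
    return hsic(K, L) / np.sqrt(hsic(K, K) * hsic(L, L))

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
f = np.tanh(x)                      # a representation driven by the input
g_dep = x ** 2                      # another function of the same input
g_ind = rng.normal(size=(200, 1))   # independent noise

dep = cka(rbf(f), rbf(g_dep))
ind = cka(rbf(f), rbf(g_ind))
print(dep > ind)  # dependence yields a larger alignment score
```

The minimax objective then penalises $f$ whenever any $g$ achieves a high alignment score against it.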
Independence versus separation.
The CKA regularisation in equation 2 encourages the marginal independence $f(X) \perp g(X)$. This may lead to less stable optimisation, as removing the bias often increases the main-task loss $\mathcal{L}_f$. To avoid contradicting objectives, we consider encouraging the conditional independence $f(X) \perp g(X) \mid Y$ for discrete $Y$ (e.g., classification) by computing the separation CKA, $\mathrm{SCKA}(f, g)$, i.e., the CKA computed within each class and averaged over the classes; the term “separation” comes from the fairness literature describing this type of conditional independence (Hardt et al., 2016). The final learning objective for REBI’S is then

$\min_f \big[\, \mathcal{L}_f + \lambda \max_{g \in \mathcal{G}} \mathrm{SCKA}(f, g) \,\big]$   (3)
3.2 Why and how does it work?
Independence describes relationships between random variables, but we use it for function pairs. Which functional relationship does statistical independence translate to? In this part, we argue with proofs and observations that the answer to the above question is the dissimilarity of invariance types learned by a pair of models.
Linear case: Equivalence between independence and orthogonality.
We study the set of function pairs $(f, g)$ satisfying $f(X) \perp g(X)$ for a suitable random variable $X$. Assuming linearity of the involved functions and normality of $X$, we obtain an equivalence between statistical independence and functional orthogonality.

Lemma 1.

Assume that $f$ and $g$ are affine mappings $f(x) = W_f x + b_f$ and $g(x) = W_g x + b_g$, where $W_f \in \mathbb{R}^{d_f \times d}$ and $W_g \in \mathbb{R}^{d_g \times d}$. Assume further that $X$ follows a normal distribution with mean $\mu$ and covariance matrix $\Sigma$. Then, $f(X) \perp g(X)$ if and only if the rows of $W_f$ and $W_g$ are mutually orthogonal with respect to $\Sigma$, i.e., $W_f \Sigma W_g^\top = 0$. For a positive semi-definite matrix $\Sigma$, we define $\langle u, v \rangle_\Sigma := u^\top \Sigma v$, and the set orthogonality likewise. Proof in §A.

In particular, when $f$ and $g$ have 1-dimensional outputs, the independence condition translates to the orthogonality of their weight vectors and decision boundaries. From a machine learning point of view, $f$ and $g$ are models with orthogonal invariance types.

Nonlinear case: HSIC as a metric learning objective.
We lack theories fully characterising general, possibly nonlinear, function pairs achieving $f(X) \perp g(X)$; this is an interesting open question. For now, we make a set of observations in this general case, using the finite-sample independence criterion $\mathrm{HSIC}^1(f, g) = \frac{1}{(m-1)^2}\operatorname{tr}(\tilde{K}\tilde{L})$, where $\tilde{K}$ is the mean-subtracted kernel matrix for $f$ and $\tilde{L}$ likewise for $g$ (see §3.1).
Note that $\operatorname{tr}(\tilde{K}\tilde{L})$ is an inner product between the flattened matrices $\tilde{K}$ and $\tilde{L}$. We consider the inner-product-minimising solution for $f$ on an input pair $(x_i, x_j)$ given a fixed $g$. The problem can be written as $\min_f \langle \tilde{K}, \tilde{L} \rangle$, which is equivalent to $\min_f \sum_{i,j} \tilde{K}_{ij} \tilde{L}_{ij}$.
Suppose $\tilde{L}_{ij} > 0$. This indicates a relative invariance of $g$ on $(x_i, x_j)$, since their kernel similarity under $g$ is above average. The above problem then boils down to $\min_f \tilde{K}_{ij}$, signifying the relative variance of $f$ on $(x_i, x_j)$. Following a similar argument, we obtain the converse statement: if $g$ is relatively variant on a pair of inputs, invariance of $f$ on the pair minimises the objective.
We conclude that minimising $\mathrm{HSIC}^1(f, g)$ against a fixed $g$ is a metric-learning objective for the embedding $f$, where the ground-truth pairwise matches and mismatches are the relative mismatches and matches for $g$, respectively. As a result, $f$ and $g$ learn different sorts of invariances.
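A tiny numerical check of this metric-learning view, using linear kernels on 1-D embeddings (a simplification of the RBF-kernel criterion above): when $f$ shares $g$'s invariances the criterion is large, and when $f$ varies exactly where $g$ is invariant it vanishes.

```python
import numpy as np

def hsic_lin(u, v):
    """Finite-sample HSIC with linear kernels on 1-D embeddings u, v."""
    m = len(u)
    H = np.eye(m) - np.ones((m, m)) / m   # centering matrix
    K, L = np.outer(u, u), np.outer(v, v)
    return np.trace(H @ K @ H @ L) / (m - 1) ** 2

g = np.array([0., 0., 1., 1.])       # g is invariant on (x1,x2) and (x3,x4)
f_same = np.array([0., 0., 1., 1.])  # f shares g's invariances
f_diff = np.array([0., 1., 0., 1.])  # f varies exactly where g is invariant

# The criterion is large for the f that copies g's invariances,
# and (numerically) zero for the f with orthogonal invariances.
print(hsic_lin(f_same, g), hsic_lin(f_diff, g))
```

Minimising the criterion therefore pushes $f$ towards the `f_diff`-like solution, i.e., away from $g$'s invariance pattern.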
Effect of HSIC regularisation on toy data.
We have established that the HSIC regularisation encourages a difference in model invariances. To see how this helps debias a model, we prepared synthetic two-dimensional training data following the cross-bias generalisation case in Figure 1, where one input dimension carries the signal and the other the bias, and the two are perfectly correlated with the label in the training set. Since the training data are perfectly biased, a multi-layer perceptron (MLP) trained on them shows only 55% accuracy on debiased test data (see the decision-boundary figure in Appendix §B). To overcome the bias, we trained another MLP with equation 3, where the bias-characterising class $\mathcal{G}$ is defined as the set of MLPs that take only the bias dimension as input. This model exhibits debiased decision boundaries (Appendix §B) with an improved accuracy of 89% on the debiased test data.

4 Experiments
In the previous section, REBI’S was introduced and theoretically justified. In this section, we present experimental results for REBI’S. We first introduce the setup, including the biases tackled in the experiments, the difficulties inherent to cross-bias evaluation, and the implementation details (§4.1). Results on Biased MNIST (§4.2) and ImageNet (§4.3) follow.

4.1 Experimental setup
Which biases do we tackle?
There is a broad spectrum of bias types that could be addressed under the cross-bias generalisation setting. Our work targets biases that arise from shortcut cues that are sufficient for recognition in the training data. In the experiments, we tackle a representative bias of this type: “local pattern” biases in image classification. Even if a CNN image classifier has wide receptive fields, empirical evidence indicates that it resorts to local patterns as opposed to global shape cues (Geirhos et al., 2019).
It is difficult to define and quantify all possible local pattern biases, but it is easy to capture them through a class of CNN architectures: those with smaller receptive fields. This is precisely the setting where we benefit from REBI’S.
Evaluating cross-bias generalisation is difficult.
To measure the performance of a model across real-world biases, one requires an unbiased dataset or one where the types and degrees of bias can be controlled. Unfortunately, real-world data arise with biases. To debias a frog-and-bird image dataset with swamp and sky backgrounds (see §2.1), either rare data samples must be collected (search for photos of a frog in the sky) or one must intervene in the data-generation process (throw a frog into the sky and take a photo). Either way, it is an expensive procedure (Peyre et al., 2017).
Preparing unbiased data is feasible in some cases when the bias type is simple (e.g., collecting a natural-language corpus with unbiased gender pronouns; Webster et al. (2018)). However, we address biases that are expressed in terms of a class of representations but are difficult to express precisely in language, such as the texture bias of image classifiers (§2.3).
We thus evaluate our method along two axes: (1) Biased MNIST and (2) ImageNet. Biased MNIST contains synthetic biases (colour and texture) which we freely control in the training and test data for an in-depth analysis of REBI’S. In particular, we can measure its performance on perfectly unbiased test data. On ImageNet, we evaluate our method against realistic biases. Due to the difficulty of defining and obtaining bias labels on real images, we use proxy ground truths for the local pattern bias to measure cross-bias generalisability. The MNIST and ImageNet experiments complement each other in terms of experimental control and realism.
Implementation of Rebi’s.
We describe the specific design choices in the REBI’S implementation (equation 3) used in our experiments. We will open-source the code and data.
To train a model that overcomes local pattern biases, we first define biased model architecture families $\mathcal{G}$ such that they precisely and sufficiently encode biased representations: CNNs with relatively small receptive fields (RFs). The biased models in $\mathcal{G}$ will, by design, learn to predict the target class of an image through local cues only. On the other hand, we define a search space $\mathcal{F}$ with larger RFs for our unbiased representations.
In our work, all networks in $\mathcal{F}$ and $\mathcal{G}$ are fully convolutional. $f(x)$ and $g(x)$ denote the final convolutional-layer outputs (feature maps), on which we compute the independence measures HSIC, CKA, and SCKA (§3.1). We apply global average pooling and a learnable linear classifier on $f(x)$, trained along with the outer optimisation, to compute the cross-entropy loss in equation 3.
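The receptive-field sizes quoted in this section follow standard convolutional-network arithmetic; a minimal calculator is sketched below (the layer configurations are illustrative, not the exact architectures — e.g., LeNet's quoted RF of 28 also reflects its fully connected layers, which see the whole 28×28 image):

```python
def receptive_field(layers):
    """Overall receptive field of a stack of (kernel, stride) layers,
    via the standard recurrence: rf += (k - 1) * jump; jump *= stride."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# Two 5x5 convs, each followed by 2x2 max-pooling (a LeNet-style conv stack):
print(receptive_field([(5, 1), (2, 2), (5, 1), (2, 2)]))  # 16

# A stack of 1x1 convs keeps the receptive field at a single pixel, so such a
# network can, by construction, rely only on the most local cues:
print(receptive_field([(1, 1)] * 4))  # 1
```

The same recurrence explains why BagNet-style architectures, which replace most 3×3 kernels with 1×1 kernels, end up with small overall RFs despite their depth.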
For Biased MNIST, $\mathcal{F}$ is the LeNet (LeCun et al., 1998) architecture with an RF of 28. It has two 5-convs,^3 after each of which a max-pooling layer is applied, followed by three linear layers. $\mathcal{G}$ has the same number of layers but either with all convolutional layers of 1×1 kernels and without max-pooling operations (called BlindNet1), or with one 3-conv and one 1-conv (called BlindNet3), with RFs of 1 and 3, respectively. On ImageNet, we use the ResNet (He et al., 2016) architecture for $\mathcal{F}$ (either ResNet18 with an RF of 435 or ResNet50 with an RF of 427). $\mathcal{G}$ is defined as the BagNet (Brendel and Bethge, 2019) architectures with the same depth as the ResNets (either BagNet18 with an RF of 43 or BagNet50 with an RF of 91). More implementation details are provided in §C. (^3 We refer to convolutional layers with $n \times n$ kernels as $n$-convs.)

4.2 Biased MNIST
We first verify our model on a dataset where we have full control over the type and amount of bias during training and evaluation. We describe the dataset and present the experimental results.
4.2.1 Dataset and evaluation
We construct a new dataset called Biased MNIST, designed to measure the extent to which models generalise under bias shift. We modify MNIST (LeCun et al., 1998) by introducing two types of bias, colour and texture, that highly correlate with the label during training. With such bias cues alone, a CNN can achieve high accuracy without having to learn the inherent signals for digit recognition, such as shape, providing little motivation for the model to learn beyond these superficial cues.
We inject colour and texture biases by adding colour or texture patterns to the training image backgrounds (see Figure 2). We pre-select 10 distinct colour or texture patterns, one for each digit. Then, for each image of digit $y$, we assign the pre-defined pattern $b_y$ with probability $\rho$ and any other pattern (pre-defined for the other digits) with probability $1 - \rho$. $\rho$ then controls the bias-target correlation in the training data: $\rho = 1$ leads to complete bias and $\rho = 0.1$ leads to an unbiased dataset. We consider two datasets: Single-bias MNIST with only the colour bias and Multi-bias MNIST with both colour and texture biases (the same $\rho$ is used throughout the experiments).

We evaluate the model’s generalisability under bias shift with the following criteria:
Biased.
$p_{\mathrm{te}}(B \mid Y) = p_{\mathrm{tr}}(B \mid Y)$ (an in-distribution case in §2.1). Whatever bias the training set contains is replicated in the test set. This measures the ability of debiased models to maintain high in-distribution performance while generalising to unbiased settings.
Unbiased.
$B \perp Y$ in the test data. We assign biases to test images independently of the labels. The bias is no longer predictive of $Y$, and a model needs to utilise actual signals to yield correct predictions.
We have additional fine-grained measures on Multi-bias MNIST: the removed-colour-bias case (colour; the texture bias remains) and the removed-texture-bias case (texture; the colour bias remains). The colour and texture biases are marginalised out in the test set, respectively, to factorise generalisability across the different types of bias.
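The $\rho$-controlled pattern assignment described above can be sketched as follows (the pattern identifiers are hypothetical):

```python
import random

def assign_pattern(digit, rho, patterns, rng):
    """With probability rho, return the digit's own pre-defined pattern;
    otherwise one of the patterns pre-defined for the other digits."""
    if rng.random() < rho:
        return patterns[digit]
    return rng.choice([p for d, p in enumerate(patterns) if d != digit])

patterns = [f"colour_{d}" for d in range(10)]  # hypothetical pattern ids
rng = random.Random(0)

# rho = 1: complete bias -- the background pattern determines the digit.
fully = [assign_pattern(d % 10, 1.0, patterns, rng) for d in range(1000)]

# rho = 0.1: each pattern is equally likely (0.1 own + 0.9/9 for each other),
# i.e. the background carries no information about the digit.
loose = [assign_pattern(3, 0.1, patterns, rng) for _ in range(10000)]
print(loose.count(patterns[3]) / len(loose))  # close to 1/10
```

With 10 patterns, $\rho = 0.1$ makes all patterns uniformly likely for every digit, which is why it corresponds to the unbiased setting.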
Table 1: accuracies (%) on Biased MNIST.

Model               | Single-bias MNIST | Multi-bias MNIST
                    | biased | unbiased | biased | colour | texture | unbiased
--------------------|--------|----------|--------|--------|---------|---------
Vanilla             | 100.0  | 27.5     | 100.0  | 45.1   | 85.8    | 29.6
Biased              | 100.0  | 10.8     | 100.0  | 10.0   | 58.5    | 4.7
Wang et al. (2019a) | 97.2   | 14.7     | 95.0   | 31.1   | 91.1    | 18.1
REBI’S (ours)       | 96.1   | 74.9     | 94.7   | 91.9   | 90.9    | 88.6
4.2.2 Results
Results on Single and Multibias MNIST are shown in Table 1.
Rebi’s lets a model overcome bias.
We observe that vanilla LeNet achieves 100% accuracy under the “biased” metric (the same bias in training and test data) on Single- and Multi-bias MNIST. This is how most machine learning tasks are evaluated, yet it does not show the extent to which the model depends on the bias for prediction. When the bias cues are assigned randomly with respect to the label at evaluation, the vanilla LeNet accuracy collapses to 27.5% and 29.6% under the “unbiased” metric on Single- and Multi-bias MNIST, respectively. The intentionally biased BlindNet models result in an even lower accuracy of 10.8% on Single-bias MNIST, close to the random chance of 10%. This reveals that the seemingly high-performing model has in fact overfitted to the bias and has not learned beyond this fallible strategy.
REBI’S, on the other hand, achieves robust generalisation across all settings by learning to be different from the BlindNet representations. REBI’S achieves +47.4 pp (Single) and +59.0 pp (Multi) higher performance than the vanilla model under the cross-bias generalisation setup (the unbiased metric), with a slight degradation in the original accuracies (−3.9 pp and −5.3 pp, respectively).
Comparison against HEX.
Previously, HEX (Wang et al., 2019a) attempted to reduce the dependency of a model on “superficial statistics”, i.e., high-frequency textural information. HEX measures texture via neural grey-level co-occurrence matrices (NGLCM) and projects the NGLCM feature out of the output of the model of interest. We observe that HEX applied on LeNet is effective in removing texture biases (85.8% to 91.1% cross-texture accuracy), but remains vulnerable to colour biases (accuracy drops from 45.1% to 31.1%). Hand-crafting texture features, as done by HEX, limits its applicability beyond the hand-crafted bias type. By designing a model family architecture, instead of a specific feature extractor, REBI’S achieves a representation free of broader types of biases.
4.2.3 Factor analysis
Several design choices lead to our final model (§3.1). We examine how each factor contributes to the final performance; see Table 2 for ablative studies on Single-bias MNIST.
Impact of the independence criterion.
Three independence measures have been considered: HSIC, CKA, and SCKA. SCKA, the separation CKA used in REBI’S, shows superior debiasing performance (74.9%) against the baseline choices. It improves upon HSIC by avoiding the trivial solution (a constant function), and upon CKA via more stable optimisation due to the milder conditional-independence constraint that does not contradict the classification objective.
Impact of the objective in the inner optimisation.
We then study the effect of replacing the inner SCKA optimisation with the MSE objective. MSE is considered as it poses a stronger convergence condition than SCKA does. We confirm that the MSE objective indeed results in better debiasing performance.
Impact of updating .
The advantage of specifying a class of models $\mathcal{G}$, instead of a single fixed model, is that the independence can be computed more precisely (§3.1). We quantify this benefit. Fixing the biased representation to a single pre-trained $g$ results in suboptimal debiasing performance, 44.7%. Including multiple fixed biased models improves debiasing to 58.1%, but this is still not as good as updating $g$ during training, 74.9%. It is thus important to precisely compute the representation-to-set independence. A more detailed analysis of the receptive-field sizes of the models in $\mathcal{G}$ is in Appendix §E.
4.3 ImageNet
In the ImageNet experiments, we further validate the applicability of REBI’S to the local pattern bias in realistic images (i.e., objects in natural scenes). The local pattern bias often lets a model achieve good in-distribution performance by exploiting local-cue shortcuts (e.g., determining the turtle class from the background texture rather than the shape).
4.3.1 Dataset and evaluation
We construct 9-Class ImageNet, a subset of ImageNet (Russakovsky et al., 2015) containing 9 super-classes, as done in Ilyas et al. (2019), since a full-scale analysis on ImageNet does not scale. We additionally balance the ratios of sub-class images within each super-class to focus solely on the effect of the local pattern bias.
Since it is difficult to evaluate crossbias generalisability on realistic unbiased data (§4.1), we settle for alternative evaluations:
Biased.
Accuracy is measured on the in-distribution validation set. Though widely used, this metric is blind to a model’s generalisability to unseen bias-target combinations.
Unbiased.
As a proxy for perfectly debiased test data, which are difficult to collect (§4.1), we use texture cluster IDs as ground-truth labels for the local pattern bias. For the full details of the texture-clustering algorithm, see Appendix §F. For an unbiased accuracy measurement, we compute accuracies for every set of images corresponding to a target-texture combination $(y, t)$. The combination-wise accuracy is computed as $A(y, t) = C(y, t) / N(y, t)$, where $C(y, t)$ is the number of correctly predicted samples in $(y, t)$ and $N(y, t)$ is the total number of samples in $(y, t)$, called the population at $(y, t)$. The unbiased accuracy is then the mean accuracy over all $(y, t)$ where the population is non-zero. This measure gives more weight to samples of unusual texture-class combinations (smaller $N(y, t)$) that are under-represented in the usual biased accuracies. Under this unbiased metric, a biased model basing its recognition on textures is likely to show suboptimal results on unusual combinations, leading to a drop in unbiased accuracy.
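The unbiased accuracy just described can be sketched as follows (the records are hypothetical): a plain accuracy is dominated by the common combination, while the combination-wise mean exposes the failure on the rare one.

```python
from collections import defaultdict

def unbiased_accuracy(records):
    """Mean of per-(target, texture) accuracies over all combinations that
    occur at least once; records are (target, texture, is_correct) triples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, t, ok in records:
        total[(y, t)] += 1
        correct[(y, t)] += int(ok)
    accs = [correct[k] / total[k] for k in total]
    return sum(accs) / len(accs)

# Hypothetical records: the common combination is almost always right,
# the rare one almost always wrong.
records = [("turtle", "grass", True)] * 98 + [("turtle", "grass", False)] * 2 \
        + [("crab", "grass", False)] * 9 + [("crab", "grass", True)] * 1

plain = sum(ok for _, _, ok in records) / len(records)  # 0.9
unbiased = unbiased_accuracy(records)                   # (0.98 + 0.1) / 2 = 0.54
print(plain, unbiased)
```

The gap between the two numbers (0.90 vs 0.54 here) is exactly the kind of hidden bias the plain in-distribution metric cannot see.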
ImageNet-A.
ImageNet-A (Hendrycks et al., 2019) contains failure cases of an ImageNet-pre-trained ResNet50 among web images. The images cover many failure modes of networks where “frequently appearing background elements” become erroneous cues for recognition (e.g., a bee feeding on a hummingbird feeder is recognised as a hummingbird). Improved performance on ImageNet-A is an indirect signal that a model has learned beyond the bias shortcuts.
4.3.2 Results
We measure the performance of ResNet18 and ResNet50, each trained with REBI’S to be different from BagNet18 and BagNet50, respectively, under the metrics described in the previous part. Results are shown in Table 3.
Vanilla models are biased.
Both ResNet18 and ResNet50 show good biased accuracies (93.3% and 91.7%, respectively), but degraded texture-unbiased accuracies (85.8% and 78.3%, respectively). The drop signifies the bias of the vanilla models towards texture cues: by basing their predictions on texture cues, they obtain generally better accuracies on the texture-class pairs that are more represented. The drop also shows the limitation of current evaluation schemes, where cross-bias generalisation is not measured.
Rebi’s leads to less biased models.
When REBI’S is applied to ResNet18 and ResNet50 to encourage them to unlearn cues learnable by BagNet18 and BagNet50, respectively, we observe a general boost in unbiased accuracies: ResNet18 improves from 85.8% to 88.4%, and ResNet50 from 78.3% to 89.2%. Our method thus generalises robustly to less-represented texture-target combinations at test time. Our method also improves on the challenging ImageNet-A subset (e.g., from 8.9% to 13.6% for ResNet50), which further demonstrates its generalisability to unusual texture-class combinations. More detailed analyses of per-texture and per-class accuracies are included in Appendix §G; learning curves for the baseline and REBI’S are in Appendix §H.
Table 3: accuracies (%) on 9-Class ImageNet and ImageNet-A.

Model            | Biased | Unbiased | IN-A
-----------------|--------|----------|------
ResNet18         | 93.3   | 85.8     | 8.1
  + REBI’S (ours)| 93.7   | 88.4     | 11.5
ResNet50         | 91.7   | 78.3     | 8.9
  + REBI’S (ours)| 88.7   | 89.2     | 13.6
Qualitative analysis.
We qualitatively present cases where our method successfully removes a texture-target dependency. Figure 3 shows examples of common and uncommon texture-target combinations for the “grass” and “close-up” texture clusters. The uncommon instances shown are those for which ResNet18 predicted the wrong class. For example, a crab on grass is predicted as a turtle, presumably because turtles frequently co-occur with grass backgrounds in the training data. REBI’S (ours), on the other hand, generalises robustly to such unusual texture-class combinations.
5 Conclusion
We have identified a practical problem faced by many machine learning algorithms: learned models exploit bias shortcuts to recognise the target (the cross-bias generalisation problem, §2). In particular, models tend to under-utilise their capacity to extract non-bias signals (e.g., global shapes for object recognition) when bias shortcuts provide sufficient cues for recognition in the training data (e.g., local patterns and background cues for object recognition) (Geirhos et al., 2019). We have addressed this problem with the REBI’S framework, which does not rely on expensive, if not infeasible, training-data debiasing schemes. Given an identified set of models $\mathcal{G}$ that encodes the bias to be removed, REBI’S encourages a model to be statistically independent of $\mathcal{G}$ (§3). We have provided theoretical justifications for the use of statistical independence in §3.2 and validated the effectiveness of REBI’S in removing biases from models through experiments on a modified MNIST and on ImageNet (§4).
References
Analysis of representations for domain adaptation. In Advances in Neural Information Processing Systems, pp. 137–144.
Approximating CNNs with bag-of-local-features models works surprisingly well on ImageNet. In International Conference on Learning Representations.
Texture synthesis using convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 262–270.
ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In International Conference on Learning Representations.
Measuring statistical dependence with Hilbert–Schmidt norms. In International Conference on Algorithmic Learning Theory, pp. 63–77.
A kernel statistical test of independence. In Advances in Neural Information Processing Systems, pp. 585–592.
Annotation artifacts in natural language inference data. In Proceedings of NAACL-HLT 2018, Volume 2 (Short Papers), pp. 107–112.
Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pp. 3315–3323.
Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
Natural adversarial examples. arXiv preprint arXiv:1907.07174.
Adversarial examples are not bugs, they are features. arXiv preprint arXiv:1905.02175.
Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision, pp. 694–711.
Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Similarity of neural network representations revisited. In International Conference on Machine Learning.
Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324.
REPAIR: removing representation bias by dataset resampling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9572–9581.
RESOUND: towards action recognition without representation bias. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 513–528.
Contemplating visual emotions: understanding and overcoming dataset bias. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 579–595.
Weakly-supervised learning of visual relations. In Proceedings of the IEEE International Conference on Computer Vision, pp. 5179–5188.
Discovering fair representations in the data domain. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8227–8236.
ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115 (3), pp. 211–252.
Kernel Methods for Pattern Analysis. Cambridge University Press.
Not using the car to see the sidewalk: quantifying and controlling the effects of context in classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8218–8226.
Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Unbiased look at dataset bias. In CVPR, Vol. 1, pp. 7.
Learning robust representations by projecting superficial statistics out. In International Conference on Learning Representations.
Mind the gap: a balanced corpus of gendered ambiguous pronouns. Transactions of the Association for Computational Linguistics 6, pp. 605–617.
Learning fair representations. In International Conference on Machine Learning, pp. 325–333.
FISH-MML: Fisher-HSIC multi-view metric learning. In IJCAI, pp. 3054–3060.
Appendix A Statistical Independence is Equivalent to Functional Orthogonality for Linear Maps.
We provide a proof of the following lemma, stated in §3.2.
Lemma 1.
Assume that $f$ and $g$ are affine mappings $f(x) = Ax + a$ and $g(x) = Bx + b$, where $A \in \mathbb{R}^{m \times d}$ and $B \in \mathbb{R}^{n \times d}$. Assume further that $X$ follows a normal distribution with mean $\mu$ and covariance matrix $\Sigma$. Then $f(X) \perp\!\!\!\perp g(X)$ if and only if $A \perp_\Sigma B$. For a positive semi-definite matrix $\Sigma$, we define $A \perp_\Sigma B$ if and only if $A \Sigma B^\top = 0$, and the set orthogonality likewise.
Proof.
Due to linearity and normality, the independence is equivalent to the covariance condition $\mathrm{Cov}(f(X), g(X)) = 0$. The covariance is computed as:
$\mathrm{Cov}(f(X), g(X)) = \mathrm{Cov}(AX + a, BX + b) = A\,\mathrm{Cov}(X, X)\,B^\top = A \Sigma B^\top.$ (4)
Note that
$A \Sigma B^\top = (A \Sigma^{1/2})(B \Sigma^{1/2})^\top,$ (5)
which vanishes if and only if the rows of $A$ and $B$ are orthogonal with respect to the inner product induced by $\Sigma$, i.e. $A \perp_\Sigma B$. ∎
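As a sanity check (ours, not part of the original proof), the covariance identity above can be verified numerically: sample a Gaussian, apply two affine maps whose matrices satisfy AΣBᵀ = 0, and confirm that the empirical cross-covariance is near zero. The toy dimensions and all variable names below are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instantiation of Lemma 1: X ~ N(mu, Sigma), f(x) = Ax + a, g(x) = Bx + b.
d = 3
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.eye(d)                  # identity covariance for simplicity
A = np.array([[1.0, 0.0, 0.0]])    # f reads the first coordinate
B = np.array([[0.0, 1.0, 0.0]])    # g reads the second coordinate

# A Sigma B^T = 0, so Lemma 1 predicts f(X) and g(X) are independent.
assert np.allclose(A @ Sigma @ B.T, 0.0)

X = rng.multivariate_normal(mu, Sigma, size=200_000)
fX = X @ A.T + 1.0                 # affine shifts do not affect covariance
gX = X @ B.T - 3.0

# The empirical cross-covariance should be close to A Sigma B^T = 0.
cross_cov = np.cov(fX.ravel(), gX.ravel())[0, 1]
print(abs(cross_cov) < 1e-2)
```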
Appendix B Decision Boundary Visualisation for Toy Experiment
We show the decision boundaries of the toy experiment in §3.2 in Figure 4. GIF animations of the decision boundary changes over training can be found at the anonymous link.
Appendix C Implementation Details
We solve the minimax problem in Equation 3 through alternating stochastic gradient descent (Adam; Kingma and Ba, 2014), where we alternate between 5 epochs for the outer problem and 5 epochs for the inner one. The regularisation parameter for the CKA penalty is set separately for MNIST and ImageNet; we use a larger value for MNIST because the degree of synthetic bias (the bias-target correlation in training data) is excessively large compared to the degree of bias in realistic settings. We use linearly decaying learning rates; the batch size is set separately for ResNet50. We train all models for 80 epochs on Biased MNIST. For 9-class ImageNet, we train ResNet18 for 200 epochs and ResNet50 for 100 epochs.

BagNet18 and BagNet50 have 18 and 50 convolutional layers with the same structure (Basic or Bottleneck blocks) as ResNet18 and ResNet50, respectively. The internal kernel sizes of the BagNets are set following the original paper's philosophy (Brendel and Bethge, 2019).
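For concreteness, the linear-kernel CKA similarity used as the independence penalty can be sketched as below, following the formulation of Kornblith et al. (2019): CKA(X, Y) = ‖YᵀX‖²_F / (‖XᵀX‖_F · ‖YᵀY‖_F) on mean-centred representations. The toy representations and all names are ours, not the paper's implementation.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear-kernel CKA between representations X (n, p) and Y (n, q)."""
    X = X - X.mean(axis=0)  # centre features before comparing
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 8))

# Nearly identical representations score high; unrelated ones score low.
high = linear_cka(Z, Z + 0.01 * rng.normal(size=(500, 8)))
low = linear_cka(Z, rng.normal(size=(500, 8)))
print(high > 0.9, low < 0.2)
```

In the alternating scheme, the outer problem would minimise the task loss plus this similarity to the biased models, while the inner problem updates the biased models.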
Appendix D Performance by bias and label on Biased MNIST.
In the Biased MNIST experiments (§4.2), we have reported either biased- or unbiased-set statistics. In this section, we provide more detailed results where model accuracies are computed per bias class and per target label. We visualise the case-wise accuracies of the baseline LeNet and the ReBias-trained model in Figure 5. The diagonals in each matrix indicate the predefined bias-target pairs (§4.2.1). Thus, the biased accuracies can be computed as the mean of the diagonal entries of each matrix, and the unbiased accuracies as the mean over all entries.

The vanilla LeNet's tendency to have higher accuracies on diagonal entries and near-zero performance on many off-diagonal entries indicates that LeNet relies heavily on the colour (bias) cues. ReBias successfully resolves this tendency in the vanilla model, exhibiting more uniform performance across all bias-target pairs. Note that accuracies below the main diagonal are relatively high, as these are the classes to which the predefined patterns are assigned with some probability. Figure 6 demonstrates that our method is successful across different degrees of bias in the training data.
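The aggregation described above (biased accuracy = mean of the diagonal, unbiased accuracy = mean over all entries) can be sketched in a few lines; the per-(bias, target) accuracy matrix below is a hypothetical example, not the paper's results.

```python
import numpy as np

# Hypothetical per-(bias, target) accuracy matrix: entry [b, y] is the
# accuracy on samples with bias class b and target label y.
acc = np.array([
    [0.99, 0.10, 0.05],
    [0.08, 0.98, 0.12],
    [0.03, 0.07, 0.97],
])

biased_acc = np.mean(np.diag(acc))  # bias-aligned (diagonal) pairs only
unbiased_acc = np.mean(acc)         # uniform average over all pairs

# A large gap between the two indicates reliance on the bias cue.
print(round(biased_acc, 2), round(unbiased_acc, 2))
```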
Appendix E Impact of the Receptive Fields of the Biased Models for Biased MNIST
It is conceptually important to design the set of biased models to encode the bias as precisely as possible (see the precision and recall conditions in §2.3). To see if this holds empirically, we measure the performance of ReBias with LeNet and BlindNet, where the BlindNet receptive fields are controlled by replacing convolutional kernels with smaller ones; the largest receptive field corresponds to the case where LeNet itself is used as the biased model. We measure the biased- and unbiased-set performances on single-bias MNIST (§4.2.1).

Results are shown in Figure 7. We observe relatively stable biased-set performances and decreasing unbiased-set performances as the receptive field grows. The decrease in debiasing ability is attributed to the violation of the precision condition for the biased models (§2.3): small receptive fields are already sufficient to capture any colour bias variations, and debiasing against models with larger receptive fields additionally prevents the target model from seeing meaningful signals (cues beyond what colours can express). In the extreme case, when LeNet is trained to be independent of itself, the debiasing performance drops significantly.
Appendix F Texture Clustering
In our ImageNet experiments (§4.3), we obtained proxy ground truths for the local pattern bias via texture clustering. We extract texture information from images by clustering the Gram matrices of low-layer feature maps, as in standard texture synthesis methods (Gatys et al., 2015; Johnson et al., 2016); we use feature maps from layer relu1_2 of a pre-trained VGG16 (Simonyan and Zisserman, 2014). Since we intend to evaluate whether a given model is biased towards local pattern cues, we only utilise features from a lower layer, which encode low-level features such as edges and colours rather than high-level semantics. Figure 8 shows that each cluster effectively captures similar local patterns across different classes. For each cluster, we visualise its top-3 correlated classes. We can see that certain classes share a common texture: cat, monkey, and dog share a "face"-like texture. If a certain class is biased towards a particular texture during training, a model can take the shortcut of utilising texture cues for recognising the target class, leading to suboptimal cross-bias generalisation to unusual class-texture combinations (e.g. crab on grass).
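A minimal sketch of the Gram-matrix texture descriptor that such clustering operates on (following Gatys et al., 2015): the feature maps below are synthetic stand-ins for VGG16 relu1_2 activations, and all names are ours.

```python
import numpy as np

def gram_descriptor(feat):
    """Gram-matrix texture descriptor of a feature map of shape (C, H, W)."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    g = f @ f.T / (h * w)          # channel-by-channel correlations
    return g[np.triu_indices(c)]   # flatten the symmetric matrix

rng = np.random.default_rng(0)
# Two hypothetical "textures": channel 0 active vs channel 1 active.
maps = [np.zeros((3, 8, 8)) for _ in range(4)]
for i in (0, 1):
    maps[i][0] = rng.normal(size=(8, 8))
for i in (2, 3):
    maps[i][1] = rng.normal(size=(8, 8))

desc = np.stack([gram_descriptor(m) for m in maps])
# Descriptors of the same "texture" lie closer than those across textures,
# so a clustering algorithm (e.g. k-means) would group them together.
same = np.linalg.norm(desc[0] - desc[1])
cross = np.linalg.norm(desc[0] - desc[2])
print(same < cross)
```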
Appendix G Bias in data and models
We study the bias in the 9-class ImageNet data and in the models trained on it (§4.3). Figure 10 shows the statistics of texture biases in the data and the model biases that result from them. To measure the dataset bias, we empirically observe correlations between textures and target classes by counting the number of samples for each texture-class pair (denoted "population" in the main paper). We indeed observe a strong correlation between texture clusters and class labels. We say that a class has a dominant texture cluster if its largest cluster contains more than half of the class samples. Six of the nine classes considered have a dominant texture cluster: ("Dog", "Dog"), ("Cat", "Face"), ("Monkey", "Mammal"), ("Frog", "Spotted"), ("Crab", "Shell") and ("Insect", "Close-up").
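The dominant-texture-cluster criterion above can be sketched as follows; the per-sample cluster assignments are hypothetical examples, not the paper's actual data.

```python
from collections import Counter

def dominant_cluster(texture_labels):
    """Return the dominant texture cluster of a class, i.e. the largest
    cluster if it holds more than half of the class samples, else None."""
    counts = Counter(texture_labels)
    cluster, n = counts.most_common(1)[0]
    return cluster if n > len(texture_labels) / 2 else None

# Hypothetical per-sample texture-cluster assignments for two classes.
print(dominant_cluster(["grass"] * 7 + ["water"] * 3))   # -> "grass"
print(dominant_cluster(["grass", "water", "close-up"]))  # -> None
```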
Figure 10 further shows the accuracies of the baseline ResNet18 and ReBias, indicating the presence of bias in the models and how ReBias overcomes it despite the bias in the data itself. We measure the average accuracy over classes with dominant texture clusters (biased classes) and the average over the less biased classes. We observe that ResNet18 shows higher accuracy on biased classes than on less biased classes, signifying its bias towards texture. ReBias, on the other hand, achieves similar accuracies on biased and less biased classes. We stress that ReBias overcomes the bias even though the training data itself is biased.
Appendix H Learning Curves on ImageNet
We visualise the learning curves for the baseline vanilla ResNet50 and the ReBias-trained ResNet50 (against BagNet50) in Figure 11. We observe that ReBias gradually debiases the representation beyond the baseline model.