Latent Adversarial Debiasing: Mitigating Collider Bias in Deep Neural Networks

11/19/2020
by Luke Darlow, et al.

Collider bias is a harmful form of sample selection bias that neural networks are ill-equipped to handle. This bias manifests itself when the underlying causal signal is strongly correlated with other confounding signals due to the training data collection procedure. In the situation where the confounding signal is easy to learn, deep neural networks will latch onto it and the resulting model will generalise poorly to in-the-wild test scenarios. We argue herein that the cause of failure is a combination of the deep structure of neural networks and the greedy gradient-driven learning process used, one that prefers easy-to-compute signals when available. We show it is possible to mitigate against this by generating bias-decoupled training data using latent adversarial debiasing (LAD): even when the confounding signal is present in 100% of training examples, we can improve generalisation in collider bias settings. Experiments show state-of-the-art performance of LAD in label-free debiasing, with gains of 76.12% on background coloured MNIST, 35.47% on foreground coloured MNIST, and 8.27% on corrupted CIFAR-10.


1 Introduction

Invariably, in real-world machine learning settings, training and test sets are different. This general phenomenon has become known as dataset shift (Caron et al., 2018). Yet there are many causes for such shifts (Storkey, 2009). One common scenario is sample selection bias (Heckman, 1979), where the process of curating a training dataset differs from the process by which data arrives during deployment. This issue is ubiquitous; even standard machine learning benchmarks (e.g. ImageNet) contain images selected for the clarity with which a class is represented, a clarity missing in many real applications, where for example an object may be occluded.

One pernicious form of sample selection bias is collider bias. This is illustrated and characterised in Figure 1. Consider two variables: a causal variable that determines the target, and what we will call a confounding variable, which is not directly related to the target. In collider bias, these two variables, which are for the most part independent in the test scenario, become co-dependent in the training sample because of the restrictive way the training data is selected. Collider bias can cause a predictive algorithm to mistakenly target information from features associated with the confounding variable rather than the causal variable; such features then do not generalise to the test scenario.

The data processing inequality implies that the information about the target from the causal signal must be greater than that from the confounding signal. However, the confounding signal can be easier to discover than the causal signal, due to its ease of computation (a highly linear confounder, for example, can be discovered before a highly non-linear causal relationship). When this is combined with the greediness of neural network learning, it can mean the causal signal is never learnt. It is precisely this scenario that is the topic of this paper.

We argue that collider bias can be a pervasive cause of non-robustness in deep neural networks (DNNs). Our main contribution is the demonstration of a specific approach to mitigate the situation: Latent Adversarial Debiasing (LAD), which pushes a network to recognise all sources of information for a problem by augmenting training using adversarially perturbed latent representations. In these latent representations, easy-to-learn confounding signals are decoupled from the classification targets, forcing networks to also learn information from the causal signal.

1.1 Collider bias

Consider a toy classification problem: distinguishing dog images from cat images. A collected training dataset for dogs and cats may be biased: people take pictures of dogs outside, e.g. in a field while walking them, and take pictures of cats in their homes, e.g. on a sofa. Yet both cats and dogs go inside and outside.

It might be critical in the test setting to distinguish between dogs and cats each presented in indoor and outdoor settings. In this training dataset the simple feature of the background colour in the image is a strong confounder for the real signal that needs to be detected – the difference in appearance between dogs and cats.

We call the easy-to-learn bias-inducing signal the confounding signal, as opposed to the true causal signal. A model relying on the causal signal will generalise well, while a model relying on the confounding signal will generalise poorly. We propose a solution that hinges on the assumption that confounding signals are typically easier to compute, in that their gradients during learning are stronger than the gradients of causal signals (see Section 2.1). A similar assumption was also made in related earlier work (Nam et al., 2020; Bahng et al., 2020; Bras et al., 2020; Minderer et al., 2020). We hypothesise that a specific form of adversarial example can be used to augment training data such that the confounding signal is decoupled from the causal signal. We show that adversarial data can be generated by gradient descent in the latent space of an autoencoder-like model to produce augmented training data. The effect is a reduced association between confounding and causal signals, requiring a model learning on this augmented data to rely on the causal signal. Our method, LAD, shows marked improvement over the state-of-the-art, without relying on any presence of bias-free data (as opposed to Nam et al. (2020), Bras et al. (2020) and Bahng et al. (2020)).

2 Problem Definition

Consider an underlying data generating distribution for the data, $p(x)$. While for most problems we do not have access to this distribution, representative empirical samples are typically available. The bias problems we are concerned with are ones where there are multiple underlying variables, e.g. $z_c$ and $z_b$ in Figure 1, that causally influence the observed data.

Figure 1: Graphical model for collider bias. $z_c$ and $z_b$ are latent information sources. The target $y$ is causally dependent on $z_c$ but not $z_b$, yet the observed data $x$ contains information from both latent sources. $s$ is a binary sample selection variable: $s = 1$ implies that the sample is selected for a particular dataset.

When selecting or collecting data for a problem we inevitably introduce a sampling bias, $s$, which can be thought of as a form of rejection sampling, e.g., all the times a user implicitly chooses not to take a photo. In this way the sampling bias couples the underlying variables of the observed data in a manner that DNNs are ill-equipped to handle.

We have included the sampling mechanism, $s$, explicitly in Figure 1. Conditioning on $s = 1$ is what actually causes the association between the underlying variables. In both cases the secondary signal does not cause the target and is therefore a confounding signal.
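To make the effect of conditioning on $s$ concrete, the following minimal sketch (our own illustration, not from the paper, assuming a toy scalar setting in NumPy) simulates independent causal and confounding variables and shows how a selection rule that depends on both induces an association in the selected training sample:

```python
import numpy as np

rng = np.random.default_rng(0)

# Independent latent sources: z_c is causal for the target, z_b is not.
z_c = rng.normal(size=100_000)
z_b = rng.normal(size=100_000)
y = (z_c > 0).astype(int)              # target depends only on z_c

# Collider-style selection: a sample enters the training set only when
# the causal and confounding variables "agree" (s depends on both).
s = (z_c * z_b > 0)

print("corr(z_c, z_b) overall:   %.3f" % np.corrcoef(z_c, z_b)[0, 1])
print("corr(z_c, z_b) | s = 1:   %.3f" % np.corrcoef(z_c[s], z_b[s])[0, 1])
# Conditioned on selection, z_b is now predictive of y even though it plays
# no causal role: the confounding signal a greedy learner can exploit.
print("P(y = 1 | z_b > 0, s = 1): %.3f" % y[s][z_b[s] > 0].mean())
```

In the unselected population the two latents are uncorrelated, but within the selected sample the confounder becomes predictive of the target, which is exactly the coupling depicted in Figure 1.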

2.1 Confounding signals have high-gradient learning signals

Crucial to our insight into the tendency of DNNs to rely on easy-to-learn confounding signals in the training data is understanding how the gradients of these signals with respect to the training loss evolve over learning. To this end we present a toy problem, called the one-pixel problem (private communication: Harri Edwards, 2016), in Section 2.2, and track the gradient ratio ($R$) to determine the relative reliance on the causal ($x_c$) and confounding ($x_b$) components of the input:

$R = \frac{\| \nabla_{x_c} \mathcal{L} \|_2}{\| \nabla_{x_b} \mathcal{L} \|_2}$    (1)

where $\| \cdot \|_2$ is the L2 norm and $\mathcal{L}$ is the training loss. When $R = 1$ the learning process is not favouring either signal; when $R < 1$ the learning process is favouring the confounding signal, and when $R > 1$ it is favouring the causal signal. As this quantity is impossible to compute when we do not have direct access to the causal variable, a toy problem enables us to assess it.

2.2 One-Pixel problem

The one-pixel problem is a bias-reliance demonstration that can be constructed simply using any image classification dataset: all that it requires is setting the $i$-th pixel of the first row of each training image to a pre-selected value (where $i$ is the class index) – see Figure 2.
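A minimal construction of such a dataset might look as follows (a sketch of our own, assuming MNIST-style 28x28 arrays with labels 0-9; the marker value of 255 is a placeholder, not a value specified by the paper):

```python
import numpy as np

def add_one_pixel_bias(images, labels, value=255):
    """Inject the one-pixel confounder: for an image of class i, set
    pixel i of the first row to a fixed, class-identifying value."""
    biased = images.copy()
    biased[np.arange(len(labels)), 0, labels] = value
    return biased

# Example with random stand-in data shaped like MNIST.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(64, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 10, size=64)
biased_images = add_one_pixel_bias(images, labels)
```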

Figure 2: One-pixel problem on MNIST. The class information is encoded as a confounding signal in the first pixels of the first row of each image (a: training images). We show the cross-entropy training loss (b), training accuracy (c), the L2 norm of the gradients on the informative pixel and on the remaining image (d), and the ratio of said gradients, Equation 1 (e). In this case the test accuracy is no better than random chance.

Figure 2 tracks the gradient ratio between the causal signal and the confounding signal, $R$, measured as the ratio of the L2 norm of the gradients on the image pixels (all but the pixels of the one-pixel problem) to the L2 norm of the gradients on the pixels encoding class information, over 1000 minibatch iterations. This is simple to compute for the one-pixel problem as there are distinct pixels associated with each signal. At initialisation the gradients are stronger over the image space (causal), but become dominated by the gradients over the one-pixel space (confounding) after only 230 iterations. This demonstration serves to show that neural networks prefer easy-to-compute confounder signals, and that gradient information is the mechanism through which that preference operates.
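The gradient ratio itself is straightforward to track when the pixels carrying each signal are known. The sketch below is our own illustration (assuming PyTorch, a model mapping images to class logits, and the one-pixel layout from the construction above); it computes Equation 1 for one minibatch:

```python
import torch
import torch.nn.functional as F

def gradient_ratio(model, images, labels):
    """Equation 1: L2 gradient norm on the causal pixels divided by the
    L2 gradient norm on the class-identifying (confounding) pixels.
    Assumes one-pixel-problem images of shape (N, 1, 28, 28)."""
    images = images.clone().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    (grad,) = torch.autograd.grad(loss, images)

    bias_grad = grad[:, :, 0, :10]           # the ten class-identifying pixels
    causal_grad = grad.clone()
    causal_grad[:, :, 0, :10] = 0.0          # everything except those pixels
    return (causal_grad.norm(p=2) / bias_grad.norm(p=2)).item()
```

A value of $R$ below one indicates that updates are dominated by the confounding pixels, matching the behaviour shown in Figure 2.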

3 Prior Work

DNNs lack robustness to human-imperceptible perturbations known as adversarial examples (Goodfellow et al., 2015; Madry et al., 2018), to other semantic transformations such as rotations or translations (Kanbak et al., 2018; Engstrom et al., 2019; Hendrycks and Dietterich, 2019), and more broadly to domain shift in the input distribution (Gulrajani and Lopez-Paz, 2020).

Most relevant to our work is the observation that DNNs tend to rely on easy-to-learn biases or features that do not generalise outside the training distribution (Geirhos et al., 2020). In image classification, this is exemplified by the reliance of DNNs on the high-frequency components of the image (Jo and Bengio, 2017). In natural language processing, DNNs were shown to latch onto statistical cues such as the presence or absence of individual words (McCoy et al., 2019).

Adversarial learning is a set of techniques particularly useful for improving the robustness of deep neural networks (Goodfellow et al., 2015; Madry et al., 2018). Predominantly, prior work has focused on applying adversarial learning to improve robustness to small norm-bounded perturbations.

Adversarial learning has also been applied to de-biasing models (Goel et al., 2020; Qiu et al., 2019; Zhang et al., 2018; Beutel et al., 2017; Edwards and Storkey, 2016; Arjovsky et al., 2019; Stachura et al., 2020). Zhang et al. (2018), Beutel et al. (2017) and Edwards and Storkey (2016) remove the information about the bias from the input or hidden representation. Arjovsky et al. (2019) use adversarial learning to reduce the reliance of a neural network on features that change between different domains. In contrast, we do not require annotated data on the presence of a bias.

Data augmentation has also been shown to be effective in improving robustness of deep neural networks to semantically meaningful perturbations (Fawzi et al., 2016; Hendrycks et al., 2020). Our work can be seen as an automatic method for creating such augmentations.

Our work is most closely related to unsupervised methods for de-biasing models (Nam et al., 2020; Bras et al., 2020; Gowal et al., 2020; Bahng et al., 2020), but differs substantially in the assumptions made. Gowal et al. (2020) assume access to a disentangled representation with an identified factor that is not causally related to the label. They then use a decoder to produce augmented images by mixing the spurious factor between different pairs of images. Specifically, they train a StyleGAN (Karras et al., 2018) and use its first-stage representation as the spurious factor. Their method is hence limited to image classification scenarios in which the features identified by StyleGAN correspond to the learned biases. Bahng et al. (2020) require providing a biased model that heavily relies on the bias information in the dataset (e.g. a CNN with small receptive fields that bias the model towards textural information). They train a debiased classifier by regularising its representation to be statistically independent of the representation learned by the biased model.

Similarly to Nam et al. (2020) and Bras et al. (2020), LAD hinges on the relaxed assumption that the spurious correlation is easier to learn than the true signal. Nam et al. (2020) assume that the first examples to be learned are biased, and train a second network that has an intentionally high loss on them. Bras et al. (2020) filter out examples that can be classified using a simple linear model. These approaches are inspired by the phenomenon that DNNs first prioritise learning a consistent subset of easy-to-learn examples (Swayamdipta et al., 2020; Arpit et al., 2017).

In contrast to Nam et al. (2020) and Bras et al. (2020), we do not assume that a subset of examples is free of bias. Instead, we modify all examples using carefully crafted adversarial examples to reduce the reliance on confounding signals. We compare directly to Nam et al. (2020) and Bahng et al. (2020) and show markedly improved generalisation performance.

Finally, LAD is related to Minderer et al. (2020). Similarly to us, they reduce the reliance of training on easy-to-learn features by performing an adversarial walk in the latent space of an autoencoder. The key difference is that they use the method to improve self-supervised learning, noting that the self-supervised objective can be too easily optimised by relying on shortcut (easy-to-learn) features.

4 LAD: Latent Adversarial Debiasing

We propose to counter collider bias in DNNs by augmenting the training data to artificially disassociate confounding signals and causal signals. We require three components to achieve this:

  • A latent representation, $z$, of the underlying data manifold, modelled by $g$, such that the confounding and causal signals are approximately disentangled and accessible.

  • A biased classifier, $f_b$, trained on this latent representation.

  • A method to remove confounding signals from training examples using both $g$ and $f_b$.

In the rest of this section we describe each component. We note here that similar but more restrictive assumptions were made by prior works (Nam et al., 2020; Bras et al., 2020; Bahng et al., 2020; Karras et al., 2018). Perhaps most importantly, we assume $f_b$ relies on the easy-to-learn confounding signal when trained on $z$. We also constrain $f_b$ to ensure this is the case.

4.1 Manifold Access

We assume the data distribution is conditional on a set of underlying variables such that $x \sim p(x \mid z_c, z_b)$ and $y \sim p(y \mid z_c)$ (see Figure 1). This corresponds to an underlying low-dimensional manifold that dictates the space of plausible images in the data.

We need access to (an approximation of) this latent manifold, parameterised in a way that the easy-to-compute confounding signal is disentangled from the causal signal. This helps ensure that when we train a classifier $f_b$ on it, it learns to rely on the information that induces bias, and allows us to alter or remove this information.

Stutz et al. (2019) demonstrated a means of producing on-manifold adversarial examples by training class-specific variational autoencoder generative adversarial network (VAEGAN) hybrid models (Larsen et al., 2016) for each class in the data. Via an adversarial walk on the approximated manifold space, Stutz et al. (2019) were able to generate adversarial images with plausible deviations from the originals.

VQ-VAE: quantised latent space

To satisfy the above desiderata, we use a vector quantised-variational autoencoder (VQ-VAE) (Van Den Oord et al., 2017). This model enables learning of a discrete (quantised) latent representation, where the number and size of the discrete codes are pre-chosen. Where Stutz et al. (2019) learned VAEGAN models for each class to constrain the changes to remain on-manifold, the quantisation constraint of the VQ-VAE offers a similar effect. We use the quantisation mechanism directly in the latent adversarial walk to project gradient-based changes onto the manifold. Early experimentation with standard autoencoders and VAEs evidenced that a strong constraint in the adversarial walk was paramount.

The VQ-VAE is effectively an encoder-decoder structure (see Figure 3) with a quantised latent space. Consider that the original image can be reconstructed as:

$\hat{x} = g_{\text{dec}}(z) = g_{\text{dec}}(g_{\text{enc}}(x))$    (2)

where $x$ and $\hat{x}$ are the input and reconstructed images, respectively, $g$ is the VQ-VAE, ($g_{\text{dec}}$, $g_{\text{enc}}$) are the decoder and encoder components thereof, and $z = g_{\text{enc}}(x)$ is the latent representation for $x$.
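The quantisation step that makes this latent space useful for a constrained walk can be sketched as a nearest-neighbour lookup into a learned codebook. The snippet below is our own minimal PyTorch illustration of that mechanism (the codebook here is a random placeholder; only its shape, 20 codes of length 64 as used for the MNIST experiments, follows the text):

```python
import torch

def quantise(z, codebook):
    """Map each latent vector to its nearest codebook entry (VQ-VAE style).

    z:        (N, D) continuous latents from the encoder.
    codebook: (K, D) learned discrete codes.
    Returns the quantised latents, also (N, D).
    """
    distances = torch.cdist(z, codebook)    # (N, K) pairwise Euclidean distances
    indices = distances.argmin(dim=1)       # nearest code per latent
    return codebook[indices]

# Toy usage: 20 codes of length 64.
codebook = torch.randn(20, 64)
z = torch.randn(8, 64)
z_q = quantise(z, codebook)
```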

Classification from bias

We can then attach a classifier, $f_b$, to this latent space in order to approximate the decision boundary associated with the easy-to-learn confounding signal:

$\hat{y} = f_b(z)$    (3)

where $\hat{y}$ is a class prediction, and train it using standard SGD to minimise the cross-entropy loss. While $z$ will contain both confounding and causal signals, it is the tendency of $f_b$ to latch onto easy-to-learn features that enables LAD to work. In the following section we discuss how we use these two models, $g$ and $f_b$, to traverse the latent space, $z$, and augment the training data such that an additional classifier must rely on the causal signal.
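A minimal sketch of training such a simple classifier on the latent codes (our own illustration in PyTorch; the single-hidden-layer MLP of width 100 mirrors the description in Section 5.1, while the latent size, learning rate and flat latent layout are placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# f_b: a deliberately simple classifier on the latents (z assumed to be a
# (batch, 64) tensor; a spatial latent would need flattening first).
f_b = nn.Sequential(nn.Linear(64, 100), nn.ReLU(), nn.Linear(100, 10))
optimiser = torch.optim.SGD(f_b.parameters(), lr=0.1)

def train_fb_step(z, y):
    """One standard SGD step minimising cross entropy on (z, y) pairs."""
    optimiser.zero_grad()
    loss = F.cross_entropy(f_b(z), y)
    loss.backward()
    optimiser.step()
    return loss.item()
```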

LAD bears resemblance to Gowal et al. (2020), who assume that the lowest-level representation learned by StyleGAN is not causally linked to the label. We relax this assumption in the sense that we only require our latent variable model to approximately disentangle the underlying variables. We rely on a (simple) classifier ($f_b$) and a gradient-based latent adversarial walk to decouple the confounding and causal signals.

Figure 3: Model setup. The VQ-VAE is the encoder-decoder structure, tasked with learning the quantised latent representation, $z$. The ‘simple’ classifier, $f_b$, is learned using $(z, y)$ input-target pairs, where $y$ is the class label. The latent adversarial walk (blue circle) alters $z$ such that $f_b$ produces high-entropy (maximally uncertain) class predictions. The altered representation, $z'$, can then be decoded into $\hat{x}'$, in which the easy-to-learn confounding signal and the causal signal are decoupled. The ultimate aim is to train an additional classifier, $f_d$, using this augmented data for improved generalisation in collider bias settings.

4.2 Latent adversarial walk

While a number of adversarial example generation algorithms exist (Szegedy et al., 2013; Madry et al., 2018; Goodfellow et al., 2014), these aim at producing imperceptible changes to an image such that a target classifier makes an incorrect, yet often highly confident, prediction on that image. Consider the family of white-box adversarial attacks (Madry et al., 2018) that maximise the training loss:

$x' = x + \delta, \quad \delta = \arg\max_{\delta} \mathcal{L}(f(x + \delta), y)$    (4)

where $\mathcal{L}$ is the cross-entropy loss and $\delta$ is computed via projected gradient descent. $\delta$ is constrained to ensure perceptual similarity between $x$ and $x'$. This is an optimisation process on the image space: an adversarial walk to systematically alter the image and maximise the training loss. In our case, however, we perform the adversarial walk on the latent space, with the objective of maximising the entropy of the predictive probability:

$z' = z + \Delta, \quad \Delta = \arg\max_{\Delta} H\left(f_b\left(q(z + \Delta)\right)\right)$    (5)

where $H(\cdot)$ is the entropy of the class probabilities computed by $f_b$, and $q(\cdot)$ is the quantisation mechanism of the VQ-VAE with learned latent codes. We compute $\Delta$ as the standardised partial gradient to preserve the strength of the changes for any example:

$\Delta = \eta \, \frac{\nabla_{z} H\left(f_b\left(q(z)\right)\right) - \mu}{\sigma}$    (6)

where $\mu$ and $\sigma$ are the mean and standard deviation computed over the entire gradient vector for each image, and $\eta$ is a hyper-parameter dictating the step size of the walk. Standardisation helps ensure the steps taken by the gradient walk are approximately equal in length.

Algorithm 1 details the quantisation-constrained entropy-targeted adversarial walk used for LAD. The end goal here is to produce an altered latent representation, $z'$.

Input : latent representation $z$, biased classifier $f_b$, quantiser $q$ with learned latent codes, step size $\eta$, number of steps $T$
Output : Adjusted latent representation, $z'$
$z' \leftarrow z$ ;   // initialise $z'$ to $z$
for $t \leftarrow 1$ to $T$ do
       $\hat{y} \leftarrow f_b(q(z'))$ ;   // compute prediction
       backprop to maximise $H(\hat{y})$ ;
       compute $\Delta$ (Equation 6) ;
       $z' \leftarrow z' + \Delta$ ;
       $z' \leftarrow q(z')$ ;   // project the step back onto the quantised manifold
end for
Algorithm 1 Quantisation-constrained entropy-targeted adversarial walk.
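A sketch of how this walk might be implemented (our own PyTorch illustration, not the authors' code; `f_b` and `quantise` are the components sketched earlier, and the straight-through pass through the quantiser is an implementation assumption on our part, since the paper only states that quantisation projects the changes onto the manifold):

```python
import torch
import torch.nn.functional as F

def latent_adversarial_walk(z, f_b, quantise, codebook, eta=0.1, steps=20):
    """Entropy-targeted adversarial walk constrained by VQ quantisation
    (c.f. Algorithm 1 and Equations 5-6). Assumes z has shape (batch, dim)."""
    z_adj = z.clone()
    for _ in range(steps):
        z_adj = z_adj.detach().requires_grad_(True)

        # Straight-through pass so gradients flow through the quantiser.
        z_q = quantise(z_adj, codebook)
        z_q = z_adj + (z_q - z_adj).detach()

        probs = F.softmax(f_b(z_q), dim=-1)                 # compute prediction
        entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=-1).mean()

        (grad,) = torch.autograd.grad(entropy, z_adj)       # maximise H

        # Equation 6: standardise the gradient per example and scale by eta.
        mu = grad.mean(dim=-1, keepdim=True)
        sigma = grad.std(dim=-1, keepdim=True) + 1e-8
        z_adj = quantise(z_adj + eta * (grad - mu) / sigma, codebook)
    return z_adj.detach()
```

The default `eta` and `steps` values here are placeholders; the number of steps used in the experiments is dataset-dependent (Section 5.1).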

4.3 Post-walk classification

Once the latent representation has been adjusted such that the simple classifier, $f_b$, outputs high-entropy probabilities (i.e., it is unsure), we can then decode the new representation using the VQ-VAE decoder to construct a new image:

$\hat{x}' = g_{\text{dec}}(z')$    (7)

The goal is that this new image should contain very little of the information that $f_b$ used to classify. Since $f_b$ is constrained and we assume it will latch onto the easy-to-learn confounding signals, this gives us a way of intentionally augmenting data to remove confounders. We can then use $\hat{x}'$ to learn an additional classifier in a standard fashion. We call this final debiased classifier $f_d$.
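Putting the pieces together, an augmentation-then-train step might look roughly as follows (our own sketch; `g_enc`, `g_dec`, `f_d` and `optimiser` stand in for the VQ-VAE encoder and decoder, the final classifier, and its optimiser, none of which are specified in code by the paper):

```python
import torch
import torch.nn.functional as F

def debiased_training_step(x, y, g_enc, g_dec, codebook, f_b, f_d, optimiser):
    """Augment one batch via the latent adversarial walk, then update f_d on it
    (Equation 7 followed by a standard cross-entropy step)."""
    with torch.no_grad():
        z = g_enc(x)                                  # latent representation
    z_adj = latent_adversarial_walk(z, f_b, quantise, codebook)
    with torch.no_grad():
        x_aug = g_dec(z_adj)                          # Equation 7: decode z'

    optimiser.zero_grad()
    loss = F.cross_entropy(f_d(x_aug), y)             # train the debiased classifier
    loss.backward()
    optimiser.step()
    return loss.item()
```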

5 Experiments

We explore three datasets of increasing difficulty to assess the relative merit of Latent Adversarial Debiasing. For comparison with earlier works, we test two variants of coloured MNIST (LeCun et al., 2010) (background and foreground) in Sections 5.2 and 5.3. Although seemingly similar, these two variants of MNIST differ in the level of entanglement between image shape and colour. For the background-coloured variant, the colour is largely independent of the image shape, while for the foreground-coloured variant, the colour always occurs with shape. Following Nam et al. (2020), we also consider the corrupted CIFAR-10 dataset (Krizhevsky et al., 2009).

Unlike earlier works, we chose to consider the circumstance where the confounding signals are pervasive, and therefore do not assume that any training data is free from bias. We seek to work toward handling the fundamental issue of confounding, instead of designing a method that leverages small amounts of confounder-free data. Since earlier works do not consider fully confounded training data, we also compare on the settings they target, listing the proportion of biased data used by each earlier work.

We consider two forms of assessment. The first we call the Independent setting, denoted ‘cross-bias’ by Bahng et al. (2020), where the confounding signal is independent of the causal signal at test time (e.g., colours are sampled independently at random during testing for the coloured MNIST datasets). The second we call the Conditioned setting, where the sample selection is changed so the confounding signal is held constant during testing (e.g., the original MNIST test set with a black background and white foreground). These cover two different test scenarios in which the confounding signal has no influence in the test setting.

5.1 Implementational Details

For all datasets we use a ResNet-20 (He et al., 2016) for the final classifier, $f_d$, and a single-hidden-layer multi-layer perceptron (width 100 units) for $f_b$. For both MNIST datasets, the VQ-VAEs were trained with 20 latent codes (learned discrete quantisations) of length 64. For corrupted CIFAR-10 the VQ-VAE was trained with 2056 latent codes of length 64.

For background coloured MNIST we used 20 steps for the adversarial walk, for foreground coloured MNIST we used 7 steps, and for corrupted CIFAR-10 we used 4 steps; the step size $\eta$ was set per dataset. These values were determined using a brief hyper-parameter search and cross validation. The results given in the following sections were computed on the held-out test data. We used random crops with a padding size of 4 for all datasets, random affine transformations for the MNIST datasets, and random horizontal flips for corrupted CIFAR-10.

Dataset | Method | Bias ratio | Accuracy (independent) | Accuracy (conditioned)
BG coloured MNIST | LAD | 100% | 98.82 ± 0.039% | –
BG coloured MNIST | Vanilla | 100% | – | –
BG coloured MNIST | ReBias | 99% | 88.1% | –
BG coloured MNIST | ReBias | 99.9% | 22.6% | –
FG coloured MNIST | LAD | 100% | 98.86 ± 0.10% | –
FG coloured MNIST | Vanilla | 100% | 9.90 ± 0.13% | –
FG coloured MNIST | LfF | 99.5% | 63.39% | –
FG coloured MNIST | LfF | – | – | –
FG coloured MNIST | LfF | – | – | –
Corrupted CIFAR-10 | LAD | 100% | 39.93 ± 0.62% | –
Corrupted CIFAR-10 | Vanilla | 100% | – | –
Corrupted CIFAR-10 | LfF | 95% | 59.95 ± 0.16% | –
Corrupted CIFAR-10 | LfF | – | – | –
Corrupted CIFAR-10 | LfF | – | – | –
Table 1: Test accuracy on all datasets for the two test conditions: the independent case and the conditioned case. The vanilla method is simply a classifier trained directly on the original (biased) data (c.f. Figure 1). While we include results from ReBias (Bahng et al., 2020) and LfF (Nam et al., 2020), their methods are not directly comparable because they assume the training set contains some percentage of unbiased data.

5.2 Background coloured MNIST

To produce this dataset we used the code provided by the authors of ReBias (Bahng et al., 2020) (https://github.com/clovaai/rebias) and compare to their results in Table 1. It is evident that ReBias is strongly dependent on using a small portion of unbiased data: at a bias ratio of 99%, they achieved 88.1% test accuracy on an unbiased test set, but only 22.6% at a bias ratio of 99.9%. LAD achieved 98.82% on this dataset at a 100% bias ratio, approaching what is achievable on standard MNIST.

The efficacy of LAD is clearly evident from these results. We also show the LAD reconstructions of the training data in Figure 4 (a) and (b). LAD is clearly able to augment the colour information such that it is decoupled from the classification targets.

5.3 Foreground coloured MNIST

To generate this data we followed the protocol in LfF (Nam et al., 2020). Ten colours were randomly chosen, one for each class. For each image an RGB colour is sampled (correlated with class during training): Gaussian noise with a standard deviation of 0.005 is added to the selected mean colour.
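A sketch of this colouring procedure (our own NumPy illustration; the palette is a placeholder, while the 0.005 noise standard deviation follows the protocol above):

```python
import numpy as np

rng = np.random.default_rng(0)
# One mean RGB colour per class (placeholder palette, values in [0, 1]).
class_colours = rng.uniform(0.2, 1.0, size=(10, 3))

def colour_foreground(image, label, noise_std=0.005):
    """Colour the digit foreground with a class-correlated RGB colour.

    image: (28, 28) grayscale array in [0, 1]; label: class index 0-9.
    Returns a (28, 28, 3) RGB image whose colour confounds the class.
    """
    colour = class_colours[label] + rng.normal(scale=noise_std, size=3)
    colour = np.clip(colour, 0.0, 1.0)
    # Use the grayscale intensity as a mask so only the digit is coloured.
    return image[..., None] * colour[None, None, :]
```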

Similar to background coloured MNIST, LAD exceeds the current state-of-the-art, even though we consider 100% biased training data: the closest comparison, LfF, achieves a test accuracy of 63.39% at 99.5% bias, while we achieve 98.86% test accuracy. Again, these results approach what is achievable on the standard MNIST dataset. However, close inspection of Figure 4 (c) and (d) shows that the alterations owing to LAD do begin to affect the actual digit shape – note particularly the nines in Figure 4 (d). Compare this to (b), where the altered data leaves the digit shape almost entirely unchanged. This difference owes to the level of entanglement between bias and true signal: foreground coloured MNIST has the bias variable (colour) overlaid on the true signal (digit shape) instead of as a static background. Nonetheless, the test accuracies are almost identical.

5.4 Corrupted CIFAR-10

Next we consider a constructed dataset where the bias and signal variables are far more entangled. Following Nam et al. (2020), we construct a variant of corrupted CIFAR-10 (Corrupted CIFAR-10$^1$ in Nam et al. (2020)) with the corruptions: Snow, Frost, Fog, Brightness, Contrast, Spatter, Elastic, JPEG, Pixelate, and Saturate. During training these correlate with each class. Using corruptions to benchmark neural network robustness is not new (Hendrycks and Dietterich, 2019), but using them as a class-informative bias yields an extremely challenging dataset.

Not only are corruptions often nuanced, they are also destructive, meaning that disentangling them from the underlying image is not always possible. Blurring, elastic distortion, JPEG compression, and pixelation are all examples of non-reversible corruptions. While some (like contrast or brightness adjustment) are easier to change, it is understandable that earlier work was only able to achieve 31.66% test accuracy at a high bias ratio. We were able to achieve 39.93% test accuracy at a bias ratio of 100%. While LfF can achieve 59.95% on corrupted CIFAR-10, this requires a relatively low bias ratio of 95%, once more evidencing the reliance of earlier works on bias-free training data. We also note that the reconstructions in Figure 4 (f) are blurry, highlighting the need for an improved encoder-decoder model.

Figure 4: Training data examples for (a) background coloured MNIST, (c) foreground coloured MNIST, and (e) corrupted CIFAR-10, with the corresponding LAD reconstructions for each dataset in (b), (d), and (f), respectively.

6 Discussion and Conclusion

DNNs tend to focus on easy-to-learn features, should those features be sufficiently informative of the target. In this paper we showed that this problematic behaviour means that DNNs are ill-equipped to handle a ubiquitous form of sample selection bias known as collider bias. The process of collecting and curating training data can often create a scenario where test data differs substantially from training data. When the training data contains a confounding signal (such as lighting conditions), DNNs will generalise poorly. We argue that it is the deep structure of neural networks, combined with the gradient-driven learning process used, that amplifies their dependence on easy-to-learn confounding signals.

We presented LAD, a method to produce latent adversarial examples that specifically target the easy-to-learn confounding signals in the data manifold. Using a VQ-VAE to approximate the data manifold corresponding to causal and confounding signals, we leverage the tendency of DNNs to latch on to easy-to-learn features and define an appropriate adversarial walk on this manifold. Decoding the adjusted latent manifold back to the image space yields augmented data where confounding signals are largely mitigated against. A classifier trained on this new data generalises better. We evidenced substantial test accuracy gains of 76.12% on background coloured MNIST, 35.47% on foreground coloured MNIST, and 8.27% on corrupted CIFAR-10, even when the training data was 100% biased.

Since LAD does not require any bias-free data, we believe we are moving toward solving the broad issue that neural networks latch on to easier-to-learn features, and evidence this in the collider bias settings. While we focus on constructed datasets that demonstrate effectively the problem at hand, extending the ideas and solutions presented herein to broader notions of dataset bias and neural network robustness is a natural progression and is planned for future work.

Acknowledgements

Our work was supported in part by the EPSRC Centre for Doctoral Training in Data Science, funded by the UK Engineering and Physical Sciences Research Council (grant EP/L016427/1) and the University of Edinburgh. The opinions expressed and arguments employed herein do not necessarily reflect the official views of these funding bodies.

References

  • M. Arjovsky, L. Bottou, I. Gulrajani, and D. Lopez-Paz (2019) Invariant Risk Minimization. arXiv e-prints. Cited by: §3.
  • D. Arpit, S. Jastrzębski, N. Ballas, D. Krueger, E. Bengio, M. S. Kanwal, T. Maharaj, A. Fischer, A. Courville, Y. Bengio, and S. Lacoste-Julien (2017) A closer look at memorization in deep networks. In Proceedings of the International Conference on Machine Learning, Cited by: §3.
  • H. Bahng, S. Chun, S. Yun, J. Choo, and S. J. Oh (2020) Learning de-biased representations with biased representations. In Proceedings of the International Conference on Machine Learning, Cited by: §1.1, §3, §3, §4, §5.2, Table 1, §5.
  • A. Beutel, J. Chen, Z. Zhao, and E. H. Chi (2017) Data decisions and theoretical implications when adversarially learning fair representations. arXiv e-prints, pp. arXiv:1707.00075. External Links: 1707.00075 Cited by: §3.
  • R. L. Bras, S. Swayamdipta, C. Bhagavatula, R. Zellers, M. E. Peters, A. Sabharwal, and Y. Choi (2020) Adversarial filters of dataset biases. In Proceedings of the International Conference on Machine Learning, Cited by: §1.1, §3, §3, §3, §4.
  • M. Caron, P. Bojanowski, A. Joulin, and M. Douze (2018) Deep clustering for unsupervised learning of visual features. In Proceedings of the European Conference on Computer Vision, Cited by: §1.
  • H. Edwards and A. J. Storkey (2016) Censoring representations with an adversary. In Proceedings of International Conference on Learning Representations, Cited by: §3.
  • L. Engstrom, B. Tran, D. Tsipras, L. Schmidt, and A. Madry (2019) Exploring the landscape of spatial robustness. In International Conference on Machine Learning, pp. 1802–1811. Cited by: §3.
  • A. Fawzi, H. Samulowitz, D. Turaga, and P. Frossard (2016) Adaptive data augmentation for image classification. In Proceedings of IEEE International Conference on Image Processing, Cited by: §3.
  • R. Geirhos, J. Jacobsen, C. Michaelis, R. Zemel, W. Brendel, M. Bethge, and F. A. Wichmann (2020) Shortcut learning in deep neural networks. arXiv e-prints, pp. arXiv:2004.07780. External Links: 2004.07780 Cited by: §3.
  • K. Goel, A. Gu, Y. Li, and C. Ré (2020) Model patching: closing the subgroup performance gap with data augmentation. arXiv e-prints, pp. arXiv:2008.06775. External Links: 2008.06775 Cited by: §3.
  • I. J. Goodfellow, J. Shlens, and C. Szegedy (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. Cited by: §4.2.
  • I. Goodfellow, J. Shlens, and C. Szegedy (2015) Explaining and harnessing adversarial examples. In Proceedings of International Conference on Learning Representations, Cited by: §3, §3.
  • S. Gowal, C. Qin, P. Huang, T. Cemgil, K. Dvijotham, T. Mann, and P. Kohli (2020) Achieving robustness in the wild via adversarial mixing with disentangled representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Cited by: §3, §4.1.
  • I. Gulrajani and D. Lopez-Paz (2020) In search of lost domain generalization. arXiv e-prints, pp. arXiv:2007.01434. External Links: 2007.01434 Cited by: §3.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Identity mappings in deep residual networks. In Proceedings of fourteenth European Conference on Computer Vision, Cited by: §5.1.
  • J. J. Heckman (1979) Sample selection bias as a specification error. Econometrica 47, pp. 153–162. Cited by: §1.
  • D. Hendrycks and T. G. Dietterich (2019) Benchmarking neural network robustness to common corruptions and perturbations. In Proceedings of International Conference on Learning Representations, Cited by: §3, §5.4.
  • D. Hendrycks, N. Mu, E. D. Cubuk, B. Zoph, J. Gilmer, and B. Lakshminarayanan (2020) AugMix: A simple data processing method to improve robustness and uncertainty. In Proceedings of the International Conference on Learning Representations, Cited by: §3.
  • J. Jo and Y. Bengio (2017) Measuring the tendency of CNNs to learn surface statistical regularities. arXiv e-prints, pp. arXiv:1711.11561. External Links: 1711.11561 Cited by: §3.
  • C. Kanbak, S. Moosavi-Dezfooli, and P. Frossard (2018) Geometric robustness of deep networks: analysis and improvement. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, Cited by: §3.
  • T. Karras, S. Laine, and T. Aila (2018) A style-based generator architecture for generative adversarial networks. In Proceedings of Conference on Computer Vision and Pattern Recognition, Cited by: §3, §4.
  • A. Krizhevsky, G. Hinton, et al. (2009) Learning multiple layers of features from tiny images. Technical report Citeseer. Cited by: §5.
  • A. B. L. Larsen, S. K. Sønderby, H. Larochelle, and O. Winther (2016) Autoencoding beyond pixels using a learned similarity metric. In Proceedings of International conference on machine learning, Cited by: §4.1.
  • Y. LeCun, C. Cortes, and C. Burges (2010) MNIST handwritten digit database. ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist 2. Cited by: §5.
  • A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2018) Towards deep learning models resistant to adversarial attacks. In Proceedings of International Conference on Learning Representations, Cited by: §3, §3, §4.2.
  • R. T. McCoy, E. Pavlick, and T. Linzen (2019) Right for the wrong reasons: diagnosing syntactic heuristics in natural language inference. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Cited by: §3.
  • M. Minderer, O. Bachem, N. Houlsby, and M. Tschannen (2020) Automatic shortcut removal for self-supervised representation learning. In Proceedings of the International Conference on Machine Learning, Cited by: §1.1, §3.
  • J. Nam, H. Cha, S. Ahn, J. Lee, and J. Shin (2020) Learning from failure: training debiased classifier from biased classifier. arXiv e-prints, pp. arXiv:2007.02561. External Links: 2007.02561 Cited by: §1.1, §3, §3, §3, §4, §5.3, §5.4, Table 1, §5.
  • H. Qiu, C. Xiao, L. Yang, X. Yan, H. Lee, and B. Li (2019) SemanticAdv: generating adversarial examples via attribute-conditional image editing. In Proceedings of European Conference on Computer Vision, Cited by: §3.
  • D. Stachura, C. Galias, and K. Zolna (2020) Leakage-robust classifier via mask-enhanced training. In The Student Abstract Track of the Thirty-Fourth AAAI Conference on Artificial Intelligence, Cited by: §3.
  • A. J. Storkey (2009) When training and test sets are different: characterizing learning transfer. Dataset Shift in Machine Learning (), pp. 3–28. External Links: Document Cited by: §1.
  • D. Stutz, M. Hein, and B. Schiele (2019) Disentangling adversarial robustness and generalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §4.1, §4.1.
  • S. Swayamdipta, R. Schwartz, N. Lourie, Y. Wang, H. Hajishirzi, N. A. Smith, and Y. Choi (2020) Dataset cartography: mapping and diagnosing datasets with training dynamics. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Cited by: §3.
  • C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. Cited by: §4.2.
  • A. Van Den Oord, O. Vinyals, et al. (2017) Neural discrete representation learning. In Advances in Neural Information Processing Systems, Cited by: §4.1.
  • B. H. Zhang, B. Lemoine, and M. Mitchell (2018) Mitigating unwanted biases with adversarial learning. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, External Links: Document Cited by: §3.
