Debiasing Concept Bottleneck Models with Instrumental Variables

07/22/2020 ∙ by Mohammad Taha Bahadori, et al. ∙ Amazon

The concept-based explanation approach is a popular model interpretability tool because it expresses the reasons for a model's predictions in terms of concepts that are meaningful to domain experts. In this work, we study the problem of the concepts being correlated with confounding information in the features. We propose a new causal prior graph for modeling the impacts of unobserved variables and a method to remove the impact of confounding information using instrumental variable techniques. We also model the completeness of the concept set. Our synthetic and real-world experiments demonstrate the success of our method in removing biases due to confounding and noise from the concepts.




1 Introduction

Explaining the predictions of neural networks through higher level concepts Kim et al. (2018); Ghorbani et al. (2019); Brocki and Chung (2019); Hamidi-Haines et al. (2018) enables model interpretation on data with complex manifold structure such as images. It also allows the use of domain knowledge during the explanation process. Concept-based explanation has been used for medical imaging Cai et al. (2019), breast cancer histopathology Graziani et al. (2018), cardiac MRIs Clough et al. (2019), and meteorology Sprague et al. (2019).

When the set of concepts is carefully selected, we can estimate a model in which the discriminative information flows from the feature vectors $\mathbf{x}$ through the concept vectors $\mathbf{c}$ and reaches the labels $\mathbf{y}$. To this end, we train two models: one predicts the concept vectors from the features, denoted by $\widehat{\mathbf{c}}(\mathbf{x})$, and the other predicts the labels from the predicted concept vector, denoted by $\widehat{\mathbf{y}}(\widehat{\mathbf{c}})$. This estimation process ensures that for each prediction we have the reasons for the prediction stated in terms of the predicted concept vector $\widehat{\mathbf{c}}(\mathbf{x})$.

Figure 1: Spearman correlation coefficients ($\rho$) of the predictors of the concepts given features $\widehat{\mathbf{c}}(\mathbf{x})$ and labels $\widehat{\mathbf{c}}(\mathbf{y})$ for the 312 concepts in the test partition of the CUB-200-2011 dataset Wah et al. (2011). 112 concepts can be predicted more accurately with the features than with the labels. Concept ids in the x-axis are sorted in increasing order. We provide the detailed steps to obtain the figure in Section 3.2.

However, in reality, noise and confounding information (due to, e.g., non-discriminative context) can influence both the feature and concept vectors, resulting in confounded correlations between them. Figure 1 provides evidence for noise and confounding in the CUB-200-2011 dataset Wah et al. (2011). We train two predictors for the concept vectors, one based on features, $\widehat{\mathbf{c}}(\mathbf{x})$, and one based on labels, $\widehat{\mathbf{c}}(\mathbf{y})$, and we compare the Spearman correlation coefficients between their predictions and the true ordinal values of the concepts. Having concepts for which $\widehat{\mathbf{c}}(\mathbf{x})$ is more accurate than $\widehat{\mathbf{c}}(\mathbf{y})$ could be due to noise, or due to hidden variables independent of the labels that spuriously correlate $\mathbf{c}$ and $\mathbf{x}$, leading to undesirable explanations that include confounding or noise.

In this work, using the Concept Bottleneck Models (CBM) Koh et al. (2020); Losch et al. (2019), we demonstrate a method for removing the confounding and noise from (debiasing) the explanation with concept vectors, and we extend the results to the Testing with Concept Activation Vectors (TCAV) Kim et al. (2018) technique. We provide a new causal prior graph to account for the confounding information and concept completeness Yeh et al. (2019). We describe the identifiability challenges in our causal prior graph and propose a two-stage estimation procedure. Our two-stage estimation technique defines and predicts debiased concepts such that the predictive information of the features maximally flows through them.

We show that, using the labels as instrumental variables, we can successfully remove the impact of the confounding and noise from the predicted concept vectors. The first stage of our proposed procedure has three steps: (1) debias the concept vectors using the labels, (2) predict the debiased concept vectors using the features, and (3) use the concept vectors predicted in the second step to predict the labels. In the second stage, we find the residual predictive information in the features that is not in the concepts. We validate the proposed method using a synthetic dataset and the CUB-200-2011 dataset.

2 Methodology


We follow the notation of Goodfellow et al. (2016) and denote random vectors by bold font letters $\mathbf{x}$ and their values by bold symbols $\boldsymbol{x}$. The notation $p(\mathbf{x})$ is a probability measure on $\mathbf{x}$ and $dp(\mathbf{x}) = p(\mathbf{x})\,d\mathbf{x}$ is the infinitesimal probability mass at $\mathbf{x}$. We use $\widehat{\mathbf{y}}(\mathbf{x})$ to denote the prediction of $\mathbf{y}$ given $\mathbf{x}$. In the graphical models, we show the observed and unobserved variables using filled and hollow circles, respectively. To avoid clutter in the equations, without loss of generality, we state the relationships with additive noise.

Problem Statement.

We assume that during the training phase, we are given triplets $(\mathbf{x}_i, \mathbf{c}_i, \mathbf{y}_i)$ for $n$ data points, $i = 1, \ldots, n$. In addition to the regular features $\mathbf{x}$ and labels $\mathbf{y}$, we are given a human interpretable concept vector $\mathbf{c}$ for each data point. Each element of the concept vector measures the degree of existence of the corresponding concept in the features. Thus, the concept vector typically has binary or ordinal values. Our goal is to learn to predict $\mathbf{y}$ as a function of $\mathbf{x}$ and use $\mathbf{c}$ for explaining the predictions. Proceeding in two steps, we first learn a function $\widehat{\mathbf{c}}(\mathbf{x})$ and then learn another function $\widehat{\mathbf{y}}(\widehat{\mathbf{c}}(\mathbf{x}))$. The prediction $\widehat{\mathbf{c}}(\mathbf{x})$ is the explanation for our prediction $\widehat{\mathbf{y}}$. At test time, only the features are given and the prediction+explanation algorithm predicts both $\widehat{\mathbf{y}}$ and $\widehat{\mathbf{c}}$.

2.1 A New Causal Prior Graph for CBMs

Figure 2(a) shows the ideal situation in explanation via high-level concepts. The generative model corresponding to Figure 2(a) states that for generating each feature $\mathbf{x}_i$ we first randomly draw the label $\mathbf{y}_i$. Given the label, we draw the concepts $\mathbf{c}_i$. Given the concepts, we draw the features. The hierarchy in this graph is from nodes with less detailed information (labels) to more detailed ones (features, images).

The model in Figure 2(a) is one explanation for the phenomenon in Figure 1, because the noise in generation of the concepts allows the $\mathbf{x}$-$\mathbf{c}$ edge to be stronger than the $\mathbf{c}$-$\mathbf{y}$ edge. However, another (non-mutually exclusive) explanation for this phenomenon is the existence of hidden confounders, shown in Figure 2(b). In this graphical model, $\mathbf{u}$ represents the confounders and $\mathbf{d}$ represents the unconfounded concepts. Note that we assume that the confounders $\mathbf{u}$ and labels $\mathbf{y}$ are independent when $\mathbf{x}$ and $\mathbf{c}$ are not observed.

Another phenomenon captured in Figure 2(b) is the lack of concept completeness Yeh et al. (2019). It describes the situation when the features, compared to the concepts, have additional predictive information about the labels.

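The non-linear structural equations corresponding to the causal prior graph in Figure 2(b) are as follows

$$\mathbf{d} = f_1(\mathbf{y}) + \boldsymbol{\varepsilon}_d, \tag{1}$$
$$\mathbf{c} = \mathbf{d} + h(\mathbf{u}) + \boldsymbol{\varepsilon}_c, \tag{2}$$
$$\mathbf{x} = f_2(\mathbf{u}, \mathbf{d}) + f_3(\mathbf{y}) + \boldsymbol{\varepsilon}_x, \tag{3}$$

for some vector functions $h$, $f_1$, $f_2$, and $f_3$. We have $\mathbf{u} \perp \mathbf{y}$ and $\boldsymbol{\varepsilon}_c \perp \mathbf{y}$. Our definition of $\mathbf{d}$ in Eq. (2) does not restrict $\mathbf{c}$, because we simply attribute the difference between $\mathbf{c}$ and $\mathbf{d}$ to a function of the latent confounder $\mathbf{u}$ and noise.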

Our causal prior graph in Figure 2(b) corresponds to a generative process in which, to generate an observed triplet $(\mathbf{x}_i, \mathbf{c}_i, \mathbf{y}_i)$, we first draw a label $\mathbf{y}_i$ and a confounder vector $\mathbf{u}_i$ independently. Then we draw the discriminative concepts $\mathbf{d}_i$ based on the label and generate the features $\mathbf{x}_i$ jointly based on the concepts, label, and the confounder. Finally, we draw the observed concept vector $\mathbf{c}_i$ based on the drawn concept and confounder vectors.

Both causal graphs reflect our assumption that the direction of causality is from the labels to concepts and then to the features, $\mathbf{y} \to \mathbf{d} \to \mathbf{x}$, to ensure that $\mathbf{u}$ and $\mathbf{y}$ are marginally independent in Figure 2(b). This direction also corresponds to moving from more abstract class labels to concepts to detailed features. During estimation, we fit the functions in the $\mathbf{x} \to \mathbf{d} \to \mathbf{y}$ direction, because finding the statistical strength of an edge does not depend on its direction.

Estimation of the model in Figure 2(b) is challenging. Because of the structure of the latent confounders, this model is unidentifiable (Pearl, 2009, Chapter 3). Our solution is to first ignore the direct $\mathbf{x} \to \mathbf{y}$ edge and estimate the $\mathbf{x} \to \mathbf{d} \to \mathbf{y}$ path, then estimate the residuals of the regression using the $\mathbf{x} \to \mathbf{y}$ edge. Our two-stage estimation technique ensures that the predictive information of the features maximally flows through the concepts. In the next sections, we focus on the first stage and on using instrumental variables to eliminate the noise and confounding in estimation of the $\mathbf{d}$-$\mathbf{x}$ link.

[Figure 2 panels: (a) The ideal concepts; (b) Our more realistic graph; (c) The graph with $\widehat{\mathbf{d}}$]
Figure 2: (a) The ideal view of the causal relationships between the features $\mathbf{x}$, concepts $\mathbf{c}$, and labels $\mathbf{y}$. (b) In a more realistic setting, the unobserved confounding variable $\mathbf{u}$ impacts both $\mathbf{x}$ and $\mathbf{c}$. The discriminative information reaches $\mathbf{y}$ through the discriminative part of the concepts $\mathbf{d}$. We also model the completeness of the concepts via a direct edge from the features $\mathbf{x}$ to the labels $\mathbf{y}$. (c) When we use $\widehat{\mathbf{d}} = \mathbb{E}[\mathbf{c} \mid \mathbf{y}]$ in place of $\mathbf{d}$ and $\mathbf{c}$, we eliminate the confounding link $\mathbf{u} \to \mathbf{c}$.

2.2 Instrumental Variables

Background on Instrumental Variables.

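In causal inference, instrumental variables Stock (2015); Pearl (2009), denoted by $\mathbf{z}$, are commonly used to find the causal impact of a variable $\mathbf{x}$ on $\mathbf{y}$ when $\mathbf{x}$ and $\mathbf{y}$ are jointly influenced by an unobserved confounder $\mathbf{u}$ (i.e., $\mathbf{x} \leftarrow \mathbf{u} \to \mathbf{y}$). The key requirement is that $\mathbf{z}$ should be correlated with $\mathbf{x}$ but independent of the confounding variable $\mathbf{u}$ (i.e., $\mathbf{z} \not\perp \mathbf{x}$ and $\mathbf{z} \perp \mathbf{u}$). The commonly used 2-stage least squares first regresses $\mathbf{x}$ in terms of $\mathbf{z}$ to obtain $\widehat{\mathbf{x}}$, followed by regression of $\mathbf{y}$ in terms of $\widehat{\mathbf{x}}$. Because of independence between $\mathbf{z}$ and $\mathbf{u}$, $\widehat{\mathbf{x}}$ is also independent of $\mathbf{u}$. Thus, in the second regression the confounding impact of $\mathbf{u}$ is eliminated. Our goal is to use the instrumental variable trick again to remove the confounding factors impacting features and concept vectors.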
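The two regressions of 2-stage least squares can be illustrated with a minimal NumPy sketch on a scalar toy problem. The data-generating coefficients, sample size, and function names below are assumptions made purely for illustration and are not taken from the paper.

```python
import numpy as np


def two_stage_least_squares(z, x, y):
    """Estimate the effect of x on y using z as an instrument (scalar case)."""
    # Stage 1: regress x on z and keep the fitted values x_hat.
    stage1_design = np.column_stack([z, np.ones_like(z)])
    stage1_coef, *_ = np.linalg.lstsq(stage1_design, x, rcond=None)
    x_hat = stage1_design @ stage1_coef
    # Stage 2: regress y on x_hat; the slope is the instrumental variable estimate.
    stage2_design = np.column_stack([x_hat, np.ones_like(x_hat)])
    stage2_coef, *_ = np.linalg.lstsq(stage2_design, y, rcond=None)
    return stage2_coef[0]


def ordinary_least_squares(x, y):
    """Naive slope of y on x, which absorbs the confounder's contribution."""
    design = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0]


def simulate(n=20000, true_effect=1.5, seed=0):
    """Toy data (assumed): u confounds x and y, while z is independent of u."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    u = rng.standard_normal(n)
    x = z + u + 0.1 * rng.standard_normal(n)
    y = true_effect * x + 2.0 * u + 0.1 * rng.standard_normal(n)
    return z, x, y
```

Calling `two_stage_least_squares(*simulate())` should return a slope close to the assumed `true_effect`, whereas `ordinary_least_squares` on the same data is pulled upward by the confounder.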

Instrumental Variables for CBMs.

In our causal graph in Figure 2(b), the label $\mathbf{y}$ is a valid instrument for the study of the relationship between concepts $\mathbf{c}$ and features $\mathbf{x}$. We predict $\widehat{\mathbf{d}} = \mathbb{E}[\mathbf{c} \mid \mathbf{y}]$ as a function of $\mathbf{y}$ and use it in place of the concepts in the concept bottleneck models. The graphical model corresponding to this procedure is shown in Figure 2(c), where the $\mathbf{u} \to \mathbf{c}$ link is eliminated. In particular, given the independence relationship $\mathbf{y} \perp \mathbf{u}$, we have $\widehat{\mathbf{d}} \perp \mathbf{u}$. This is the basis for our debiasing method in the next section.

2.3 The Estimation Method

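Our estimation uses the observation that in the graph in Figure 2(b) the label vector $\mathbf{y}$ is a valid instrument for removing the correlations due to $\mathbf{u}$. Combining Eqs. (1) and (2) we have $\mathbf{c} = f_1(\mathbf{y}) + \boldsymbol{\varepsilon}_d + h(\mathbf{u}) + \boldsymbol{\varepsilon}_c$. Taking expectation with respect to $p(\mathbf{c} \mid \mathbf{y})$, we have

$$\mathbb{E}[\mathbf{c} \mid \mathbf{y}] = \mathbb{E}[f_1(\mathbf{y}) + \boldsymbol{\varepsilon}_d \mid \mathbf{y}] + \mathbb{E}[h(\mathbf{u}) \mid \mathbf{y}] + \mathbb{E}[\boldsymbol{\varepsilon}_c \mid \mathbf{y}] = \mathbb{E}[\mathbf{d} \mid \mathbf{y}] + \mathbb{E}[h(\mathbf{u})] + \mathbb{E}[\boldsymbol{\varepsilon}_c]. \tag{4}$$

The last step is because both $\mathbf{u}$ and $\boldsymbol{\varepsilon}_c$ are independent of $\mathbf{y}$. Thus, the last two terms are constant in terms of $\mathbf{x}$ and $\mathbf{y}$ and can be eliminated after estimation. Eq. (4) allows us to remove the impact of $\mathbf{u}$ and $\boldsymbol{\varepsilon}_c$ and estimate the denoised and debiased $\widehat{\mathbf{d}}(\mathbf{y}) = \mathbb{E}[\mathbf{d} \mid \mathbf{y}]$. We find $\mathbb{E}[\mathbf{c} \mid \mathbf{y}]$ using a neural network trained on $(\mathbf{y}_i, \mathbf{c}_i)$ pairs and use the resulting values as pseudo-observations in place of $\mathbf{d}_i$. Given our debiased prediction for the discriminative concepts $\widehat{\mathbf{d}}_i$, we can perform the CBMs' two steps of $\mathbf{x} \to \mathbf{d}$ and $\mathbf{d} \to \mathbf{y}$ estimation.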

Because we use expected values of $\mathbf{c}$ in place of $\mathbf{d}$ during the learning process (i.e., $\widehat{\mathbf{d}} = \mathbb{E}[\mathbf{c} \mid \mathbf{y}]$), the debiased concept vectors have values within the ranges of the original concept vectors $\mathbf{c}$. Thus, we do not lose human readability with the debiased concept vectors.
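When the label is categorical, as in the CUB experiments of Section 3.2, the conditional expectation of the concepts given the label reduces to a per-class average of the annotated concept scores. The following is a minimal NumPy sketch of that special case; the function name and array layout are assumptions for illustration, and the paper's general procedure uses a neural network instead.

```python
import numpy as np


def debias_concepts_with_labels(concepts, labels):
    """Replace each concept vector by the mean concept vector of its class.

    concepts: (n, k) array of annotated concept scores.
    labels:   (n,) array of categorical class ids.
    Returns an (n, k) array whose i-th row is the class average for labels[i].
    """
    concepts = np.asarray(concepts, dtype=float)
    classes, class_index = np.unique(np.asarray(labels), return_inverse=True)
    class_index = class_index.reshape(-1)
    sums = np.zeros((classes.size, concepts.shape[1]))
    np.add.at(sums, class_index, concepts)  # accumulate concept rows per class
    counts = np.bincount(class_index, minlength=classes.size)
    class_means = sums / counts[:, None]
    return class_means[class_index]
```

Because every output row is an average of observed rows, its entries stay inside the range of the original annotations, which matches the readability argument above.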

Modeling Uncertainty in Prediction of Concepts.

Our empirical observations show that prediction of the concepts from the features can be highly uncertain. Hence, we present a CBM estimator that takes into account the uncertainties in prediction of the concepts. We take the conditional expectation of the labels given features as follows

$$\mathbb{E}[\mathbf{y} \mid \mathbf{x}] = \int g_\theta(\mathbf{d}) \, dp_\phi(\mathbf{d} \mid \mathbf{x}), \tag{5}$$

where $p_\phi(\mathbf{d} \mid \mathbf{x})$ is the probability function, parameterized by $\phi$, that captures the uncertainty in prediction of the debiased concepts from features. The function $g_\theta(\cdot)$ predicts labels from the debiased concepts.

In summary, we perform the following steps to estimate Eq. (5):

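  1. Train a neural network $\widehat{\mathbf{d}}(\mathbf{y})$ as an estimator of $\mathbb{E}[\mathbf{c} \mid \mathbf{y}]$ using $(\mathbf{y}_i, \mathbf{c}_i)$ pairs.

  2. Train a neural network as an estimator for $p_\phi(\mathbf{d} \mid \mathbf{x})$ using pairs $(\mathbf{x}_i, \widehat{\mathbf{d}}_i)$.

  3. Use pairs $(\mathbf{x}_i, \mathbf{y}_i)$ to estimate the function $g_\theta$ by fitting $\int g_\theta(\mathbf{d}) \, dp_\phi(\mathbf{d} \mid \mathbf{x}_i)$ to $\mathbf{y}_i$.

  4. [Optional] Fit a neural network $q(\mathbf{x})$ to the residuals of step 3. The function $q$ captures the residual information in $\mathbf{x}$. Compare the improvement in prediction accuracy over the accuracy in step 3 to quantify the degree of concept incompleteness.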

Steps 1–3 describe the first stage of estimating the $\mathbf{x} \to \mathbf{d} \to \mathbf{y}$ path, and step 4 describes the second stage of estimating the residual link $\mathbf{x} \to \mathbf{y}$. In step 3, we approximate the integral using a Monte Carlo approach by drawing from the distribution $p_\phi(\mathbf{d} \mid \mathbf{x})$ estimated in step 2. Because we first predict the labels $\mathbf{y}$ using the concepts and then fit the residual function $q$ to the residuals, we ensure that the predictive information maximally goes through the debiased concepts. The last step is optional, because our goal is to compare the predictive power of the features going through the concepts (step 3) with the unrestricted features (step 4). We can instead omit step 4, learn an unrestricted predictive model, and use it for comparison.
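The Monte Carlo approximation used in step 3 can be sketched as follows, assuming (as in the CUB experiments of Section 3.2) an independent Gaussian per concept dimension and 25 draws per evaluation. The function name, its arguments, and the stand-in predictor `g` are hypothetical; the paper uses neural networks for both the concept distribution and the label predictor.

```python
import numpy as np


def monte_carlo_label_prediction(g, mean, std, num_samples=25, rng=None):
    """Approximate the expectation of g(d) under d ~ N(mean, diag(std**2)).

    g:    callable mapping a (batch, k) array of concepts to label predictions.
    mean: (batch, k) array of predicted concept means for each input.
    std:  array broadcastable to mean, with per-dimension standard deviations.
    """
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(mean, dtype=float)
    # Draw num_samples concept vectors for every input in the batch.
    noise = rng.standard_normal((num_samples,) + mean.shape)
    samples = mean + np.asarray(std, dtype=float) * noise
    # Average the label predictions over the sampled concept vectors.
    predictions = np.stack([g(sample) for sample in samples])
    return predictions.mean(axis=0)
```

With a zero standard deviation the estimate collapses to `g(mean)`, which is the plug-in prediction that ignores concept uncertainty.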

A Special Case and Application to TCAV.

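Choosing a simple multivariate Gaussian distribution for $p_\phi(\mathbf{d} \mid \mathbf{x})$, we can show that the above steps simplify as follows:

  1. Learn $\widehat{\mathbf{d}}(\mathbf{y})$ by predicting $\mathbf{y} \to \mathbf{c}$.

  2. Learn $\widehat{\mathbf{d}}(\mathbf{x})$ by predicting $\mathbf{x} \to \widehat{\mathbf{d}}(\mathbf{y})$.

  3. Learn $\widehat{\mathbf{y}}(\widehat{\mathbf{d}})$ by predicting $\widehat{\mathbf{d}}(\mathbf{x}) \to \mathbf{y}$.

  4. [Optional] Learn $q(\mathbf{x})$ to predict the residuals $\mathbf{y} - \widehat{\mathbf{y}}(\widehat{\mathbf{d}}(\mathbf{x}))$.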
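The four simplified steps can be sketched with ordinary least squares standing in for each learned predictor. This is an illustrative assumption: the paper does not prescribe linear models, and the helper names below are hypothetical.

```python
import numpy as np


def fit_linear(inputs, targets):
    """Least-squares linear map with an intercept; returns a predict function."""
    def with_bias(a):
        return np.column_stack([a, np.ones(len(a))])

    weights, *_ = np.linalg.lstsq(with_bias(inputs), targets, rcond=None)
    return lambda new_inputs: with_bias(new_inputs) @ weights


def fit_debiased_pipeline(x, c, y):
    """Fit the four steps on features x, concepts c, and real-valued labels y."""
    d_from_y = fit_linear(y, c)              # step 1: labels -> concepts
    d_from_x = fit_linear(x, d_from_y(y))    # step 2: features -> debiased concepts
    y_from_d = fit_linear(d_from_x(x), y)    # step 3: predicted concepts -> labels
    # Step 4 (optional): features -> residuals left over after step 3.
    residual = fit_linear(x, y - y_from_d(d_from_x(x)))
    return d_from_x, y_from_d, residual
```

At prediction time, `d_from_x(x)` serves as the explanation and `y_from_d(d_from_x(x))` as the label prediction; adding `residual(x)` indicates how much predictive information bypasses the concepts.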

The above special case suggests a simple method for debiasing the results of TCAV Kim et al. (2018) analysis. The TCAV method is attractive because, unlike CBMs, it analyzes existing neural networks and does not need to define a new model. We can use the first step to remove the bias due to the confounding and perform TCAV using the $\widehat{\mathbf{d}}(\mathbf{y})$ vectors instead of the $\mathbf{c}$ vectors.

Prior Work on Causal Concept-Based Explanation.

Among the existing works on causal concept-based explanation, Goyal et al. (2019) propose a different causal prior graph to model the spurious correlations among the concepts and remove them using conditional variational auto-encoders. In contrast, we aim at handling noise and spurious correlations between the features and concepts using the labels as instruments. Which approach is more appropriate for a problem depends on the assumptions underlying that problem.

3 Experiments

Figure 3: Correlation between the estimated concept vectors and the true discriminative concept vectors as the number of data points grows.

3.1 Synthetic Data Experiments

We create a synthetic dataset according to the following steps:

  1. Generate vectors with elements distributed according to unit normal distribution.

  2. Generate vectors with elements distributed according to unit normal distribution .

  3. Generate vectors with elements distributed according to scaled normal distribution .

  4. Generate vectors with elements distributed according to scaled normal distribution .

  5. Generate matrices with elements distributed according to scaled normal distribution .

  6. Compute for .

  7. Compute for .

  8. Compute for .

In Figure 3, we plot the correlation between the true unconfounded and noiseless concepts and the concept vectors estimated with the regular two-step procedure (without debiasing) and with our debiasing method, as a function of sample size $n$. The results show that the bias due to confounding does not vanish as we increase the sample size, and that our debiasing technique brings the results closer to the true discriminative concepts.
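A comparison of this kind can be sketched with a small linear simulation. All dimensions, noise scales, and mixing matrices below are assumptions chosen for illustration (the direct label-to-feature term is omitted), so the numbers it produces are not the paper's results.

```python
import numpy as np


def least_squares_predictor(inputs, targets):
    """Linear least-squares fit with an intercept; returns a predict function."""
    design = np.column_stack([inputs, np.ones(len(inputs))])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return lambda new: np.column_stack([new, np.ones(len(new))]) @ weights


def generate(n, rng, mats):
    """Draw labels y, confounders u, true concepts d, observed concepts c, features x."""
    w_d, w_c, w_xd, w_xu = mats
    y = rng.standard_normal((n, 10))
    u = rng.standard_normal((n, 10))
    d = y @ w_d + 0.1 * rng.standard_normal((n, 10))
    c = d + u @ w_c + 0.1 * rng.standard_normal((n, 10))
    x = d @ w_xd + u @ w_xu + 0.1 * rng.standard_normal((n, 30))
    return x, c, y, d


def mean_correlation(estimate, truth):
    """Average per-concept Pearson correlation between estimate and truth."""
    columns = range(truth.shape[1])
    return float(np.mean([np.corrcoef(estimate[:, j], truth[:, j])[0, 1] for j in columns]))


def compare_on_synthetic_data(n_train=2000, n_test=2000, seed=0):
    rng = np.random.default_rng(seed)
    mats = (rng.standard_normal((10, 10)) / np.sqrt(10),
            rng.standard_normal((10, 10)) / np.sqrt(10),
            rng.standard_normal((10, 30)),
            rng.standard_normal((10, 30)))
    x, c, y, _ = generate(n_train, rng, mats)
    x_test, _, _, d_test = generate(n_test, rng, mats)
    baseline = least_squares_predictor(x, c)          # features -> observed concepts
    d_of_y = least_squares_predictor(y, c)            # labels -> concepts (instrument step)
    debiased = least_squares_predictor(x, d_of_y(y))  # features -> debiased concepts
    return (mean_correlation(baseline(x_test), d_test),
            mean_correlation(debiased(x_test), d_test))
```

`compare_on_synthetic_data()` returns the pair (baseline correlation, debiased correlation) against the true concepts on held-out data; under these assumed settings the second value should be the larger one.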

3.2 CUB Data Experiments

Figure 4: Twelve example images where the debiasing using instrumental variables helps. A common pattern is that the image context has either prevented the annotator from accurately annotating the concepts or misled the annotator. From left to right, the birds are ‘Brandt Cormorant’, ‘Pelagic Cormorant’, ‘Fish Crow’, ‘Fish Crow’, ‘Fish Crow’, ‘Ivory Gull’, ‘Ivory Gull’, ‘Green Violetear’, ‘Green Violetear’, ‘Cape Glossy Starling’, ‘Northern Waterthrush’, ‘Northern Waterthrush’.

Dataset and preprocessing.

We evaluate the performance of the proposed approach on the CUB-200-2011 dataset Wah et al. (2011). The dataset includes 11788 pictures (in 5994/5794 train/test partitions) of 200 different types of birds, annotated both for the bird type and for 312 different concepts about each picture. The concept annotations are binary, indicating whether the concept exists or not. However, for each statement, a four-level certainty score has also been assigned: 1: not visible, 2: guessing, 3: probably, and 4: definitely. We combine the binary annotation and the certainty score to create a 7-level ordinal variable as the annotation for each image, as summarized in Table 1. For simplicity, we map the 7-level ordinal values to uniformly spaced values in the $[0, 1]$ interval. We randomly choose 15% of the training set and hold it out as the validation set.

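| Annotation    | Certainty   | Ordinal Score | Numeric Map |
|---------------|-------------|---------------|-------------|
| Doesn’t Exist | definitely  | 0             | 0           |
| Doesn’t Exist | probably    | 1             | 1/6         |
| Doesn’t Exist | guessing    | 2             | 2/6         |
| Doesn’t Exist | not visible | 3             | 3/6         |
| Exists        | not visible | 3             | 3/6         |
| Exists        | guessing    | 4             | 4/6         |
| Exists        | probably    | 5             | 5/6         |
| Exists        | definitely  | 6             | 1           |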
Table 1: Mapping the concept annotations to real values.
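The mapping in Table 1 can be written as a small helper. The ordinal scores follow the table; dividing by 6 to obtain the numeric value is an assumption based on the stated uniform spacing and on the value 0.5 reported later for "not visible" annotations. The function and dictionary names are hypothetical.

```python
# Certainty levels as listed in the dataset description (1 = not visible ... 4 = definitely).
CERTAINTY_LEVELS = {"not visible": 1, "guessing": 2, "probably": 3, "definitely": 4}


def ordinal_score(exists: bool, certainty: str) -> int:
    """Combine the binary annotation and its certainty into a 0..6 ordinal score."""
    offset = CERTAINTY_LEVELS[certainty] - 1  # 0 for "not visible", 3 for "definitely"
    # "Not visible" maps to the midpoint 3 regardless of the binary annotation.
    return 3 + offset if exists else 3 - offset


def numeric_value(exists: bool, certainty: str) -> float:
    """Map the ordinal score to uniformly spaced values (assumed range [0, 1])."""
    return ordinal_score(exists, certainty) / 6
```

For example, `numeric_value(True, "not visible")` gives 0.5, the midpoint assigned when the annotator could not see the attribute.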

The result in Figure 1.

To compare the association strength between $\mathbf{x}$ and $\mathbf{c}$ with the association strength between $\mathbf{y}$ and $\mathbf{c}$, we train two predictors of concepts, $\widehat{\mathbf{c}}(\mathbf{x})$ and $\widehat{\mathbf{c}}(\mathbf{y})$. We use PyTorch’s pre-trained ResNet152 network He et al. (2016) for prediction of the concepts from the images. Because the annotations are ordinal numbers, we use the Spearman correlation to find the association strengths. Because $\mathbf{y}$ is a categorical variable, $\widehat{\mathbf{c}}(\mathbf{y})$ is simply the average concept annotation score for each class. The concept ids in the x-axis are sorted in increasing order.
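The per-concept comparison behind Figure 1 can be sketched in plain Python: compute the Spearman coefficient of each predictor against the annotations and list the concepts where the feature-based predictor scores higher. The function names and the column-wise data layout are assumptions for illustration; tied values receive average ranks, which matters for ordinal annotations.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = 0.5 * (start + end) + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman(a, b):
    """Spearman coefficient: Pearson correlation of the two rank vectors.

    Undefined (division by zero) when either input is constant.
    """
    ra, rb = average_ranks(a), average_ranks(b)
    mean_a, mean_b = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(ra, rb))
    var_a = sum((p - mean_a) ** 2 for p in ra)
    var_b = sum((q - mean_b) ** 2 for q in rb)
    return cov / math.sqrt(var_a * var_b)


def concepts_better_from_features(true_c, pred_from_x, pred_from_y):
    """Indices of concepts whose feature-based predictor has the higher coefficient.

    Each argument is a list with one list of per-image scores per concept.
    """
    return [j for j, (t, px, py) in enumerate(zip(true_c, pred_from_x, pred_from_y))
            if spearman(px, t) > spearman(py, t)]
```

The length of the list returned by `concepts_better_from_features` corresponds to the count of concepts that are predicted more accurately from the features than from the labels.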

The top ten concepts with the largest values of are ‘has back color::green’, ‘has upper tail color::green’, ‘has upper tail color::orange’, ‘has upper tail color::pink’, ‘has back color::rufous’, ‘has upper tail color::purple’, ‘has back color::pink’, ‘has upper tail color::iridescent’, ‘has back color::purple’, ‘has back color::iridescent’. These concepts are all related to color and can be easily confounded by the context of the images.

Training details for Eq. (5).

We model the distribution of concept logits as independent Gaussians with their means equal to the ResNet152 logit outputs. We estimate the variance for each dimension by using the logits of the true annotation scores, which are clamped to avoid large logit numbers. In each iteration of the training algorithm, we draw 25 samples from the estimated concept distribution. The predictor of labels from concepts (the function in Eq. (5)) is a three-layer feed-forward neural network with hidden layer sizes (312, 312, 200). There is a skip connection from the input to the penultimate layer. We model the residual function with another pretrained ResNet152 function. All algorithms are trained with the Adam optimization algorithm Kingma and Ba (2014).

Quantitative experiments.

Compared to the baseline algorithm, our debiasing technique increases the average Spearman correlation from 0.406 to 0.508. For the above 10 concepts, our algorithm increases the average Spearman correlation from 0.283 to 0.389. Our debiasing algorithm also improves the generalization in prediction of the image labels. It improves the top-5 accuracy of predicting the image labels from 39.5% to 49.3%.

Analysis of the results.

In Figure 4, we show 12 images for which the and are significantly different. A common pattern among the examples is that the context of the image does not allow accurate annotations by the annotators. In images 3, 4, 5, 6, 7, 11, and 12 in Figure 4 the ten color-related concepts listed above are all set to 0.5, indicating that the annotators have failed in annotation. However, our algorithm correctly identifies that for example Ivory Gulls do not have green-colored backs by predicting which is closer to than the true .

Another pattern is the impact of the color of the environment on the accuracy of the annotations. For example, the second image from the left is an image of Pelagic cormorant, whose back and upper tail colors are unlikely to be green with per-class average of and , respectively. However, because of the color of the image and the reflections, the annotator has assigned to both of ‘has back color::green’ and ‘has upper tail color::green’ concepts. Our algorithm predicts and for these two features respectively, which are closer to the per-class average.

4 Conclusions and Future Works

Studying concept-based explanation techniques, we provided evidence for the potential existence of an unobserved latent variable, independent of the labels, that creates associations between the features and concepts. We proposed a new causal prior graph that models the impact of the noise and latent confounding on the estimated concepts. We showed that, using the labels as instruments, we can remove the impact of the context from the explanations. Our experiments showed that our debiasing technique not only improves the quality of the explanations, but also improves the accuracy of predicting labels through the concepts. As future work, we will investigate other instrumental variable techniques to find the most accurate debiasing method.


  • L. Brocki and N. C. Chung (2019) Concept saliency maps to visualize relevant features in deep generative models. arXiv:1910.13140. Cited by: §1.
  • C. J. Cai, E. Reif, N. Hegde, J. Hipp, B. Kim, D. Smilkov, M. Wattenberg, F. Viegas, G. S. Corrado, M. C. Stumpe, et al. (2019) Human-centered tools for coping with imperfect algorithms during medical decision-making. In CHI, Cited by: §1.
  • J. R. Clough, I. Oksuz, E. Puyol-Antón, B. Ruijsink, A. P. King, and J. A. Schnabel (2019) Global and local interpretability for cardiac mri classification. In MICCAI, Cited by: §1.
  • A. Ghorbani, J. Wexler, J. Y. Zou, and B. Kim (2019) Towards automatic concept-based explanations. In NeurIPS, pp. 9273–9282. Cited by: §1.
  • I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep learning. MIT press. Cited by: §2.
  • Y. Goyal, U. Shalit, and B. Kim (2019) Explaining classifiers with causal concept effect (cace). arXiv:1907.07165. Cited by: §2.3.
  • M. Graziani, V. Andrearczyk, and H. Müller (2018) Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pp. 124–132. Cited by: §1.
  • M. Hamidi-Haines, Z. Qi, A. Fern, F. Li, and P. Tadepalli (2018) Interactive naming for explaining deep neural networks: a formative study. arXiv:1812.07150. Cited by: §1.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In CVPR, Cited by: §3.2.
  • B. Kim, M. Wattenberg, J. Gilmer, C. Cai, J. Wexler, F. Viegas, et al. (2018) Interpretability beyond feature attribution: quantitative testing with concept activation vectors (tcav). In ICML, pp. 2668–2677. Cited by: §1, §1, §2.3.
  • D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §3.2.
  • P. W. Koh, T. Nguyen, Y. S. Tang, S. Mussmann, E. Pierson, B. Kim, and P. Liang (2020) Concept Bottleneck Models. In ICML, Cited by: §1.
  • M. Losch, M. Fritz, and B. Schiele (2019) Interpretability beyond classification output: semantic bottleneck networks. arXiv:1907.10882. Cited by: §1.
  • J. Pearl (2009) Causality. Cambridge university press. Cited by: §2.1, §2.2.
  • C. Sprague, E. B. Wendoloski, and I. Guch (2019) Interpretable ai for deep learning-based meteorological applications. In American Meteorological Society Annual Meeting, Cited by: §1.
  • J. H. Stock (2015) Instrumental variables in statistics and econometrics. International Encyclopedia of the Social & Behavioral Sciences. Cited by: §2.2.
  • C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie (2011) The Caltech-UCSD Birds-200-2011 Dataset. Technical Report CNS-TR-2011-001, California Institute of Technology. Cited by: Figure 1, §1, §3.2.
  • C. Yeh, B. Kim, S. O. Arik, C. Li, P. Ravikumar, and T. Pfister (2019) On concept-based explanations in deep neural networks. arXiv:1910.07969. Cited by: §1, §2.1.