Controlling for Unknown Confounders in Neuroimaging

by   Sebastian Pölsterl, et al.

The aim of many studies in biomedicine is to infer cause-effect relationships rather than simple associations. For instance, researchers might gather data on dietary habits and cognitive function in an elderly population to determine which factors affect cognitive decline and to predict the effects of changes in diet on cognition. Answering questions about the outcome after some other variable has been manipulated is the subject of causal inference. Inferring causal effects from observational data is challenging and requires making untestable assumptions about the data-generating process. One essential assumption is that of no unmeasured confounder. In complex neuroimaging studies, we neither know all potential confounders nor do we have data on them. Thus, estimating a substitute confounder would be desirable. While this is infeasible in general, we will illustrate that by considering multiple causes of interest simultaneously, we can leverage the dependencies among them to identify causal effects by means of a latent factor model. In our experiments, we quantitatively evaluate the effectiveness of our approach on semi-synthetic data, where we know the true causal effects, and illustrate its use on real data on Alzheimer's disease, where it reveals important causes that otherwise would have been missed.




1 Introduction

Most of us already use causal language in everyday life, e.g., when saying “I will fail the exam if I do not study”. Nevertheless, we know that failing the exam is not absolutely certain, but more likely, hence we draw our conclusion merely from associations. In contrast, causal inference provides a principled statistical framework to answer questions about cause-effect relationships. In causal inference, we consider one specific individual (termed unit) and a set of actions (also called manipulations, interventions, treatments, or causes) and want to know the outcome for each action. We might want to know a specific person’s cognitive performance whether that person underwent regular cognitive training or not (Ngandu2015). This would allow us to compare the outcome under both actions to determine the individual causal effect of cognitive training on cognition. In practice, we can only observe one of the outcomes – the one that corresponds to the actual action taken – all other outcomes remain unobserved. Thus, it is generally impossible to infer causal effects from observational data alone without making untestable assumptions about the data-generating process (Pearl2000). The gold standard to resolve this issue is a randomized experiment, where each participant is randomly assigned an action to ensure that the missing outcome arises truly due to chance. In medical research, such trials are usually laborious and in some cases unethical, whereas observational data is often readily available. However, in an observational study assignment may not be truly random but due to other factors. If those factors affect the outcome that is being studied, we say there is confounding. It is important to remember that what is or is not regarded as a confounding variable is relative and depends on the goal of the study. 
Consider a study on the cause of Alzheimer’s disease (AD) where we find a high correlation between having gray hair and AD, which may naively lead us to conclude that gray hair causes AD. However, the observed correlation between gray hair and AD is only due to a person’s age, which renders the relationship between gray hair and AD confounded. Therefore, one essential assumption in causal inference is that of no unmeasured confounder. Otherwise, if unobserved confounding is present, the observed data distribution is compatible with many – potentially contradictory – causal explanations, leaving us with no way to differentiate between the true effect and a false one on the basis of data. In this case, the causal effect is unidentifiable (Pearl2000).
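The gray-hair example can be made concrete with a small simulation. The following is a minimal sketch that is not part of the original study: the variable names, effect sizes, and the linear data-generating process are all assumptions chosen for illustration. Age drives both gray hair and the outcome, while gray hair itself has no effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data-generating process: age affects both variables,
# gray hair has NO direct effect on the outcome.
age = rng.normal(size=n)
gray_hair = 0.8 * age + rng.normal(scale=0.6, size=n)
outcome = 1.0 * age + rng.normal(scale=0.5, size=n)

def ols(features, target):
    """Least-squares coefficients; the intercept is the first entry."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

# Regressing the outcome on gray hair alone picks up the effect of age.
naive = ols(gray_hair[:, None], outcome)[1]
# Adjusting for the confounder removes the spurious association.
adjusted = ols(np.column_stack([gray_hair, age]), outcome)[1]
print(f"unadjusted: {naive:.3f}, adjusted for age: {adjusted:.3f}")
```

In this toy setting the unadjusted coefficient is clearly non-zero although the true effect is zero, and it vanishes once age is included. The adjustment is only possible because age was measured, which is exactly what cannot be assumed for unknown confounders.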

Adjusting for known confounding variables is one method that permits us to estimate causal effects in observational studies. However, analyses across 15 neuroimaging studies revealed that considerable bias remains in volume and thickness measurement after adjusting for age and gender (wachinger2019quantifying). This result suggests that additional unknown confounders do exist and hence the assumption “no unmeasured confounder” is violated. While scanner-related effects might contribute to it, it is unrealistic to assume all potential confounders are known and have been recorded. Despite the importance of this topic, there has been little prior work in neuroimaging. We believe recent theoretical advances in causal inference can be of wide interest to the neuroimaging community to answer important clinical questions.

In this paper, we focus on the problem of estimating causal effects from observational neuroimaging data in the presence of unknown confounders. Modelling unknown confounders would alleviate the previously discussed problem of knowing all confounding factors, and enable the estimation of causal effects. While this is infeasible in general, we will illustrate that by considering multiple causes of interest simultaneously, we can leverage the dependencies among them to identify causal effects. In our experiments, we quantitatively demonstrate the effectiveness of our approach on semi-synthetic data, where we know the true causal effects, and illustrate its use on real data on Alzheimer’s disease, where our analyses reveal important causes that would otherwise have been missed.

1.1 Related Work.

In contrast to our approach, all previous works assume that all confounding variables are known and have been measured. The most common approach for confounding adjustment is regress-out. In regress-out, the original measurement (e.g. volume, thickness, or voxel intensity) is replaced by the residual of a regression model fitted to estimate the original value from the confounding variables. In (dukart2011age), the authors use linear regression to account for age, which has been extended to additionally account for gender in (Koikkalainen2012). A non-linear Gaussian process (GP) model has been proposed in (Kostro2014) to adjust for age, total intracranial volume, sex, and MRI scanner. Fortin et al. (fortin2018harmonization) proposed a linear mixed effects model to account for systematic differences across imaging sites. In (Snoek2019), a linear regression is used to regress out the effect of total brain volume. In addition to the regress-out approach, analyses can be adjusted for confounders by computing instance weights that are used in a downstream classification or regression model to obtain a pseudo-population that is approximately balanced with respect to the confounders (linn2016addressing; rao2017predictive). An instance-weighted support vector machine to adjust for confounding due to age is proposed in (linn2016addressing). In (rao2017predictive), weighted GP regression is proposed to account for gender and imaging site effects.
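To illustrate the regress-out idea described above, here is a minimal sketch using ordinary least squares. It does not reproduce any of the cited implementations; the function name `regress_out` and the toy volumes are hypothetical.

```python
import numpy as np

def regress_out(measurements, confounders):
    """Replace each measurement by its residual after a linear fit on the confounders."""
    design = np.column_stack([np.ones(len(confounders)), confounders])
    coef, *_ = np.linalg.lstsq(design, measurements, rcond=None)
    return measurements - design @ coef

# Toy data (hypothetical): two volumes that depend on age and sex.
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(55, 90, size=n)
sex = rng.integers(0, 2, size=n)
confounders = np.column_stack([age, sex])
volumes = np.column_stack([
    4000 - 15 * age + 200 * sex + rng.normal(scale=100, size=n),
    3000 - 5 * age + rng.normal(scale=80, size=n),
])

residuals = regress_out(volumes, confounders)
# By construction, the residuals are linearly uncorrelated with the known confounders.
print(np.corrcoef(residuals[:, 0], age)[0, 1])
```

The residuals are free of linear effects of the measured confounders only; anything that was not recorded stays in the data, which is the limitation the present work addresses.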

2 Methods

2.1 Causal Effect

To denote causal quantities, we will use Pearl’s do-calculus (Pearl2000). Given two disjoint sets of random variables $\mathbf{x}$ and $y$, $P(y \mid do(\mathbf{x} = \mathbf{x}'))$ denotes the post-intervention distribution of $y$ induced by the intervention or cause that sets $\mathbf{x}$ equal to $\mathbf{x}'$. Going back to our example on cognitive decline, if $x$ denotes whether cognitive training was performed, and $y$ whether dementia was diagnosed, then $P(y = 1 \mid do(x = 1))$ gives the proportion of individuals that would be demented under the hypothetical situation in which cognitive training was administered uniformly to the population. The central question in causal inference is that of identification: Can the post-intervention distribution $P(y \mid do(\mathbf{x}))$ be estimated from data governed by the observed joint distribution over $\mathbf{x}$, $y$, and confounding variables $\mathbf{u}$? Because we only observed $y$ for specific values of $\mathbf{x}$ and $\mathbf{u}$, and not any possible configuration, we can only attempt this by building upon assumptions on the data-generating process.

Figure 1: A: Causal graph with circles being random variables and arrows causal relationships. Filled circles are observed, transparent circles are hidden. $y$ is the outcome of interest, $x_1, \dots, x_D$ the set of causes, $\mathbf{u}$ the actual, unknown confounder, and $\mathbf{z}$ the substitute confounder. B: Probabilistic principal component model. C: Bayesian probabilistic matrix factorization model.

Here, we consider a set of causes, $x_1$, …, $x_D$, and want to estimate the average causal effect (ACE) that a subset of $k \leq D$ causes have simultaneously on the outcome $y$ in the presence of unobserved confounding:

$$\mathrm{ACE}(\mathbf{x}_{1:k}, \mathbf{x}'_{1:k}) = \mathbb{E}\left[y \mid do(\mathbf{x}_{1:k})\right] - \mathbb{E}\left[y \mid do(\mathbf{x}'_{1:k})\right], \tag{1}$$

where $\mathbf{x}_{1:k}$ and $\mathbf{x}'_{1:k}$ are two distinct realizations of the first $k$ causes that are being compared. We base our method on the causal diagram depicted in fig. 1. We assume that the data-generating process is faithful to the graphical model, i.e., statistical independencies in the observed data distribution imply missing causal relationships in the graph (Spirtes2000). In particular, this implies that there is a common unobserved confounder $\mathbf{u}$ that is shared among all causes and that there is no unobserved confounder that affects a single cause. If $\mathbf{u}$ were observed and satisfied the back-door criterion relative to $\mathbf{x}$ and $y$, Pearl2000 showed that the causal effect of $\mathbf{x}$ on $y$ is identifiable and is given by $P(y \mid do(\mathbf{x} = \mathbf{x}')) = \int P(y \mid \mathbf{x}', \mathbf{u}) \, P(\mathbf{u}) \, d\mathbf{u}$.

2.2 Estimating a Substitute Confounder

Since we do not have access to the actual confounder $\mathbf{u}$, we want to find a substitute $\mathbf{z}$ that can be estimated from observed data. We assume the substitute confounding variable is shared among all causes $x_1, \dots, x_D$. In the context of neuroimaging, where causes are image-derived measures such as volume, this would imply that the unknown confounder affects all brain regions and not just single regions. This assumption is plausible, because common sources of confounding such as scanner, protocol, age, and gender affect the brain as a whole and not just individual regions (Barnes2010; Stonnington2008). Based on this assumption, we can exploit the fact that the confounder induces dependence among the causes.

From fig. 1 we can observe that given a substitute $\mathbf{z}$, the causes become conditionally independent: $P(x_1, \dots, x_D \mid \mathbf{z}) = \prod_{d=1}^{D} P(x_d \mid \mathbf{z})$. For a $K$-dimensional substitute confounder $\mathbf{z}$, the joint probability of the causes is thus

$$P(x_1, \dots, x_D) = \int P(\mathbf{z}) \prod_{d=1}^{D} P(x_d \mid \mathbf{z}) \, d\mathbf{z}. \tag{2}$$

The key realization of our proposed method is that the joint probability (2), which is derived solely from the causal graph in fig. 1, has the same form as the joint distribution of a latent factor model such as probabilistic principal component analysis (PPCA, Tipping1999) or Bayesian probabilistic matrix factorization (BPMF, Salakhutdinov2008), depicted in fig. 1. Therefore, we can utilize this connection to estimate a substitute confounder for the unobserved confounder via a latent factor model: if a latent factor model can accurately represent all causes, its latent representation is a suitable substitute confounder.
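As an illustration of how a latent factor model yields a substitute confounder, the sketch below fits PPCA with the closed-form maximum-likelihood solution of Tipping and Bishop and uses the posterior mean of the latent variables as the substitute. This is a simplified stand-in, not the Bayesian fitting procedure used in the experiments; the function names and the toy data are hypothetical.

```python
import numpy as np

def fit_ppca(X, n_factors):
    """Closed-form maximum-likelihood PPCA; requires n_factors < number of causes."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = Xc.T @ Xc / X.shape[0]
    eigval, eigvec = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    sigma2 = eigval[n_factors:].mean()            # noise variance: mean discarded eigenvalue
    scale = np.sqrt(np.maximum(eigval[:n_factors] - sigma2, 0.0))
    W = eigvec[:, :n_factors] * scale             # loading matrix, shape (D, K)
    return mu, W, sigma2

def substitute_confounder(X, mu, W, sigma2):
    """Posterior mean E[z | x] for every instance, shape (N, K)."""
    M = W.T @ W + sigma2 * np.eye(W.shape[1])
    return np.linalg.solve(M, W.T @ (X - mu).T).T

# Toy data (hypothetical): one hidden confounder u affects all ten causes.
rng = np.random.default_rng(0)
N, D = 2000, 10
u = rng.normal(size=N)
loadings = rng.uniform(0.5, 1.5, size=D)
X = u[:, None] * loadings + 0.5 * rng.normal(size=(N, D))

mu, W, sigma2 = fit_ppca(X, n_factors=1)
Z = substitute_confounder(X, mu, W, sigma2)
corr = np.corrcoef(Z[:, 0], u)[0, 1]
print(f"|correlation| between substitute and hidden confounder: {abs(corr):.3f}")
```

Because the hidden variable induces the dependence among all causes, the recovered factor tracks it up to sign and scale, which is all that is needed of a substitute confounder.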

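Wang2019 showed that obtaining a substitute for the unobserved confounder by a probabilistic latent factor model allows us to identify causal quantities under certain assumptions. In particular, the ACE of a subset of $k$ causes, comparing $\mathbf{x}_{1:k}$ to $\mathbf{x}'_{1:k}$, can be estimated by

$$\mathbb{E}\left[y \mid do(\mathbf{x}_{1:k})\right] - \mathbb{E}\left[y \mid do(\mathbf{x}'_{1:k})\right] = \mathbb{E}_{\mathbf{x}_{(k+1):D}, \mathbf{z}}\left[\mathbb{E}\left[y \mid \mathbf{x}_{1:k}, \mathbf{x}_{(k+1):D}, \mathbf{z}\right]\right] - \mathbb{E}_{\mathbf{x}_{(k+1):D}, \mathbf{z}}\left[\mathbb{E}\left[y \mid \mathbf{x}'_{1:k}, \mathbf{x}_{(k+1):D}, \mathbf{z}\right]\right]. \tag{3}$$

Thus, we can estimate the ACE by treating the substitute confounder as if it were observed (the expectation is over the factor model’s parameters).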


Let $\mathbf{X} \in \mathbb{R}^{N \times D}$ be the matrix of observed causes for $N$ instances, then the full procedure to compute (3) is as follows. First, fit a probabilistic factor model to the observed causes and check that it captures the joint distribution (2) well (see next section). If the check passes, compute the substitute confounder $\mathbf{Z}$ for $\mathbf{X}$. Next, fit a regression model to the observed causes and substitute confounders to estimate $\mathbb{E}[y \mid \mathbf{x}, \mathbf{z}]$. Apply the fitted model to the observed data and substitute confounders with the first $k$ features set to $\mathbf{x}_{1:k}$ for all instances, and compute the average prediction. This yields the first term in (3). Do the same for the second term, but set the first $k$ features to $\mathbf{x}'_{1:k}$. The difference between both is the ACE of $\mathbf{x}_{1:k}$ on $y$.
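The last steps of this procedure (fit an outcome model, set the first features to fixed values, average the predictions, take the difference) can be sketched as follows. This is a minimal illustration under stated assumptions: the outcome model is ordinary least squares rather than the logistic or Beta regression used in the experiments, the function `average_causal_effect` is hypothetical, and the substitute confounder is replaced by a noisy copy of the simulated confounder instead of the output of a fitted factor model.

```python
import numpy as np

def average_causal_effect(X, Z, y, x_a, x_b):
    """Plug-in ACE of the first k causes, comparing values x_a against x_b."""
    k = len(x_a)

    def design(causes):
        # Intercept, all causes, and the substitute confounder as covariates.
        return np.column_stack([np.ones(len(causes)), causes, Z])

    # Outcome model E[y | x, z]; least squares is an assumption of this sketch.
    coef, *_ = np.linalg.lstsq(design(X), y, rcond=None)

    def mean_prediction(values):
        X_do = X.copy()
        X_do[:, :k] = values              # set the first k causes for all instances
        return float((design(X_do) @ coef).mean())

    return mean_prediction(x_a) - mean_prediction(x_b)

# Toy data (hypothetical): a hidden confounder u affects all causes and the outcome.
rng = np.random.default_rng(0)
N, D = 5000, 5
u = rng.normal(size=N)
X = 0.8 * u[:, None] + rng.normal(size=(N, D))
beta = np.array([1.0, -0.5, 0.0, 0.5, 0.0])
y = X @ beta + 2.0 * u + rng.normal(size=N)

# Stand-in for the factor model output: a noisy copy of u.
Z = (u + 0.05 * rng.normal(size=N))[:, None]

x_a, x_b = np.array([1.0, 1.0]), np.array([0.0, 0.0])
ace_adjusted = average_causal_effect(X, Z, y, x_a, x_b)
ace_naive = average_causal_effect(X, np.empty((N, 0)), y, x_a, x_b)
true_ace = float(beta[:2] @ (x_a - x_b))
print(f"true {true_ace:.2f}, adjusted {ace_adjusted:.2f}, naive {ace_naive:.2f}")
```

Passing an empty confounder matrix gives the non-causal estimate, which is biased in this toy setting, whereas including the stand-in substitute brings the estimate close to the simulated truth. With a real factor model, the quality of the adjustment depends on the assumptions listed next.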


The equality in (3) holds under the following assumptions (Wang2019, Theorem 7): (i) the causes assigned to any instance have no effect on the outcomes of the other instances (stable unit treatment value assumption; Rubin1980), (ii) $P(\mathbf{x}_{1:k} \mid \mathbf{z}) > 0$ for any subset $\mathbf{x}_{1:k}$ of causes, such that the joint probability (2) is always non-zero (positivity), and (iii) the latent factor model can estimate the substitute confounder with consistency, i.e., deterministically, as the number of causes $D$ grows large. Note that the latter assumption does not imply that we need to find the true confounder, just a deterministic bijective transformation of it (Wang2019).

2.3 Choosing a Latent Factor Model

Non-causal  Oracle  BPMF   PPCA   ΔBPMF  ΔPPCA
1.932       1.814   1.822  1.993  0.109  -0.061
2.012       1.847   1.850  2.019  0.162  -0.007
1.977       1.818   1.856  2.039  0.120  -0.062
1.990       1.854   1.878  2.030  0.112  -0.040
1.950       1.811   1.915  2.018  0.035  -0.069
2.031       1.854   1.907  2.063  0.123  -0.032
2.020       1.852   1.907  2.045  0.113  -0.025
2.019       1.873   1.930  2.052  0.089  -0.033
2.044       1.817   1.911  2.104  0.134  -0.059
2.050       1.849   1.946  2.110  0.104  -0.060
2.080       1.849   1.957  2.113  0.123  -0.033
2.102       1.875   1.983  2.164  0.119  -0.062
2.142       1.914   2.028  2.177  0.114  -0.034
2.147       1.910   2.012  2.159  0.135  -0.013
2.163       1.899   2.034  2.191  0.129  -0.028

Table 1: Root mean squared errors of effects estimated by logistic regression compared to the true causal effects on semi-synthetic data. Oracle is the error when including the true confounder, BPMF and PPCA when including a substitute confounder. ΔBPMF is the signed difference between the ‘Non-causal’ and the BPMF column, and ΔPPCA the same for PPCA.

The assumptions above enforce certain properties on the latent factor model. First of all, it needs to be probabilistic. Here, we explore PPCA (Tipping1999) and BPMF (Salakhutdinov2008), which are summarized in fig. 1. The positivity assumption (ii) holds for PPCA and BPMF, because both model continuous variables as a normal distribution with the mean determined by the latent representation, and the normal distribution is non-zero everywhere. To satisfy assumption (iii), we need to ensure the factor model estimated from the observed causes captures the joint distribution (2) well. We can employ posterior predictive checking to quantify how well the factor model fits the data (Gelman2013, ch. 6). The idea is that simulated data generated under the factor model should look similar to observed data. First, we hold out a randomly selected portion of the observed causes, yielding $\mathbf{X}_{\text{obs}}$ to fit the factor model, and $\mathbf{X}_{\text{holdout}}$ for model checking. Next, we draw simulated data $\mathbf{X}^{\text{sim}}$ from the joint posterior predictive distribution. Let $\theta$ be the vector of parameters of the factor model, including the substitute confounder $\mathbf{z}$, then

$$P(\mathbf{X}^{\text{sim}} \mid \mathbf{X}_{\text{obs}}) = \int P(\mathbf{X}^{\text{sim}} \mid \theta) \, P(\theta \mid \mathbf{X}_{\text{obs}}) \, d\theta.$$

If there is a systematic difference between the simulated and the held-out data, we can conclude that the latent model does not represent the causes well. We use the expected negative log-likelihood as test statistic:

$$T(\mathbf{X}) = \mathbb{E}_{\theta}\left[-\log P(\mathbf{X} \mid \theta) \mid \mathbf{X}_{\text{obs}}\right].$$

From it, we can compute the Bayesian p-value $p_B$, defined as the probability that the simulated data is more extreme than the observed data (Gelman2013):

$$p_B = P\left(T(\mathbf{X}^{\text{sim}}) \geq T(\mathbf{X}_{\text{holdout}}) \mid \mathbf{X}_{\text{obs}}\right).$$

Wang2019 suggested requiring $p_B$ to exceed a fixed threshold to ensure assumption (iii) holds. We estimate $p_B$ by repeatedly drawing $\mathbf{X}^{\text{sim}}$ from the posterior predictive distribution and computing the proportion of draws for which $T(\mathbf{X}^{\text{sim}}) \geq T(\mathbf{X}_{\text{holdout}})$.
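The mechanics of such a posterior predictive check can be sketched on a deliberately simple model. The example below does not use PPCA or BPMF: it assumes a univariate Gaussian with known unit variance and a flat prior on the mean, so that the posterior is available in closed form. The test statistic is the negative log-likelihood averaged over posterior draws, which is an assumption about the exact form of the statistic; all names are hypothetical.

```python
import numpy as np

def expected_nll(data, mu_draws):
    """Unit-variance Gaussian negative log-likelihood, averaged over posterior draws."""
    sq = (data[None, :] - mu_draws[:, None]) ** 2
    nll = 0.5 * sq.sum(axis=1) + 0.5 * data.size * np.log(2 * np.pi)
    return float(nll.mean())

def bayesian_p_value(x_obs, x_held, n_rep=500, n_draws=200, seed=0):
    """Proportion of simulated data sets at least as extreme as the held-out data."""
    rng = np.random.default_rng(seed)
    # Posterior of the mean under a flat prior: N(mean(x_obs), 1 / n_obs).
    post_mean, post_sd = x_obs.mean(), 1.0 / np.sqrt(x_obs.size)
    mu_draws = rng.normal(post_mean, post_sd, size=n_draws)
    t_held = expected_nll(x_held, mu_draws)
    # Posterior predictive draws: sample a mean, then simulate a held-out-sized data set.
    mu_rep = rng.normal(post_mean, post_sd, size=n_rep)
    t_sim = np.array([
        expected_nll(rng.normal(mu, 1.0, size=x_held.size), mu_draws)
        for mu in mu_rep
    ])
    return float(np.mean(t_sim >= t_held))

rng = np.random.default_rng(1)
x_obs = rng.normal(0.3, 1.0, size=400)
x_held_good = rng.normal(0.3, 1.0, size=100)   # same distribution as the fitted data
x_held_bad = rng.normal(0.3, 3.0, size=100)    # model is misspecified for this data

p_good = bayesian_p_value(x_obs, x_held_good)
p_bad = bayesian_p_value(x_obs, x_held_bad)
print(f"well-specified: {p_good:.3f}, misspecified: {p_bad:.3f}")
```

When the held-out data are far more dispersed than the model allows, almost no simulated data set is as extreme, and the p-value collapses towards zero, signalling that the model does not represent the data well.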

3 Experiments

3.1 Semi-synthetic Data

In our first experiment, we evaluated how well causal effects can be recovered from the data when using no confounder, the actual confounder (oracle), and substitute confounders computed by PPCA and BPMF. We used T1-weighted magnetic resonance imaging brain scans from N = 10,824 subjects from UK Biobank (miller2016multimodal). From each scan, we extracted 29 volume measurements with FreeSurfer 5.3 (Fischl2012) and created synthetic binary outcomes. For each instance, we generated one confounding variable by assigning individuals to clusters with varying percentages of positive labels. First, we obtained the first two principal components across all volumes, scaled individual scores to , and clustered the projected data into 4 clusters using k-means. Each cluster was assigned a different and scale of the noise term. Causal effects follow a sparse normal distribution, where all values in the 20–80th percentile range are zero, hence only a small portion of volumes have a non-zero causal effect. Let , , and denote how much variance can be explained by the causal effects, confounding effects, and the noise, respectively, then for the -th instance in cluster , we generated outcome as: where , , and are standard deviations with respect to , , and for . Finally, we chose to obtain a roughly equal ratio between the positive and negative class.
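The sparse normal distribution for the causal effects can be illustrated with a short sketch. It assumes that the 20th and 80th percentiles are computed empirically from the drawn sample and that the underlying normal has zero mean; the function name, the scale, and the seed are hypothetical and not taken from the study.

```python
import numpy as np

def sparse_normal(size, scale=1.0, lower=20, upper=80, rng=None):
    """Normal draws with every value inside the [lower, upper] percentile range set to zero."""
    rng = np.random.default_rng() if rng is None else rng
    draws = rng.normal(0.0, scale, size=size)
    lo, hi = np.percentile(draws, [lower, upper])
    draws[(draws >= lo) & (draws <= hi)] = 0.0
    return draws

# One effect per volume measurement (29 volumes in the experiment above).
effects = sparse_normal(29, rng=np.random.default_rng(0))
print(np.count_nonzero(effects), "of", effects.size, "effects are non-zero")
```

Only the draws in the two tails survive, so the non-zero effects are the comparatively large positive and negative ones.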

Both latent factor models, BPMF and PPCA, passed the posterior predictive check, even though the true data-generating model differs from both. Table 1 shows the root mean squared error (RMSE) with respect to the true causal effect across 1,000 simulations for various configurations of the explained-variance proportions (more results are in the supplementary materials). As expected, when confounding is ignored (first column), the RMSE increases as the contribution of the confounding effect increases. The results also show that there is a cost to using a substitute confounder: using the actual confounder (oracle) leads to a lower RMSE. Considering the improvement relative to the non-causal model, we observe a gain when using BPMF substitute confounders but not for PPCA.

3.2 Real-world Data

In our experiment on real data, we study the causal effect of different AD pathologies and ApoE on the widely used Alzheimer’s Disease Assessment Scale Cognitive Subscale 13 (ADAS) score, which assesses the severity of cognitive symptoms of dementia (Mohs1997). ADAS ranges between 0 and 85 with higher values indicating more severe cognitive decline. To exclude factors of impairment for reasons other than AD, we employ the ATN classification scheme (amyloid deposition, pathologic tau, and neurodegeneration), which classifies patients into one of eight distinct pathological groups (Jack2018). We selected patients with normal pathology (A-/T-/N-) and those with AD pathology (A+/T+/N+, A+/T+/N-). After removal of highly correlated volume measurements, we retained 19 from which we computed a substitute confounder using BPMF, as results on semi-synthetic data indicated its effectiveness. To estimate the ACE (3), we converted ADAS to proportions and used a Beta regression model for prediction (Ferrari2004). We also included the known clinical markers age, gender, ApoE, and years of education for estimating causal effects. We randomly split the data into 448 subjects for training and 49 for evaluation. All data was obtained from the Alzheimer’s Disease Neuroimaging Initiative (jack2008alzheimer).

We used 2 substitute confounders and checked that BPMF passed the posterior predictive check. The biggest change in effect size, compared to the non-causal model ignoring confounding, is observed for the corpus callosum volume (see fig. 2). In the non-causal model, the estimated effect size is insignificant, with the 80% credible interval being almost equally distributed across both sides of zero. After including substitute confounders, a significant negative association between corpus callosum volume and ADAS is found. This is an interesting finding, since previous studies found atrophy of the corpus callosum to be a marker of the progressive interhemispheric disconnection in AD (see e.g. Delbeuck2003 for an overview). An analysis based on the non-causal model would have missed this cause.

Next, we estimated the ACE of different AD pathologies (see table 2). As expected, for the most extreme transition from A-/T-/N- to A+/T+/N+, the estimated causal effect is the largest, but is slightly reduced after including substitute confounders. These values match the range observed for patients diagnosed with Alzheimer’s disease (Raghavan2012). Finally, we focus on the ACE of ApoE. These effects are considerably lower, with the highest being homozygous Apo-3 (most common type) compared to homozygous Apo-4 (high risk type). When considering the effect from homozygous Apo-2 (low risk type) to heterozygous Apo-3/4, the effect of Apo-2 flips from being harmful to protective after inclusion of substitute confounders. Studies on the genetics of AD have found that the Apo-2 allele is indeed protective against AD, therefore only the estimated effect with substitute confounder agrees with clinical findings (Corder1994). Our results show that if causal effects are large, as in the case of ATN, the difference in estimated causal effects is minor. However, if causal effects are small, as in the case of corpus callosum volume and Apo-2, accounting for unknown confounders does reveal clinically validated effects that would have been missed with the non-causal model.

Figure 2: Mean coefficient (log-odds ratio) and 80% credible interval of corpus callosum central volume.

                        Non-Causal  BPMF
A+/T+/N+ vs. A-/T-/N-   56.485      55.830
A+/T+/N- vs. A-/T-/N-   40.977      40.530
A+/T+/N+ vs. A+/T+/N-   29.534      29.147
Apo-4/4  vs. Apo-3/3     2.553       2.532
Apo-3/4  vs. Apo-2/2    -0.518       0.062

Table 2: Average causal effect of ATN and ApoE status on ADAS.

4 Conclusion

Most neuroimaging studies are subject to various sources of confounding, but usually we neither know all of them nor do we have data on them. To alleviate this problem, we proposed a latent factor model approach to estimate causal effects from observational data in the presence of unknown confounders. We showed in experiments on semi-synthetic data that by including a substitute confounder, we can recover causal effects more accurately than a model that ignores confounding. Analyses on real data concerning Alzheimer’s disease revealed that including substitute confounders can reveal important causes of cognitive decline that otherwise would have been missed.


This research was partially supported by the Bavarian State Ministry of Education, Science and the Arts in the framework of the Centre Digitisation.Bavaria (ZD.B), and the Federal Ministry of Education and Research in the call for Computational Life Sciences (DeepMentia, 031L0200A).