Most of us already use causal language in everyday life, e.g., when saying “I will fail the exam if I do not study”. Nevertheless, we know that failing the exam is not absolutely certain, but more likely, hence we draw our conclusion merely from associations. In contrast, causal inference provides a principled statistical framework to answer questions about cause-effect relationships. In causal inference, we consider one specific individual (termed unit) and a set of actions (also called manipulations, interventions, treatments, or causes) and want to know the outcome for each action. We might want to know a specific person’s cognitive performance whether that person underwent regular cognitive training or not (Ngandu2015). This would allow us to compare the outcome under both actions to determine the individual causal effect of cognitive training on cognition. In practice, we can only observe one of the outcomes – the one that corresponds to the actual action taken – all other outcomes remain unobserved. Thus, it is generally impossible to infer causal effects from observational data alone without making untestable assumptions about the data-generating process (Pearl2000). The gold standard to resolve this issue is a randomized experiment, where each participant is randomly assigned an action to ensure that the missing outcome arises truly due to chance. In medical research, such trials are usually laborious and in some cases unethical, whereas observational data is often readily available. However, in an observational study assignment may not be truly random but due to other factors. If those factors affect the outcome that is being studied, we say there is confounding. It is important to remember that what is or is not regarded as a confounding variable is relative and depends on the goal of the study. 
Consider a study on the cause of Alzheimer’s disease (AD) in which we find a high correlation between having gray hair and AD, which may naively lead us to conclude that gray hair causes AD. However, the observed correlation between gray hair and AD is only due to a person’s age: age affects both, which renders the relationship between gray hair and AD confounded. Therefore, one essential assumption in causal inference is that there are no unmeasured confounders. Otherwise, if unobserved confounding is present, the observed data distribution is compatible with many – potentially contradictory – causal explanations, leaving us with no way to differentiate between the true and false effect on the basis of data. In this case, the causal effect is unidentifiable (Pearl2000).
Adjusting for known confounding variables is one method that permits us to estimate causal effects in observational studies. However, analyses across 15 neuroimaging studies revealed that considerable bias remains in volume and thickness measurement after adjusting for age and gender (wachinger2019quantifying). This result suggests that additional unknown confounders do exist and hence the assumption “no unmeasured confounder” is violated. While scanner-related effects might contribute to it, it is unrealistic to assume all potential confounders are known and have been recorded. Despite the importance of this topic, there has been little prior work in neuroimaging. We believe recent theoretical advances in causal inference can be of wide interest to the neuroimaging community to answer important clinical questions.
In this paper, we focus on the problem of estimating causal effects from observational neuroimaging data in the presence of unknown confounders. Modelling unknown confounders would alleviate the previously discussed problem of knowing all confounding factors, and enable the estimation of causal effects. While this is infeasible in general, we will illustrate that by considering multiple causes of interest simultaneously, we can leverage the dependencies among them to identify causal effects. In our experiments, we quantitatively demonstrate the effectiveness of our approach on semi-synthetic data, where we know the true causal effects, and illustrate its use on real data on Alzheimer’s disease, where our analyses reveal important causes that would otherwise have been missed.
1.1 Related Work.
In contrast to our approach, all previous works assume that all confounding variables are known and have been measured. The most common approach for confounding adjustment is regress-out. In regress-out, the original measurement (e.g. volume, thickness, or voxel intensity) is replaced by the residual of a regression model fitted to estimate the original value from the confounding variables. In (dukart2011age), the authors use linear regression to account for age, which has been extended to additionally account for gender in (Koikkalainen2012). A non-linear Gaussian process (GP) model has been proposed in (Kostro2014) to adjust for age, total intracranial volume, sex, and MRI scanner. Fortin et al. (fortin2018harmonization) proposed a linear mixed effects model to account for systematic differences across imaging sites. In (Snoek2019), a linear regression is used to regress out the effect of total brain volume. In addition to the regress-out approach, analyses can be adjusted for confounders by computing instance weights that are used in a downstream classification or regression model to obtain a pseudo-population that is approximately balanced with respect to the confounders (linn2016addressing; rao2017predictive). An instance-weighted support vector machine to adjust for confounding due to age is proposed in (linn2016addressing). In (rao2017predictive), weighted GP regression is proposed to account for gender and imaging site effects.
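To make the regress-out idea concrete, the following is a minimal sketch with a single known confounder (age) and a simulated volume measurement. The data, the coefficients, and the helper names `fit_line`, `regress_out`, and `correlation` are illustrative assumptions and are not taken from any of the cited works.

```python
import random


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def regress_out(measurement, confounder):
    """Replace each measurement by its residual after regressing on the confounder."""
    intercept, slope = fit_line(confounder, measurement)
    return [m - (intercept + slope * c) for m, c in zip(measurement, confounder)]


def correlation(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


# Simulated data (made-up numbers): volume shrinks linearly with age.
rng = random.Random(0)
age = [rng.uniform(55.0, 90.0) for _ in range(500)]
volume = [4000.0 - 15.0 * a + rng.gauss(0.0, 100.0) for a in age]

# The adjusted measurement is the residual and is linearly unrelated to age.
adjusted = regress_out(volume, age)
```

By construction, the residuals are uncorrelated with the confounder that was regressed out. The adjustment says nothing about confounders that were not included in the regression, which is the limitation this paper addresses.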
2.1 Causal Effect
To denote causal quantities, we will use Pearl’s do-calculus (Pearl2000). Given two disjoint sets of random variables $x$ and $y$, $P(y \mid do(x = x'))$ denotes the post-intervention distribution of $y$ induced by the intervention or cause that sets $x$ equal to $x'$. Going back to our example on cognitive decline, if $x$ denotes whether cognitive training was performed, and $y$ whether dementia was diagnosed, then $P(y \mid do(x = x'))$ gives the proportion of individuals that would be demented under the hypothetical situation in which cognitive training was administered uniformly to the population. The central question in causal inference is that of identification: Can the post-intervention distribution $P(y \mid do(x = x'))$ be estimated from data governed by the observed joint distribution over $x$, $y$, and confounding variables $z$? Because we only observed $y$ for specific values of $x$ and $z$, and not any possible configuration, we can only attempt this by building upon assumptions on the data-generating process.
Here, we consider a set of $D$ causes, $x_1$, …, $x_D$, and want to estimate the average causal effect (ACE) that a subset of causes $x_{1:m} = (x_1, \dots, x_m)$, $m \le D$, have simultaneously on the outcome $y$ in the presence of unobserved confounding:

$$\mathrm{ACE}(x'_{1:m}, x''_{1:m}) = E[\,y \mid do(x_{1:m} = x'_{1:m})\,] - E[\,y \mid do(x_{1:m} = x''_{1:m})\,], \qquad (1)$$

where $x'_{1:m}$ and $x''_{1:m}$ are two distinct realizations of $x_{1:m}$ that are being compared. We base our method on the causal diagram depicted in fig. 1. We assume that the data-generating process is faithful to the graphical model, i.e., statistical independencies in the observed data distribution imply missing causal relationships in the graph (Spirtes2000). In particular, this implies that there is a common unobserved confounder $z$ that is shared among all causes and that there is no unobserved confounder that affects a single cause. If $z$ were observed and satisfied the back-door criterion relative to $x_{1:m}$ and $y$, Pearl2000 showed that the causal effect of $x_{1:m}$ on $y$ is identifiable and is given by $P(y \mid do(x_{1:m} = x'_{1:m})) = \int P(y \mid x'_{1:m}, z)\, P(z)\, dz$.
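As an illustration of back-door adjustment with an observed confounder, the sketch below simulates one binary cause, one binary confounder, and a continuous outcome with a known effect. All numbers, the variable names, and the helper `mean_outcome` are hypothetical. The adjusted estimate averages the stratum-specific differences in mean outcome, weighted by the marginal distribution of the confounder.

```python
import random

rng = random.Random(0)
TRUE_ACE = 1.0  # effect of the cause on the outcome, chosen for this toy example

# Simulate (cause x, confounder z, outcome y); z raises both P(x = 1) and y.
rows = []
for _ in range(20000):
    z = 1 if rng.random() < 0.5 else 0
    x = 1 if rng.random() < (0.8 if z else 0.2) else 0
    y = TRUE_ACE * x + 3.0 * z + rng.gauss(0.0, 1.0)
    rows.append((x, z, y))


def mean_outcome(x_val, z_val=None):
    """Average outcome among instances with x == x_val (and z == z_val if given)."""
    ys = [y for x, z, y in rows if x == x_val and (z_val is None or z == z_val)]
    return sum(ys) / len(ys)


# Naive contrast: ignores the confounder and mixes its effect into the estimate.
naive = mean_outcome(1) - mean_outcome(0)

# Back-door adjustment: sum_z P(z) * (E[y | x=1, z] - E[y | x=0, z]).
p_z = {v: sum(1 for _, z, _ in rows if z == v) / len(rows) for v in (0, 1)}
backdoor = sum(
    p_z[v] * (mean_outcome(1, v) - mean_outcome(0, v)) for v in (0, 1)
)
```

In this setup the naive contrast is inflated by the confounder, while the adjusted estimate is close to the simulated effect. The adjustment is only available because $z$ is observed here, which is exactly what cannot be assumed in practice.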
2.2 Estimating a Substitute Confounder
Since we do not have access to the actual confounder $z$, we want to find a substitute $\hat{z}$ that can be estimated from observed data. We assume the substitute confounding variable is shared among all causes $x_1, \dots, x_D$. In the context of neuroimaging, where causes are image-derived measures such as volume, this would imply that the unknown confounder affects all brain regions and not just single regions. This assumption is plausible, because common sources of confounding such as scanner, protocol, age, and gender affect the brain as a whole and not just individual regions (Barnes2010; Stonnington2008). Based on this assumption, we can exploit the fact that the confounder induces dependence among the causes.
From fig. 1 we can observe that, given a substitute $\hat{z}$, the causes become conditionally independent: $P(x_1, \dots, x_D \mid \hat{z}) = \prod_{d=1}^{D} P(x_d \mid \hat{z})$. For a $K$-dimensional substitute confounder $\hat{z} \in \mathbb{R}^K$, the joint probability of the causes is thus

$$P(x_1, \dots, x_D) = \int P(\hat{z}) \prod_{d=1}^{D} P(x_d \mid \hat{z})\, d\hat{z}, \qquad (2)$$

which has the same form as the joint distribution of a latent factor model such as probabilistic principal component analysis (PPCA, Tipping1999) or Bayesian probabilistic matrix factorization (BPMF, Salakhutdinov2008), depicted in fig. 1. Therefore, we can utilize this connection to estimate a substitute confounder for the unobserved confounder via a latent factor model: if a latent factor model can accurately represent all causes, its latent representation is a suitable substitute confounder.
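The dependence that a shared confounder induces among the causes, and its disappearance once the confounder is accounted for, can be checked numerically. The sketch below is a toy simulation under assumed values (two causes, loading 2.0, unit noise); `correlation` and `residual` are illustrative helpers, and the partial correlation given $z$ is computed by correlating the residuals of each cause after a linear regression on $z$.

```python
import random

rng = random.Random(0)
n = 20000

# One shared confounder z drives both causes; the noise terms are independent.
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
x1 = [2.0 * zi + rng.gauss(0.0, 1.0) for zi in z]
x2 = [2.0 * zi + rng.gauss(0.0, 1.0) for zi in z]


def correlation(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


def residual(a, given):
    """Residual of a after a least-squares regression on `given`."""
    mean_a = sum(a) / len(a)
    mean_g = sum(given) / len(given)
    slope = sum((gi - mean_g) * (ai - mean_a) for gi, ai in zip(given, a)) / sum(
        (gi - mean_g) ** 2 for gi in given
    )
    return [ai - mean_a - slope * (gi - mean_g) for ai, gi in zip(a, given)]


marginal_corr = correlation(x1, x2)                              # strong dependence
partial_corr = correlation(residual(x1, z), residual(x2, z))     # close to zero
```

The marginal correlation is large although neither cause affects the other, whereas the partial correlation given $z$ is near zero. A latent factor model exploits this structure in the opposite direction: it searches for a latent variable that renders the causes conditionally independent.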
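Wang2019 showed that obtaining a substitute for the unobserved confounder by a probabilistic latent factor model allows us to identify causal quantities under certain assumptions. In particular, the ACE of a subset of causes can be estimated by

$$\mathrm{ACE}(x'_{1:m}, x''_{1:m}) = E_{x, \hat{z}}\big[\, E[\,y \mid x_{1:m} = x'_{1:m},\, x_{(m+1):D},\, \hat{z}\,] \,\big] - E_{x, \hat{z}}\big[\, E[\,y \mid x_{1:m} = x''_{1:m},\, x_{(m+1):D},\, \hat{z}\,] \,\big]. \qquad (3)$$

Thus, we can estimate the ACE by treating the substitute confounder as if it were observed (the outer expectation is taken over the observed remaining causes and the substitute confounder inferred from the factor model’s parameters).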
Let $X \in \mathbb{R}^{N \times D}$ be the matrix of observed causes for $N$ instances, then the full procedure to compute (3) is as follows. First, fit a probabilistic factor model to the observed causes and check that it captures the joint distribution (2) well (see next section). If the check passes, compute the substitute confounder $\hat{z}_i$ for each instance $i = 1, \dots, N$. Next, fit a regression model to the observed causes and substitute confounders to estimate $E[\,y \mid x_1, \dots, x_D, \hat{z}\,]$. Apply the fitted model to the observed data and substitute confounders with the first $m$ features set to $x'_{1:m}$ for all instances, and compute the average prediction. This yields the first term in (3). Do the same for the second term, but set the first $m$ features to $x''_{1:m}$. The difference between both is the ACE of $x_{1:m}$ on $y$.
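The steps of this procedure can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions and not the implementation used in the paper: a truncated SVD of the centred causes stands in for PPCA/BPMF, a ridge-penalized linear model stands in for the outcome regression, the posterior predictive check is skipped, and the toy data, the penalty of 1.0, and the names `design` and `average_prediction` are hypothetical. The ridge penalty is used because a substitute confounder obtained from a linear factor model is a linear function of the causes, so an unpenalized linear regression on both would be rank-deficient.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K, m = 2000, 10, 1, 2  # instances, causes, latent dims, causes of interest

# Toy data: one unobserved confounder z drives all causes and the outcome.
z = rng.normal(size=(N, 1))
X = z @ rng.normal(size=(1, D)) + rng.normal(scale=0.5, size=(N, D))
beta = np.zeros(D)
beta[:m] = [1.0, -0.5]  # only the first m causes affect the outcome
y = X @ beta + 2.0 * z[:, 0] + rng.normal(scale=0.5, size=N)

# Step 1: factor model. Truncated SVD as a stand-in for PPCA/BPMF (assumption).
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
Z_hat = X_centered @ Vt[:K].T  # substitute confounder, shape (N, K)


# Step 2: outcome model on causes and substitute confounder (ridge, assumption).
def design(causes, substitute):
    """Design matrix with intercept, causes, and substitute confounder."""
    return np.column_stack([np.ones(len(causes)), causes, substitute])


A = design(X, Z_hat)
penalty = 1.0 * np.eye(A.shape[1])
penalty[0, 0] = 0.0  # do not penalize the intercept
coef = np.linalg.solve(A.T @ A + penalty, A.T @ y)


# Step 3: set the first m causes to a fixed value for all instances,
# keep the other causes and the substitute confounder, and average predictions.
def average_prediction(values):
    X_int = X.copy()
    X_int[:, :m] = values
    return float((design(X_int, Z_hat) @ coef).mean())


x_a = np.array([1.0, 1.0])
x_b = np.array([0.0, 0.0])
ace = average_prediction(x_a) - average_prediction(x_b)
```

For a linear outcome model, the difference of the two average predictions reduces to the fitted coefficients of the first $m$ causes multiplied by the difference of the two realizations. With a non-linear outcome model, such as the Beta regression used later, the averaging over instances is essential.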
The equality in (3) holds under the following assumptions (Wang2019, Theorem 7): (i) the causes assigned to any instance have no effect on the outcomes of the other instances (stable unit treatment value assumption; Rubin1980), (ii) the conditional probability of any subset of causes given the substitute confounder is non-zero, such that the joint probability (2) is always non-zero (positivity), and (iii) the latent factor model can estimate the substitute confounder with consistency, i.e., deterministically, as the number of causes $D$ grows large. Note that the latter assumption does not imply that we need to find the true confounder, just a deterministic bijective transformation of it (Wang2019).
2.3 Choosing a Latent Factor Model
Root mean squared errors of effects estimated by logistic regression compared to the true causal effects on semi-synthetic data. Oracle is the error when including the true confounder, BPMF and PPCA are the errors when including the respective substitute confounder. $\Delta$BPMF is the signed difference between the ‘Non-causal’ and the BPMF column, and $\Delta$PPCA the same for PPCA.
The assumptions above enforce certain properties on the latent factor model. First of all, it needs to be probabilistic. Here, we explore PPCA (Tipping1999) and BPMF (Salakhutdinov2008), which are summarized in fig. 1. The positivity assumption (ii) holds for PPCA and BPMF, because both model continuous variables as a normal distribution with the mean determined by the latent representation, and the normal distribution is non-zero everywhere. To satisfy assumption (iii), we need to ensure the factor model estimated from the observed causes captures the joint distribution (2) well. We can employ posterior predictive checking to quantify how well the factor model fits the data (Gelman2013, ch. 6). The idea is that simulated data generated under the factor model should look similar to observed data. First, we hold out a randomly selected portion of the observed causes, yielding $X^{\text{obs}}$ to fit the factor model, and $X^{\text{holdout}}$ for model checking. Next, we draw simulated data $X^{\text{sim}}$ from the joint posterior predictive distribution. Let $\theta$ be the vector of parameters of the factor model, including the substitute confounder $\hat{z}$, then

$$P(X^{\text{sim}} \mid X^{\text{obs}}) = \int P(X^{\text{sim}} \mid \theta)\, P(\theta \mid X^{\text{obs}})\, d\theta.$$

If there is a systematic difference between the simulated and the held-out data, we can conclude that the latent model does not represent the causes well. We use the expected negative log-likelihood as test statistic:

$$T(X) = E_{\theta}\big[ -\log P(X \mid \theta) \mid X^{\text{obs}} \big].$$

From it, we can compute the Bayesian p-value $p_B$, defined as the probability that the simulated data is more extreme than the observed data (Gelman2013):

$$p_B = P\big( T(X^{\text{sim}}) \ge T(X^{\text{holdout}}) \mid X^{\text{obs}} \big).$$

Wang2019 suggested a lower bound on $p_B$ to ensure assumption (iii) holds. We estimate $p_B$ by drawing $X^{\text{sim}}$ repeatedly from the posterior predictive distribution and computing the proportion of draws for which $T(X^{\text{sim}}) \ge T(X^{\text{holdout}})$.
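The mechanics of such a posterior predictive check can be illustrated on a deliberately simple model. In the sketch below, a normal model with unknown mean and known unit variance (flat prior) stands in for the factor model; this substitution, the sample sizes, the number of draws, and the function names are assumptions made for illustration only. The test statistic is the expected negative log-likelihood under the posterior, and the Bayesian p-value is the proportion of simulated data sets whose statistic is at least as large as that of the held-out data.

```python
import math
import random

rng = random.Random(0)


def neg_log_lik(data, mu):
    """Negative log-likelihood of data under Normal(mu, 1)."""
    return sum(0.5 * (x - mu) ** 2 + 0.5 * math.log(2.0 * math.pi) for x in data)


def bayesian_p_value(observed, held_out, n_draws=100):
    """Estimate P(T(X_sim) >= T(X_holdout) | X_obs) by Monte Carlo."""
    n = len(observed)
    post_mean = sum(observed) / n
    post_sd = 1.0 / math.sqrt(n)  # posterior of the mean under a flat prior
    thetas = [rng.gauss(post_mean, post_sd) for _ in range(n_draws)]

    def statistic(data):
        # Expected negative log-likelihood, averaged over posterior draws.
        return sum(neg_log_lik(data, t) for t in thetas) / n_draws

    t_held = statistic(held_out)
    exceed = 0
    for theta in thetas:
        # One simulated data set per posterior draw, same size as the held-out set.
        simulated = [rng.gauss(theta, 1.0) for _ in held_out]
        if statistic(simulated) >= t_held:
            exceed += 1
    return exceed / n_draws


observed = [rng.gauss(0.0, 1.0) for _ in range(200)]
held_out_ok = [rng.gauss(0.0, 1.0) for _ in range(50)]
held_out_bad = [x + 3.0 for x in held_out_ok]  # data the model cannot explain

p_ok = bayesian_p_value(observed, held_out_ok)
p_bad = bayesian_p_value(observed, held_out_bad)
```

When the held-out data is shifted away from what the fitted model can generate, its negative log-likelihood is far larger than that of any simulated data set, and the estimated p-value collapses towards zero, signalling a poor fit.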
3.1 Semi-synthetic Data
In our first experiment, we evaluated how well causal effects can be recovered from the data when using no confounder, the actual confounder (oracle), and substitute confounders computed by PPCA and BPMF. We used T1-weighted magnetic resonance imaging brain scans from $N$ = 10,824 subjects from UK Biobank (miller2016multimodal). From each scan, we extracted 29 volume measurements with FreeSurfer 5.3 (Fischl2012) and created synthetic binary outcomes. For each instance, we generated one confounding variable by assigning individuals to clusters with varying percentages of positive labels. First, we obtained the first two principal components across all volumes, scaled individual scores to a fixed interval, and clustered the projected data into 4 clusters using $k$-means. Each cluster $c$ was assigned a different confounding effect $\gamma_c$ and scale of the noise term $\varepsilon_i$. Causal effects $\beta$ follow a sparse normal distribution, where all values in the 20–80th percentile range are zero, hence only a small portion of volumes have a non-zero causal effect. Let $\nu_x$, $\nu_z$, and $\nu_\varepsilon$ denote how much variance can be explained by the causal effects, confounding effects, and the noise, respectively, then for the $i$-th instance in cluster $c$, we generated outcome $y_i$ as:

$$y_i \sim \mathrm{Bernoulli}\!\left( \mathrm{logit}^{-1}\!\left( \beta_0 + \frac{\sqrt{\nu_x}}{\sigma_x}\, x_i^\top \beta + \frac{\sqrt{\nu_z}}{\sigma_z}\, \gamma_c + \frac{\sqrt{\nu_\varepsilon}}{\sigma_\varepsilon}\, \varepsilon_i \right) \right),$$

where $\sigma_x$, $\sigma_z$, and $\sigma_\varepsilon$ are standard deviations with respect to $x_i^\top \beta$, $\gamma_c$, and $\varepsilon_i$ for $i = 1, \dots, N$. Finally, we chose $\beta_0$ to obtain a roughly equal ratio between the positive and negative class.
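The sparse causal effects described above can be generated as follows. This is a minimal sketch: the unit-variance normal, the index-based percentile convention, and the function name `sparse_normal` are assumptions, since the text only states that values within the 20–80th percentile range are set to zero.

```python
import random

rng = random.Random(0)
D = 29  # number of volume measurements, as in the text


def sparse_normal(n, rng, lower=0.2, upper=0.8):
    """Draw n normal values and zero out those strictly between two percentiles."""
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ranked = sorted(draws)
    lo = ranked[int(lower * n)]  # value at the lower percentile (index convention)
    hi = ranked[int(upper * n)]  # value at the upper percentile (index convention)
    return [0.0 if lo < v < hi else v for v in draws]


effects = sparse_normal(D, rng)
n_zero = sum(1 for v in effects if v == 0.0)
n_nonzero = D - n_zero
```

With 29 causes and this percentile convention, 17 effects are set to zero and 12 remain, all of them in the tails of the distribution, so the non-zero effects are comparatively large in magnitude.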
Both latent factor models passed the posterior predictive check, even though the true data-generating model differs from both. Table 3 shows the root mean squared error (RMSE) with respect to the true causal effect across 1,000 simulations for various configurations of the variance explained by the causal effects ($\nu_x$) and the confounding effects ($\nu_z$), with the variance explained by the noise ($\nu_\varepsilon$) fixed (more results are in the supplementary materials). As expected, when confounding is ignored (first column), the RMSE increases as the contribution of the confounding effect increases (higher $\nu_z$). Results also show that there is a cost to using a substitute confounder: using the actual confounder (oracle) leads to a lower RMSE. Considering the improvement relative to the non-causal model, we observe a gain when using BPMF substitute confounders but not for PPCA.
3.2 Real-world Data
In our experiment on real data, we study the causal effect of different AD pathologies and ApoE on the widely used Alzheimer’s Disease Assessment Scale Cognitive Subscale 13 (ADAS) score, which assesses the severity of cognitive symptoms of dementia (Mohs1997). ADAS ranges between 0 and 85 with higher values indicating greater cognitive decline. To exclude impairment for reasons other than AD, we employ the ATN classification scheme (amyloid deposition, pathologic tau, and neurodegeneration), which classifies patients into one of eight distinct pathological groups (Jack2018). We selected patients with normal pathology (A-/T-/N-) and those with AD pathology (A+/T+/N+, A+/T+/N-). After removal of highly correlated volume measurements, we retained 19 measurements, from which we computed a substitute confounder using BPMF, as results on semi-synthetic data indicated its effectiveness. To estimate the ACE (3), we converted ADAS to proportions in the interval $(0, 1)$ and used a Beta regression model for prediction (Ferrari2004). We also included the known clinical markers age, gender, ApoE, and years of education for estimating causal effects. We randomly split the data into 448 subjects for training and 49 for evaluation. All data was obtained from the Alzheimer’s Disease Neuroimaging Initiative (jack2008alzheimer).
We used 2 substitute confounders and checked that BPMF passed the posterior predictive check. The biggest change in effect size, compared to the non-causal model ignoring confounding, is observed for corpus callosum volume (see fig. 2). In the non-causal model, the estimated effect is not significant, with the 80% credible interval being almost equally distributed across both sides of zero. After including substitute confounders, a significant negative association between corpus callosum volume and ADAS is found. This is an interesting finding, since previous studies found atrophy of the corpus callosum to be a marker of the progressive interhemispheric disconnection in AD (see e.g. Delbeuck2003 for an overview). An analysis based on the non-causal model would have missed this cause.
Next, we estimated the ACE of different AD pathologies (see table 2). As expected, the estimated causal effect is largest for the most extreme transition from A-/T-/N- to A+/T+/N+, but it is slightly reduced after including substitute confounders. These values match the range observed for patients diagnosed with Alzheimer’s disease (Raghavan2012). Finally, we focus on the ACE of ApoE. The estimated effects are considerably lower, with the highest being that of homozygous Apo-ε3 (most common type) compared to homozygous Apo-ε4 (high risk type). When considering the effect from homozygous Apo-ε2 (low risk type) to heterozygous Apo-ε3/ε4, the effect of Apo-ε2 flips from being harmful to protective after inclusion of substitute confounders. Studies on the genetics of AD have found that the Apo-ε2 allele is indeed protective against AD, therefore only the estimated effect with substitute confounder agrees with clinical findings (Corder1994). Our results show that if causal effects are large, as in the case of ATN, the difference in estimated causal effects is minor. However, if causal effects are small, as in the case of corpus callosum volume and Apo-ε2, accounting for unknown confounders does reveal clinically validated effects that would have been missed with the non-causal model.
Most neuroimaging studies are subject to various sources of confounding, but usually we neither know all of them nor do we have data on them. To alleviate this problem, we proposed a latent factor model approach to estimate causal effects from observational data in the presence of unknown confounders. We showed in experiments on semi-synthetic data that by including a substitute confounder, we can recover causal effects more accurately than a model that ignores confounding. Analyses on real data concerning Alzheimer’s disease revealed that including substitute confounders can reveal important causes of cognitive decline that otherwise would have been missed.
This research was partially supported by the Bavarian State Ministry of Education, Science and the Arts in the framework of the Centre Digitisation.Bavaria (ZD.B), and the Federal Ministry of Education and Research in the call for Computational Life Sciences (DeepMentia, 031L0200A).