Estimation of causal effects with small data under implicit functional constraints

03/06/2020 ∙ by Jouni Helske, et al. ∙ Jyväskylän yliopisto

We consider the problem of estimating causal effects of interventions from observational data when well-known back-door and front-door adjustments are not applicable. We show that when an identifiable causal effect is subject to an implicit functional constraint that is not deducible from conditional independence relations, the estimator of the causal effect can exhibit bias due to small samples and the functional constraint through variables that we call trapdoor variables. We use simulated data to study different strategies to account for trapdoor variables and suggest how the related trapdoor bias might be minimized. The importance of trapdoor variables in causal effect estimation is illustrated with real data from the Life Course 1971-2002 study. Using this dataset, we estimate the causal effect of education on income in the Finnish context. Bayesian modelling allows us to take the parameter uncertainty into account and to present the estimated causal effects as posterior distributions.







1 Introduction

Understanding of causal relations forms the basis of decision making in society. The role of statistics is to provide tools that allow us to estimate the causal effects of planned interventions. Instead of modelling just associations, a causal model describes the functional relationships present in the system of interest. A core feature of causal models (Pearl, 2009) is the capability to represent the effects of actions on the model via symbolic interventions. These interventions are assumed modular in the sense that they only modify the target of the intervention, leaving the other causal mechanisms in the model intact. The resulting probability distribution of this post-interventional causal model is defined as the causal effect of the intervention.

Various methods have been suggested for estimating causal effects in different settings. Propensity score matching (Rosenbaum and Rubin, 1983; Imbens, 2000), inverse probability weighting (Rosenbaum and Rubin, 1983; Rosenbaum, 1987), and g-methods (Robins et al., 1992) are some of the methods from the potential outcomes framework. These methods try to mimic a randomized experiment by creating pseudo-samples or by weighting the original sample in various ways, and it is typically assumed that there are no unobserved confounders present. In the structural equation modelling (SEM) approach (see, e.g., Kline, 2011), observed variables are assumed to be Gaussian, and unobserved confounders are treated as correlations between observed variables.

Another approach to causal inference is based on causal graphs, where the notion of identifiability plays a central role (see Section 2). For certain graphs, the well-known back-door and front-door adjustment criteria (Pearl, 1995) allow for identification of interventional distributions from observational data in the presence of unobserved confounders. More generally, do-calculus (Pearl, 1995) can be used to assess whether the interventional distribution of interest is identifiable given only the known causal graph, without any parametric assumptions about the distributions of the variables or the form of the effects they have on each other. Do-calculus provides an identifying functional, i.e., a nonparametric formula for the interventional distribution consisting of terms that represent observational distributions. Although identifiability does not in general guarantee estimability (Maclaren and Nicholson, 2019), it is usually possible to obtain an estimator for the causal effect by replacing the terms present in the identifying functional by suitable parametric or nonparametric estimators and then combining the results accordingly.

We study the estimation of causal effects with small data in scenarios where standard adjustment criteria are not applicable. In addition, we consider the presence of functional equality constraints known as Verma-constraints (Verma and Pearl, 1990; Tian and Pearl, 2002; Robins, 1986). Under certain conditions (see Section 2), these constraints are related to special variables that we call trapdoor variables. These variables can bias the causal effect estimator for finite samples and we refer to this form of bias as trapdoor bias. We demonstrate the practical ramifications of trapdoor variables for the estimation of causal effects via simulations in a number of synthetic scenarios with small sample sizes and compare a variety of estimation strategies in both nonparametric and parametric settings.

As a motivating example, we consider Bayesian estimation of the causal effect of education on yearly income using real data from the Life Course 1971–2002 study (Kuusinen, 2018). The causal model for this study imposes a functional equality constraint on the causal effect of interest thus exemplifying the presence of a trapdoor variable. We combine Bayesian estimation with a specialized Monte Carlo approach in order to take the effect of the trapdoor variable into account. The Bayesian approach allows us to estimate the posterior distribution of the causal effect and quantify the uncertainty due to the small sample size. Codes for the simulation experiments, the real data example, and the figures in this paper are available at

The paper is structured as follows. Section 2 introduces the notation, gives a definition for the trapdoor variables and presents an example on a causal model where such a variable is present. Section 3 focuses on various aspects related to the estimation of causal effects in the presence of implicit functional constraints including a Bayesian approach and an example with binary data. Section 4 considers the effect of trapdoor variables and the trapdoor bias of causal effect estimators in a linear-Gaussian model through a simulation experiment. The real-data example is presented in Section 5. Section 6 provides some concluding remarks.

2 Functional constraints and trapdoor variables

2.1 Notation and basic definitions

Our analysis is based on the framework of structural causal models (SCM) and directed graphs, and we assume the reader to be familiar with these concepts and their core probabilistic and graphical properties. For a more detailed discussion of SCMs and graph-theoretic concepts, we refer the reader to works such as Pearl (2009) and Koller and Friedman (2009).

We use capital letters to denote variables and small letters to denote their values. Bold letters are used to denote sets of variables and the corresponding value assignments. We use p(·) for both probability mass functions and probability density functions, and we typically omit the dependence on the (unknown) model parameters from these functions.

Each SCM over a set of variables V is associated with a joint probability distribution p(v) and a causal graph G over V, where directed edges between two observed variables correspond to direct causal relationships, which are assumed not to form any cycles. Bidirected edges between two observed variables are used to denote confounding by an unobserved common cause. In this framework, interventions are represented using the do-operator. An intervention do(X = x) forces the variables in X to take the values specified by x while leaving the other mechanisms of the model intact. This intervention induces a submodel with the interventional distribution p(y | do(x)). A causal effect p(y | do(x)) is said to be identifiable in G if it is uniquely computable from p(v) in any SCM that induces G. For an identifiable causal effect, an identifying functional is a function f of the observational distribution such that p(y | do(x)) = f(p(v)). We assume that p(v) > 0 for values making all conditional distributions and interventions well-defined.
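The modularity of interventions can be illustrated with a small simulated SCM. The sketch below is a toy model with hypothetical linear mechanisms (not taken from this paper): under do(X = x), only the mechanism of X is replaced, while all other mechanisms stay intact.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, do_x=None):
    """Sample from a toy SCM with U -> (W, Y) and W -> X -> Y.

    do(X = do_x) replaces only the mechanism of X; all other
    mechanisms of the model stay intact (modularity).
    """
    u = rng.normal(size=n)                     # unobserved confounder
    w = u + rng.normal(size=n)
    x = 0.5 * w + rng.normal(size=n) if do_x is None else np.full(n, float(do_x))
    y = 1.0 * x + 0.8 * u + rng.normal(size=n)
    return w, x, y

# By construction E[Y | do(X = 1)] = 1.0, while the observational
# association between X and Y is confounded through U.
_, _, y_do = sample(200_000, do_x=1.0)
print(round(float(y_do.mean()), 2))  # close to 1.0
```

Sampling with `do_x=None` instead yields observational data in which a naive regression of Y on X would be confounded.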

Note that identifiability only indicates the existence of an estimator and does not take into account the potential problems stemming from finite data. Therefore, even though determining the identifiability of is an important first step, it does not guarantee that we can estimate the causal effect in practice without additional stability assumptions (Maclaren and Nicholson, 2019).

Interventional distributions exhibit conditional independence constraints, which can be characterized by d-separation (Pearl, 1988) in the associated causal graph of the model. However, identifiable causal effects can also be subject to Verma-constraints. Perhaps surprisingly, some of these constraints can be expressed as conditional independences in the interventional distribution (Shpitser and Pearl, 2008) and can be used to give an alternative definition for nested Markov models in acyclic directed mixed graphs (Richardson et al., 2017). Verma-constraints have been used for testing edges (Shpitser et al., 2009) and for marginalization via variable elimination (Shpitser et al., 2011).

2.2 Trapdoor variables

Here we give a broad definition that captures the notion that a set of variables Z may appear in an identifying functional of a causal effect even though the value of the causal effect does not depend on the value assigned to Z.

Definition 1 (Functional independence)

Let X, Y, and Z be disjoint sets of variables and let p(y | do(x)) be an identifiable causal effect with an identifying functional f whose arguments include a value assignment z. If f(y, x, z) = f(y, x, z′) for all values y, x, z, and z′, then f is functionally independent of Z.

The constraint defined above is specific to the given identifying functional. In some instances we may be able to find identifying functionals that do not exhibit functional equality constraints for any subset of the variables. Our interest lies in the opposite direction, where every identifying functional exhibits a specific type of functional independence. Before characterizing this property of interest, we must first define an operation known as the latent projection of a causal graph (Pearl and Verma, 1991).

Definition 2 (Latent projection)

Let G be a causal graph over a set of vertices W. The latent projection L(G, V), where V ⊆ W, is a causal graph over V where for every pair of distinct vertices V_i, V_j ∈ V it holds that

  1. L(G, V) contains an edge V_i → V_j if there exists a directed path from V_i to V_j in G on which every vertex except V_i and V_j is in W \ V.

  2. L(G, V) contains an edge V_i ↔ V_j if there exists a path from V_i to V_j in G that does not contain the pattern V_k → V_l ← V_m (a collider), on which every vertex except V_i and V_j is in W \ V, the first edge of the path has an arrowhead into V_i, and the last edge has an arrowhead into V_j.

Latent projections can be used to derive identifying functionals for causal effects such that they do not contain a specific variable, i.e., the variable is considered latent and the causal effect of interest is identified in the corresponding latent projection (Tikka and Karvanen, 2018). For this reason, the presence of a functional constraint in some identifying functional does not rule out the possibility of obtaining another identifying functional that is not subject to the same constraint. For example, consider the causal graph of Fig. 1. The causal effect of X on Y is identifiable in this graph, which can be verified using do-calculus or by applying an identifiability algorithm such as the one by Huang and Valtorta (2006) or the ID algorithm by Shpitser and Pearl (2006). An application of the ID algorithm implemented in the R package causaleffect (Tikka and Karvanen, 2017) provides the formula

where the left-hand side depends only on the values of x and y, but the right-hand side also contains a variable that, unlike the summation variable, is not subject to summation. This means that there is a functional equality constraint stating that the right-hand side does not depend on this variable. However, there is also an admissible set, which gives us

i.e., a simple back-door formula in which this variable is no longer present.

Figure 1: A causal graph where the identifying functional obtained by an application of the ID algorithm is functionally independent of one of its arguments and there is an admissible set enabling back-door adjustment.

We rule out this possibility of finding alternative identifying functionals where the functional equality constraint does not emerge by restricting our attention to settings where a trapdoor variable is present.

Definition 3 (Trapdoor variable)

If p(y | do(x)) is identifiable in G from p(v), its identifying functional is functionally independent of Z, where Z ⊆ V \ (X ∪ Y), and p(y | do(x)) is not identifiable in the latent projection L(G, V \ Z) from p(v \ z), then Z is a set of trapdoor variables with respect to p(y | do(x)) in G.

Finding trapdoor variables is straightforward, as outlined by their definition. Given a causal effect of interest, we first determine its identifiability and whether the identifying functional is subject to functional equality constraints. If this is the case, we proceed to verify whether the causal effect remains identifiable when the constrained set is considered latent, using a suitable latent projection. The algorithm by Tian and Pearl (2002) can be used to systematically enumerate the functional equality constraints implied by a causal model. Evans (2018) showed that this constraint-finding algorithm is complete for categorical variables, but it is not known whether the algorithm is complete in general, i.e., whether all such constraints can be found by the algorithm.

2.3 Example on trapdoor variables



Figure 2: Three causal graphs of increasing complexity, where the interest is in the causal effect of X on Y.

Consider the three causal graphs in Fig. 2. We are interested in estimating the causal effect of X on Y. As a simple example, let us consider X as the education level and Y as the yearly income. Furthermore, let Z correspond to the grade point average (GPA) from primary school, and W to the socioeconomic status of the parents. In all three graphs we have an arrow from X to Y, meaning that we assume that there is a direct causal effect of education on income. In addition, we assume that there may be some unobserved confounders between the socioeconomic status of the parents and income. In the graphs of (b) and (c), we also assume that there is confounding between the socioeconomic status of the parents and the education level of the participant. In (c), we assume that the effect of socioeconomic status on the education level is mediated by the GPA. We will further extend the third graph with additional variables, such as gender, in Section 5.

Now consider the estimation of the interventional distribution p(y | do(x)), i.e., the distribution of Y when we intervene on X by setting it to x, which differs from a simple conditional distribution of Y in these graphs. In order to estimate p(y | do(x)), we need to find a formula for it in terms of the observed variables only, i.e., X, Y, Z, and W in our example. For the graph of (a) we obtain the so-called back-door adjustment formula

p(y | do(x)) = Σ_w p(y | x, w) p(w).

By conditioning on W we block all back-door paths from X to Y, i.e., those paths between X and Y which have an arrow into X. Thus, by estimating the parameters of the terms p(y | x, w) and p(w), we can estimate the interventional distribution of interest, p(y | do(x)). For example, assuming that all variables are Gaussian and their relationships are linear, the interventional mean and variance can be written in terms of the estimated regression coefficients, the estimated intercept terms, and the estimated residual variances.
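As a concrete sketch of this back-door strategy in the linear-Gaussian case, the following simulates hypothetical data in which W confounds X and Y, fits the regression of Y on X and W by ordinary least squares, and plugs in the sample mean of W. All coefficients and variable roles here are illustrative assumptions, not estimates from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical linear-Gaussian data: W confounds X and Y,
# and the true direct effect of X on Y is 1.5.
w = rng.normal(size=n)
x = 0.8 * w + rng.normal(size=n)
y = 1.5 * x + 0.7 * w + rng.normal(size=n)

# Back-door adjustment: E[Y | do(x)] = a + b_x * x + b_w * E[W],
# estimated by regressing Y on (1, X, W) and plugging in the mean of W.
design = np.column_stack([np.ones(n), x, w])
a, b_x, b_w = np.linalg.lstsq(design, y, rcond=None)[0]

def e_y_do(x0):
    return a + b_x * x0 + b_w * w.mean()

print(round(float(e_y_do(1.0)), 2))  # approximately 1.5 (plus b_w * E[W], which is near 0)
```

A regression of Y on X alone would instead absorb the confounding path through W into the coefficient of X.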

Now consider the graph of (b). In this case, conditioning on W blocks the path X ← W ↔ Y but opens the path X ↔ W ↔ Y, meaning that we cannot apply the back-door adjustment here. In fact, adding the unobserved confounder between W and X renders the causal effect nonidentifiable.

In the graph of (c), we assume that we have obtained data on the variable Z, which lies on the directed path from W to X. Given this additional information, the causal effect is again identifiable, and we have

p(y | do(x)) = ( Σ_w p(y | x, z, w) p(x | z, w) p(w) ) / ( Σ_w p(x | z, w) p(w) ).   (1)
However, there is no term for the distribution of Z in equation (1). Thus, after estimating the distributions p(y | x, z, w) and p(x | z, w), z is essentially reduced to a fixed but unknown parameter in the context of equation (1). This graph is also considered by Tian and Pearl (2002), Pearl and Mackenzie (2018), and Jung et al. (2020). Tian and Pearl (2002) show that this graph contains a functional equality constraint on an interventional distribution which cannot be expressed as a conditional independence constraint using the observed variables. The constraint states that the formula for p(y | do(x)) given in equation (1) is functionally independent of Z, meaning that its value does not depend on the choice of z, just as we would intuitively expect. However, when estimating equation (1) from the data we clearly must choose some value for z, even though the constraint states that the actual value should not matter. In fact, Z is a trapdoor variable with respect to the identifying functional of equation (1) in this graph, due to the functional equality constraint and the fact that the causal effect is not identifiable in the causal graph of (b), which is the latent projection of (c) when Z is considered latent. In the following sections we investigate the challenges that trapdoor variables impose on the estimation of causal effects.
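The functional equality constraint can be checked numerically for binary variables. The sketch below assumes a ratio-form identifying functional, Σ_w p(y | x, z, w) p(x | z, w) p(w) / Σ_w p(x | z, w) p(w), which is a Verma-type functional consistent with the description above (z appears as a free argument and the distribution of Z does not), together with a hypothetical parameterization of a binary SCM consistent with graph (c). With the exact observational distribution, the functional gives the same value for z = 0 and z = 1 and matches the true interventional probability.

```python
import itertools
import numpy as np

# Hypothetical binary SCM consistent with graph (c):
# U1 confounds W and X, U2 confounds W and Y, and W -> Z -> X -> Y.
p_u1, p_u2 = 0.4, 0.6
pw = lambda u1, u2: 0.2 + 0.3 * u1 + 0.4 * u2   # P(W=1 | u1, u2)
pz = lambda w: 0.3 + 0.4 * w                    # P(Z=1 | w)
px = lambda z, u1: 0.2 + 0.3 * z + 0.4 * u1     # P(X=1 | z, u1)
py = lambda x, u2: 0.1 + 0.5 * x + 0.3 * u2     # P(Y=1 | x, u2)

def bern(p, v):  # P(V = v) for a Bernoulli(p) variable
    return p if v == 1 else 1 - p

# Exact observational joint p(w, z, x, y), marginalizing U1 and U2.
joint = np.zeros((2, 2, 2, 2))
for u1, u2, w, z, x, y in itertools.product((0, 1), repeat=6):
    joint[w, z, x, y] += (bern(p_u1, u1) * bern(p_u2, u2)
                          * bern(pw(u1, u2), w) * bern(pz(w), z)
                          * bern(px(z, u1), x) * bern(py(x, u2), y))

def functional(y, x, z):
    """Ratio-form identifying functional with z as a free argument."""
    num = den = 0.0
    for w in (0, 1):
        p_w = joint[w].sum()
        p_x_zw = joint[w, z, x].sum() / joint[w, z].sum()
        p_y_xzw = joint[w, z, x, y] / joint[w, z, x].sum()
        num += p_y_xzw * p_x_zw * p_w
        den += p_x_zw * p_w
    return num / den

# True P(Y = 1 | do(X = 1)) by direct enumeration over U2.
truth = sum(bern(p_u2, u2) * py(1, u2) for u2 in (0, 1))
print(functional(1, 1, 0), functional(1, 1, 1), truth)  # all three agree
```

The agreement holds only because the exact joint distribution is used; the next sections examine what happens when the conditional distributions are estimated from finite data.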

3 Estimation of causal effect under implicit constraints

3.1 Plug-in estimation

A straightforward strategy for causal effect estimation is to construct a parametric model for the conditional distributions that appear in the formula of an identifying functional of a causal effect. As an example, consider the graph of (c), and assume that we wish to estimate E(Y | do(x)). We need to estimate the unknown model parameters corresponding to the causal effect of equation (1), and we let a hat denote a distribution in which the unknown parameters are assigned some fixed values, such as their maximum likelihood (ML) estimates. The plug-in estimator for the expected value of the interventional distribution takes the following form


Note that this estimate may depend on the value of z when small-sample estimates of the parameters are used. This dependency, and the corresponding trapdoor bias, vanishes when the models are correctly specified and the true parameter values are used (by the definition of functional independence). However, even if we assume an unbiased estimator of the parameters, the estimator (2) can still exhibit bias since it is a nonlinear function of the parameter estimates. For large enough datasets the choice of z in equation (2) should not have a large impact, but we will show that for small samples the value of the trapdoor variable can have a crucial role in estimating the correct causal effect of X on Y.

An alternative strategy for defining the plug-in estimator is to eliminate the functional constraint by averaging the identifying functional over a (conditional) distribution of the trapdoor variable. If Z is a trapdoor variable and C is a set of observed variables disjoint from Z, then

p(y | do(x)) = Σ_z f(y, x, z) p(z | c),

since f is functionally independent of Z. This leads to an alternative estimator (3) for the expected value of the interventional distribution in the causal graph of (c), obtained by averaging the estimator (2) over the estimated distribution of the trapdoor variable. Note that while this strategy eliminates the problem of having to choose a specific value for z, it also introduces a new problem of selecting the set C and estimating the model parameters of the distribution p(z | c). As before, this estimator is a nonlinear function of the parameters, which can result in bias. Note that in a frequentist framework, there is no way to attribute a specific amount of the total bias to trapdoor bias, plug-in bias, or other possible sources of bias in the final estimate.

3.2 Coping with a trapdoor variable: An example with binary data

We will now illustrate different choices of the trapdoor variable value with nonparametric estimation of p(y | do(x)) in a case where all variables are binary. Consider Bernoulli variables defined in accordance with the graph of (c) as


By solving equation (1) analytically we obtain a value that does not depend on z. However, in practice, when the confounders are unobserved and we need to estimate the distributions p(y | x, z, w), p(x | z, w), and p(w) from finite data, the estimate of p(y | do(x)) can depend on the chosen value z, as we will show.

As an example, we simulated 100 000 datasets according to model (4) with sample sizes 100, 300, and 500, and estimated p(y | do(x)) nonparametrically with various methods for dealing with the trapdoor variable. Besides fixing z to zero or one and using the estimator of equation (2), we also computed a weighted average of these estimates, where the weights were based on the estimated marginal distribution of Z, or on its conditional distribution given x, corresponding to the estimator of equation (3). With sample sizes 100 and 300, there were some cases where fixing z to zero or one led to undefined probabilities (for example, when there were no observations with the required values of X and Z). In these cases (a small fraction of the replications with sample size 100, and fewer still with sample size 300), the entire replication was discarded. Fig. 3 shows the results of the simulation. We see that using the fixed values z = 0 or z = 1 leads to some bias. The weighted average estimators perform better, and the one based on the conditional distribution outperforms the estimator based on the marginal distribution. Overall, we are not far from the ground truth in this simple setting.
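A minimal version of such a simulation can be sketched as follows. The parameterization of the binary model and the ratio-form plug-in estimator are illustrative assumptions (they are not the paper's model (4)); the printed averages give a rough picture of how the fixed-z and weighted strategies behave with small samples.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate(n):
    """One dataset from a hypothetical binary SCM consistent with graph (c)."""
    u1 = rng.random(n) < 0.4
    u2 = rng.random(n) < 0.6
    w = rng.random(n) < 0.2 + 0.3 * u1 + 0.4 * u2
    z = rng.random(n) < 0.3 + 0.4 * w
    x = rng.random(n) < 0.2 + 0.3 * z + 0.4 * u1
    y = rng.random(n) < 0.1 + 0.5 * x + 0.3 * u2
    return w.astype(int), z.astype(int), x.astype(int), y.astype(int)

def estimate(w, z, x, y, x0, z0):
    """Plug-in estimate of P(Y=1 | do(X=x0)) with the trapdoor fixed at z0."""
    num = den = 0.0
    for w0 in (0, 1):
        in_zw = (w == w0) & (z == z0)
        in_xzw = in_zw & (x == x0)
        if in_xzw.sum() == 0:
            return np.nan  # undefined conditional probability
        p_w = (w == w0).mean()
        p_x_zw = in_xzw.sum() / in_zw.sum()
        p_y_xzw = y[in_xzw].mean()
        num += p_y_xzw * p_x_zw * p_w
        den += p_x_zw * p_w
    return num / den

ests = {"z=0": [], "z=1": [], "weighted": []}
for _ in range(500):
    w, z, x, y = simulate(100)
    e0, e1 = estimate(w, z, x, y, 1, 0), estimate(w, z, x, y, 1, 1)
    if np.isnan(e0) or np.isnan(e1):
        continue  # discard the replication, as in the text
    ests["z=0"].append(e0)
    ests["z=1"].append(e1)
    ests["weighted"].append((1 - z.mean()) * e0 + z.mean() * e1)

for name, values in ests.items():
    print(name, round(float(np.mean(values)), 3))
```

For these hypothetical parameters the true interventional probability P(Y = 1 | do(X = 1)) is 0.78, which the averages can be compared against.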

Figure 3: Average estimates of the interventional probabilities over 100 000 replications for the Bernoulli model with varying sample sizes and different strategies to account for the trapdoor variable Z. The dashed lines show the true probabilities.

3.3 Bayesian estimation of causal effects

We advocate the use of Bayesian methods for jointly estimating the parameters of the distributions in the identifying functionals of interest. By drawing samples from the joint posterior of the model parameters (using any generic Bayesian inference engine, typically some type of Markov chain Monte Carlo (MCMC) algorithm), we propagate the parameter estimation uncertainty to the final causal effect estimates and avoid the plug-in bias due to the nonlinearity of the identifying functional with respect to the model parameters. This allows us to focus on the effects of the trapdoor variable and the trapdoor bias of the estimators. We can also obtain samples from the full posterior of p(y | do(x)), which can be used for straightforward evaluation of any properties of interest (such as the mean and variance) of this posterior.

3.4 Monte Carlo approach

In parametric estimation of causal effects, we typically do not have an analytical formula for the types of plug-in estimators shown in equations (2) and (3). In these cases, we can use a Monte Carlo approach: draw samples from the estimated distributions and estimate the desired quantity using these samples. For our causal effect of interest defined by equation (1), we use Algorithm 1. In the end we have a weighted sample with corresponding normalized weights, and we can compute, for example, the weighted average of the sampled values as an estimator (5) of E(Y | do(x)).
Here the trapdoor variable has a more crucial role than in the case of analytical formulas: with poor choices of z, the weights can become degenerate, i.e., most of the weights are near zero. In order to avoid this, it is natural to choose a value of z that is well supported by the conditional distribution of Z given the intervention value, which should make the weights well behaved. A suitable number of Monte Carlo samples can be determined by computing the Monte Carlo standard error (MCSE) of the causal effect estimate. The MCSE measures the additional uncertainty in the estimate due to the finite number of Monte Carlo samples. For example, the MCSE of the estimator (5) can be computed from the normalized weights and the deviations of the sampled values from the weighted mean.
When combining Algorithm 1 with Bayesian parameter estimation, Algorithm 1 is used at each MCMC iteration given the current values of the model parameters. We can then compute functions of the interventional distribution at each iteration, giving us samples from their posterior distributions, or we can store all weighted Monte Carlo samples, leading to a posterior distribution of p(y | do(x)) which can be further used to evaluate functions of interest. In the latter case, it can be practical to resample (using the corresponding weights) and store only one replication at each iteration if memory constraints limit the storing of all samples.

1: For i = 1, …, N:
2:  Sample w_i from the estimated marginal distribution of W
3:  Sample y_i from the estimated conditional distribution of Y given x, z, and w_i
4:  Compute the unnormalized weight β_i from the estimated conditional density of X at x given z and w_i
5: Compute the normalized weights from β_1, …, β_N
Algorithm 1 Monte Carlo algorithm for sampling from p(y | do(x)) defined by equation (1) with a fixed value z for the trapdoor variable and number of Monte Carlo samples N.
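A runnable sketch of this importance-sampling scheme is given below. The linear-Gaussian conditional models with made-up parameters are stand-ins for the fitted distributions (they are not from the paper), and the MCSE line uses a standard importance-sampling approximation based on the normalized weights.

```python
import numpy as np

rng = np.random.default_rng(4)

# Made-up linear-Gaussian stand-ins for the fitted models:
# the marginal of W, and the conditionals of Y and X.
w_mean, w_sd = 0.0, 1.0
x_sd, y_sd = 0.5, 1.0

def x_loc(z, w):
    return 0.6 * z + 0.3 * w   # mean of X given z and w

def y_loc(x, w):
    return 1.2 * x + 0.5 * w   # mean of Y given x and w

def algorithm1(x0, z0, n=100_000):
    """Weighted sample from p(y | do(x0)) for a fixed trapdoor value z0."""
    w = rng.normal(w_mean, w_sd, n)              # step 2: sample w_i
    y = rng.normal(y_loc(x0, w), y_sd)           # step 3: sample y_i
    beta = np.exp(-0.5 * ((x0 - x_loc(z0, w)) / x_sd) ** 2)  # step 4: weights
    weights = beta / beta.sum()                  # step 5: normalize
    est = float(np.sum(weights * y))
    mcse = float(np.sqrt(np.sum(weights**2 * (y - est) ** 2)))
    return est, mcse

est, mcse = algorithm1(x0=1.0, z0=0.5)
print(round(est, 2), round(mcse, 4))
```

Trying a z0 far from the values supported by the model for X shrinks most weights toward zero and inflates the MCSE, illustrating the weight degeneracy discussed above.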

4 Linear-Gaussian model

4.1 Theoretical considerations

We will now consider the estimation of causal effects in a common linear-Gaussian case for the causal graph of (c). We assume that the underlying model is defined as


Here all the parameters are unknown, and the confounding variables are unobserved. Our observational model needed for estimating E(Y | do(x)) is


where the parameters are unknown and have to be estimated from the data.

Substituting the equations of model (7) into equation (1) yields



While the variance of the interventional distribution does not contain the trapdoor variable z, the expected value does. Also, if the interest is only in the difference of the expected values under two different interventions, then the effect of the trapdoor variable cancels out in this linear case.

As our model is linear-Gaussian, we can apply path analysis (Wright, 1934) to our causal graph (see, e.g., Pearl (2013) for examples) to find the marginal covariance matrix of the observed variables (shown in the Appendix). From this covariance matrix, using the properties of the multivariate normal distribution, we can obtain the theoretical formulas for the parameters of model (7) in terms of the true parameters of the causal model (see the Appendix for details). Plugging these into equation (8), we obtain


As expected, given the true causal model and its known parameters, the effect of the trapdoor variable cancels out. Nevertheless, as in the Bernoulli case, estimating equation (8) using the parameter estimates from the observational model with a finite dataset may lead to an expected value that depends on z, unless the coefficient of z in equation (8) happens to be estimated as exactly zero.
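The finite-sample dependence on z, and its disappearance with growing sample size, can be illustrated numerically. The sketch below uses a hypothetical linear-Gaussian SCM consistent with graph (c) (its coefficients are not those of model (6)), fits the observational regressions by least squares, and evaluates a plug-in of the ratio-form identifying functional at several trapdoor values using conjugate Gaussian algebra.

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate(n):
    """Hypothetical linear-Gaussian SCM consistent with graph (c)."""
    u1 = rng.normal(size=n)                       # confounds W and X
    u2 = rng.normal(size=n)                       # confounds W and Y
    w = 0.7 * u1 + 0.7 * u2 + rng.normal(size=n)
    z = 0.8 * w + rng.normal(size=n)
    x = 0.6 * z + 0.8 * u1 + rng.normal(size=n)
    y = 1.0 * x + 0.8 * u2 + rng.normal(size=n)   # E[Y | do(x)] = x
    return w, z, x, y

def ols(design, target):
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return beta, target - design @ beta

def plug_in(w, z, x, y, x0, z0):
    """Plug-in E[Y | do(x0)] for a fixed trapdoor value z0 (Gaussian algebra)."""
    one = np.ones(len(w))
    a, _ = ols(np.column_stack([one, x, z, w]), y)   # E[Y | x, z, w]
    b, rx = ols(np.column_stack([one, z, w]), x)     # E[X | z, w]
    sx2 = rx.var()
    mw, sw2 = w.mean(), w.var()
    # Mean of W after reweighting the marginal of W by the density
    # of X at x0 given z0 and w (conjugate Gaussian update):
    prec = 1.0 / sw2 + b[2] ** 2 / sx2
    m = (mw / sw2 + b[2] * (x0 - b[0] - b[1] * z0) / sx2) / prec
    return a[0] + a[1] * x0 + a[2] * z0 + a[3] * m

for n in (50, 100_000):
    w, z, x, y = simulate(n)
    vals = [plug_in(w, z, x, y, 1.0, z0) for z0 in (-2.0, 0.0, 2.0)]
    print(n, [round(float(v), 2) for v in vals])  # the spread across z0 shrinks with n
```

With the small sample the three trapdoor values give visibly different estimates of the same causal effect (the true value is 1.0 here), while with the large sample they essentially coincide, as the functional equality constraint predicts.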

4.2 Simulation experiment

Now consider a model based on equation (6) with


We will now compare various approaches to account for the trapdoor variable Z. Instead of the nonparametric approach used in Section 3.2, we now switch to parametric Bayesian modelling. For comparative purposes, we use uniform priors for all model parameters. As stated in Section 3, the Bayesian approach takes into account the uncertainty due to parameter estimation by integrating over the posterior distribution of the parameters, and avoids the plug-in bias due to the nonlinearity of equation (8).

For model estimation, we wrote a Stan model (Carpenter et al., 2017; Stan Development Team, 2019) which simultaneously estimates all unknown model parameters using Markov chain Monte Carlo (MCMC). We estimate the true expected causal effect using the causal graph with observed confounders, and compare it to six different estimation methods:

  1. Fix the trapdoor variable to a constant value in equation (8).

  2. Marginalize over z, i.e., in addition to estimating model (7), also estimate the marginal distribution of Z, so that the estimator (8) is averaged over this distribution at each MCMC iteration.

  3. As above, but estimate the conditional distribution of Z given the intervention variable and use it instead.

  4. Use a constraint such that the contribution of the trapdoor variable is fixed to zero.

  5. Use a structural equation model (SEM), a common approach for linear-Gaussian causal modelling (Kline, 2011). We used the R package lavaan (Rosseel, 2012) for this purpose.

  6. Use the composition of weighting operators (CWO) (Jung et al., 2020), which applies weighted regression to estimate functions of interventional distributions for arbitrary identifying functionals.

Model (9) determines the true parameter values and the true expected causal effects, which are used as the ground truth in the comparisons.

We sampled 1 000 replications from model (9) and for each replication estimated E(Y | do(x)) using a single MCMC chain with 10 000 post-warmup iterations, averaging over the iterations (thus taking into account the uncertainty from parameter estimation), except for the SEM and CWO approaches, which are based on the maximum likelihood estimates. Fig. 4 shows how, despite their theoretical equivalence, the results depend heavily on the chosen strategy. Perhaps the most natural fixed choices of the trapdoor variable are prone to bias which depends on the intervention value, but the case where the value of the trapdoor variable is adapted based on the conditional distribution performs well. The SEM approach, which explicitly models the covariance structures between the variables, also performs well, but it is not applicable in more general graphs with non-Gaussian or nonlinear relationships. The CWO method shows considerable bias in some of the settings.

Figure 4: Average estimates of the expected causal effects over 1 000 replications with varying sample sizes and different choices for taking the trapdoor variable into account. The dashed lines show the true expected values.

4.3 Monte Carlo example

Consider again model (9) but now with

i.e., smaller measurement error, leading also to a narrower conditional density. Because of this, a poor choice of the trapdoor variable value can cause most of the normalized weights in Algorithm 1 to be close to zero, leading to inefficient Monte Carlo sampling. As an example, we simulated one replication from this model, and estimated the posterior distribution of E(Y | do(x)), both using the analytical formula and using the Monte Carlo approach. For MCMC, we ran 4 chains with a total of 100 000 post-warmup iterations. In addition, at each MCMC iteration we sampled one replication from the posterior density from the weighted Monte Carlo sample. Fig. 5 shows the posterior distributions. We can see that with a suitable value of z both the analytical and Monte Carlo methods give approximately equal results, but a poor choice of z biases the results compared to the analytical solution (which is also biased), as can be seen from the discrepancy between the posterior distributions based on the Monte Carlo and analytical approaches. The average MCSE (over posterior samples) was 0.08 and 0.22 for the two choices of z, respectively. The additional variation due to the Monte Carlo sampling can inflate the posterior distribution of E(Y | do(x)), but here the proportion of the MCSE of the total posterior uncertainty is negligible: the posterior standard deviations of the analytical and Monte Carlo methods were 0.40 and 0.41, respectively.

Figure 5: Posterior distributions of the expected causal effect in the linear-Gaussian model for the different estimation methods.

5 Real-data example with continuous and categorical variables

As an example with real data, we analyse the Life Course 1971–2002 dataset from the Finnish Social Science Data Archive (Kuusinen, 2018). This longitudinal study consists of the life courses of 634 Finnish children born in 1964–1968 in Jyväskylä, Finland. In the early 1970s, when the children were aged between three and nine, the Illinois Test of Psycholinguistic Abilities (ITPA) was used to test their verbal intelligence. The participants were then followed up, with further information on their life events gathered in 1984, 1991, and 2002. While the dataset has been used in various studies regarding intelligence and school achievement (see, e.g., Kuusinen and Leskinen, 1988), we use it to study the causal effect of education level (secondary or less, lowest/lower tertiary, or higher tertiary) on yearly income (euros, in 2000).

In addition to income and education, we have information on the GPA from primary school, the socioeconomic status of the parents (low, middle, high), gender, and the ITPA score (the General Language composite variable, a combination of the subtests). Out of the 634 participants in the full dataset, we use the 509 participants with fully observed values of the aforementioned variables, and thus avoid considerations related to missing data.

The corresponding causal graph is shown in Fig. 6, which has the same structure as our earlier example graphs, except that there are now additional vertices denoting gender and the ITPA score. This leads to the following formula for p(y | do(x)):

Figure 6: Causal graph representing the effect of education level X on income Y. The other variables in the graph are gender, the score on the Illinois Test of Psycholinguistic Abilities, the socioeconomic status of the parents, and the grade point average at the end of primary school.

Compared to Algorithm 1, we now also need to marginalize over gender and the ITPA score, leading to the Monte Carlo approach of Algorithm 2. This gives us weighted replications from the interventional distribution, which allow us to compute, e.g., posterior summaries of the causal effect. Here the weights depend not only on the distribution of the intervention variable X, but also on a conditional distribution of the background variables.

1: For i = 1, …, N:
2:  Sample the gender and the ITPA score from their estimated distributions
3:  Set the value of the trapdoor variable, for example to its estimated conditional mean
4:  For j = 1, …, M:
5:   Sample the socioeconomic status from its estimated conditional distribution
6:   Sample the income from its estimated conditional distribution
7:   Compute the unnormalized weight
8: Compute the normalized weights
Algorithm 2 Monte Carlo algorithm for the Life Course 1971–2002 dataset example.

We assume a Gamma distribution for the income variable and model its expected value given the other variables via a log-link, assuming monotonic effects of education and socioeconomic status (Bürkner and Charpentier, 2018). We treat the education level and the socioeconomic status of the parents as ordinal variables and model them with a sequential model and a logit-link (Tutz, 1990; Bürkner and Vuorre, 2019). We use normal distributions for the conditional and unconditional distributions of the ITPA scores, and a Bernoulli distribution for gender.
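The role of the log-link in the Gamma outcome model can be illustrated with a toy simulation; the intercept and education coefficient below are illustrative assumptions, not estimates from the Life Course data.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_income(education_level, n, shape=2.0):
    """Toy Gamma outcome model: expected income depends on education
    through a log-link, so a monotonic effect on the linear predictor
    translates into a monotonic effect on the mean income.
    Coefficients are illustrative stand-ins."""
    mu = np.exp(9.8 + 0.15 * education_level)   # expected income in euros
    # Gamma with mean = shape * scale = mu
    return rng.gamma(shape, mu / shape, size=n)

low = sample_income(0, 50_000)
high = sample_income(2, 50_000)
print(high.mean() > low.mean())  # higher education level -> higher mean income
```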


We employ two strategies for dealing with the trapdoor variable. In the first case we use the marginal trapdoor variable (marginal with respect to the intervention variable), and in the second case we use the conditional trapdoor variable, whose value varies within the outer loop of Algorithm 2 (and at each MCMC iteration). In both cases we use a Beta distribution for estimating the expectation of GPA (after transforming the original scale from 4–10 to 0–1), modelling the expected value via a logit-link and the precision via a log-link. For the full model specification, see the Stan model file in the supplementary materials on GitHub.
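The mean-precision parameterization of the Beta distribution used for the GPA model can be sketched as follows; the linear-predictor values are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)

def rescale_gpa(gpa):
    """Map the Finnish GPA scale 4-10 to the unit interval (boundary
    values would need a small adjustment in practice for a Beta model)."""
    return (gpa - 4.0) / 6.0

def sample_gpa01(eta_mu, eta_phi, n):
    """Beta distribution in the mean-precision parameterization:
    mean via logit-link, precision via log-link."""
    mu = 1.0 / (1.0 + np.exp(-eta_mu))   # logit-link for the mean
    phi = np.exp(eta_phi)                # log-link for the precision
    # standard shape parameters: a = mu * phi, b = (1 - mu) * phi
    return rng.beta(mu * phi, (1.0 - mu) * phi, size=n)

print(rescale_gpa(7.0))                        # 0.5, the midpoint of the scale
draws = sample_gpa01(eta_mu=0.4, eta_phi=2.0, n=100_000)
```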

Figure 7: Posterior distribution of the expected income under each educational intervention, using the estimators with the conditional trapdoor variable (upper panel) and the marginal trapdoor variable (lower panel).
Figure 8: Posterior interventional distributions of income, using the estimators with the conditional trapdoor variable (upper panel) and the marginal trapdoor variable (lower panel).

We used four chains with a total of 100 000 post-warmup iterations, leading to Monte Carlo standard errors (MCSE) of around 50–80, depending on the education level, with both trapdoor strategies.
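The MCSE of a posterior mean is the posterior standard deviation divided by the square root of the effective sample size. A minimal sketch, using independent stand-in draws (for real MCMC output, the effective sample size must be estimated from the autocorrelations rather than taken as the number of draws):

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in posterior draws of an income quantity, in euros; for
# independent draws the effective sample size equals the draw count.
draws = rng.normal(29_000.0, 8_000.0, size=100_000)
ess = draws.size                                  # ESS = n for i.i.d. draws
mcse = draws.std(ddof=1) / np.sqrt(ess)           # MCSE = sd / sqrt(ESS)
print(20 < mcse < 40)  # roughly 8000 / sqrt(100000), i.e. about 25
```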

Fig. 7 shows the posterior distribution of the expected income for all education groups. We see a clear discrepancy between the estimates based on the conditional trapdoor variable (upper panel) and the marginal trapdoor variable (lower panel). On the basis of the simulation results of Section 4, we prefer the estimates based on the conditional trapdoor variable (upper panel). Finally, Fig. 8 shows the full posterior interventional distributions of income. The two strategies for the trapdoor variable give somewhat different results: with the conditional trapdoor variable, the interventional distributions differ more between educational levels than with the marginal one. For example, we estimate the mean annual income for the highest education level as 29 000 euros using the conditional trapdoor variable, compared to 26 000 euros when using the marginal mean of the trapdoor variable.

6 Conclusion

We have shown how it is possible to estimate causal effects when the back-door and front-door adjustments are not applicable and highlighted the potential issues related to the application of theoretical identifying formulas to finite datasets. Our real-data example on the effect of education on income illustrates how Bayesian causal inference can be applied to complex causal graphs with multiple types of variables, and how trapdoor variables can have a substantial effect on the estimated causal effects.

The basic structure that leads to implicit functional constraints and trapdoor variables was presented in graph (c). This graph has been considered in the literature before, but to our knowledge it has not been fully analysed from the viewpoint of estimation. This basic structure can be extended in several directions so that the causal effect remains identifiable while retaining the implicit functional constraint. The graph for the Life Course 1971–2002 study (Fig. 6) is an example of such an extension. There are likely other interesting graphs with similar or even more complex identifying functionals that exhibit the problems related to trapdoor variables.

Determining the optimal method to account for the presence of a trapdoor variable remains a challenging problem. The trapdoor bias exhibited for small sample sizes depends on the (possibly parametric) assumptions about the causal model as well as on the properties of the estimators used for the conditional distributions appearing in the identifying functional. Even in the simple case of a linear model with a single trapdoor variable, the optimal method is not apparent. Minimizing the trapdoor bias may also result in a large variance of the causal effect estimator, necessitating further consideration of the estimation problem at hand.

Our simulation experiments in the Bernoulli and linear-Gaussian cases illustrated how the choice of the value of the trapdoor variable can have a substantial effect on the estimate of the interventional distribution. Our results suggest that a good default value for the trapdoor variable should be based on its conditional distribution given the intervention variable. On the other hand, in nonlinear and non-Gaussian models the trapdoor variable can have nonlinear effects on the estimates, which can further bias the results. Therefore, as a general sensitivity check, we recommend computing the causal effect using various strategies to account for the trapdoor variables and reporting how sensitive the results are with respect to these strategies.
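The recommended sensitivity check can be sketched as follows. The estimator below is a hypothetical stand-in that depends smoothly on the chosen trapdoor value; a real analysis would rerun the full identifying-formula computation for each strategy.

```python
import numpy as np

rng = np.random.default_rng(4)

def estimate_effect(z_value, n=200_000):
    """Hypothetical stand-in estimator of the interventional mean income
    for a given fixed trapdoor value z_value."""
    mu = np.exp(10.0 + 0.05 * z_value)            # stand-in mean income
    return rng.gamma(2.0, mu / 2.0, size=n).mean()

# Candidate strategies for fixing the trapdoor variable (stand-in values).
strategies = {"marginal mean": 0.0, "conditional mean": 0.6, "observed mode": 0.3}
estimates = {name: estimate_effect(z) for name, z in strategies.items()}

# Report the spread across strategies as a simple sensitivity summary.
spread = max(estimates.values()) - min(estimates.values())
print({k: round(v) for k, v in estimates.items()}, round(spread))
```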

7 Acknowledgements

This work belongs to the thematic research area "Decision analytics utilizing causal models and multiobjective optimization" (DEMO) supported by the Academy of Finland (grant number 311877). We thank Yonghan Jung for providing example code for the CWO method.

Appendix A Theoretical parameter estimates of the linear-Gaussian model of Section 4

The marginal covariance matrix of the observed variables can be found by applying path analysis to the rightmost graph of Fig. 2. From this covariance matrix we can obtain the theoretical parameter estimates of the linear-Gaussian model.
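In general, the implied covariance matrix of a recursive linear-Gaussian structural equation model with coefficient matrix B (where B[i, j] is the path coefficient from variable j to variable i) and error covariance Psi is (I − B)⁻¹ Psi (I − B)⁻ᵀ (see, e.g., Kline, 2011). A sketch with illustrative numbers, not the parameters of the model in Section 4:

```python
import numpy as np

# Path coefficients of a toy three-variable recursive model:
# variable 1 <- variable 0, variable 2 <- variables 0 and 1.
B = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0],
              [0.3, 0.7, 0.0]])
Psi = np.diag([1.0, 1.0, 1.0])   # independent unit-variance errors

# Implied marginal covariance: (I - B)^{-1} Psi (I - B)^{-T}
I = np.eye(3)
A = np.linalg.inv(I - B)
Sigma = A @ Psi @ A.T
print(np.allclose(Sigma, Sigma.T))  # a valid covariance matrix is symmetric
```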


  • Bürkner and Charpentier (2018) Bürkner, P.-C. and Charpentier, E. (2018) Monotonic effects: A principled approach for including ordinal predictors in regression models. PsyArXiv preprint.
  • Bürkner and Vuorre (2019) Bürkner, P.-C. and Vuorre, M. (2019) Ordinal regression models in psychology: A tutorial. Advances in Methods and Practices in Psychological Science, 2, 77–101.
  • Carpenter et al. (2017) Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P. and Riddell, A. (2017) Stan: A probabilistic programming language. Journal of Statistical Software, 76, 1–32.
  • Evans (2018) Evans, R. J. (2018) Margins of discrete Bayesian networks. Annals of Statistics, 46, 2623–2656.
  • Huang and Valtorta (2006) Huang, Y. and Valtorta, M. (2006) Pearl’s calculus of intervention is complete. In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence, 217–224. AUAI Press.
  • Imbens (2000) Imbens, G. W. (2000) The role of the propensity score in estimating dose-response functions. Biometrika, 87, 706–710.
  • Jung et al. (2020) Jung, Y., Tian, J. and Bareinboim, E. (2020) Estimating causal effects using weighting-based estimators. In Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, NY: AAAI Press.
  • Kline (2011) Kline, R. B. (2011) Principles And Practice of Structural Equation Modeling. New York: Guilford Press.
  • Koller and Friedman (2009) Koller, D. and Friedman, N. (2009) Probabilistic G