Unifying Causal Models with Trek Rules

by Shuyan Wang, et al.
Carnegie Mellon University

In many scientific contexts, different investigators experiment with or observe different variables with data from a domain in which the distinct variable sets might well be related. This sort of fragmentation sometimes occurs in molecular biology, whether in studies of RNA expression or studies of protein interaction, and it is common in the social sciences. Models are built on the diverse data sets, but combining them can provide a more unified account of the causal processes in the domain. On the other hand, this problem is made challenging by the fact that a variable in one data set may influence variables in another although neither data set contains all of the variables involved. Several authors have proposed using conditional independence properties of fragmentary (marginal) data collections to form unified causal explanations when it is assumed that the data have a common causal explanation but cannot be merged to form a unified dataset. These methods typically return a large number of alternative causal models. The first part of the thesis shows that marginal datasets contain extra information that can be used to reduce the number of possible models, in some cases yielding a unique model.





1 Introduction

Methods for unifying theories are typically particular to the theories, and depend on some deep insight into a shared fundamental structure. In contrast, for simple causal models, which abound in the biomedical and social sciences, general procedures for unification have been proposed. Causal relations between variables can be discovered by randomized experiments or by other interventions and also by analyzing non-experimental data. Many algorithms have been designed to find causal relations between variables from datasets. Most of these algorithms search for causal relations among variables measured in one dataset and output a directed graph in which variables directly connected by an edge are hypothesized to have relatively direct causal relations. However, due to restrictions such as time, location or privacy, the variables participating in a causal mechanism may not all be measured jointly, in which case researchers may have several datasets sharing some but not all variables – overlapping variable sets.

Such marginal datasets impose restrictions on identifying causal relations, since interactions between some variables are not observed. Using only marginal datasets, even if researchers know that variables from all these datasets are from a shared causal system, there may be too many possibilities for an informative estimation of causal relations [2]. Concatenating the datasets and then running algorithms that can work with missing variables could handle this problem, but this method requires strong assumptions on how and why the values are missing and is not feasible for confidential data that cannot or will not be shared.

Besides concatenating datasets, several other responses have been made to this problem. The ION algorithm [1][2] takes as input a package of partial causal graphs, which are generated by running algorithms allowing for “latent variables” (e.g., the Fast Causal Inference (FCI) algorithm [6]) on each of the marginal datasets, and returns a package of unified graphs, all of which are acyclic, contain all variables, and are consistent with the conditional independence and dependence information estimated from the input data [2]. ION gives a set of possible causal mechanisms between all variables measured in any of the datasets. Integrative Causal Analysis (INCA) has also used conditional independence relations in analyzing data over different variable sets to generate causal models that are consistent with all marginal datasets [7].

The methods mentioned above work by finding unified causal graphs that include variables in each marginal dataset while preserving all the marginal conditional (in)dependence relations. The basic idea is to find those unified models that can account for all of the conditional independence and dependence relations found in the marginal data sets. Assuming the well-known Causal Markov Condition and Faithfulness assumption [6], this procedure can be revealing. For example, suppose the true causal relations are given by the graph in Figure 1:

Figure 1:

Suppose the observed data sets are for and . L is not observed in any data set. From sufficiently large samples, conditional independence methods can recover Figure 1 uniquely. In other cases, however, even apparently simple ones, the methods return a plethora of alternative causal structures. For example, for the structure

Figure 2:

with marginal datasets:

five distinct structures can account for the marginal conditional independence and dependence relations. In many cases the number of alternative unifying models is very large.

However, marginal dependence or conditional independence relations are not the only information that can be used from marginal datasets. We will show that if the relationships are linear, TREK rules can aid in estimating the causal connections between two variables that only appear in separate datasets. Here, we explore the use of marginal correlations with the TREK rules to estimate a unified model, with results that can be more informative than those obtained solely from marginal conditional dependence and independence relations in multiple datasets.

2 The TREK Rule

We use directed graphs to represent causal relations between variables; each node represents one variable, and each edge represents one (relatively) direct causal relation, directed from the cause (parent) to the effect (child). A node is called a descendant of another node if it can be reached by following a directed path starting from that node, which is in turn called an ancestor of it. We assume the joint probability distribution on the variables respects the Markov condition, i.e., every variable, conditional on its parents, is independent of the set of all of its non-descendants. Two directed acyclic graphs (DAGs) are called Markov equivalent if they entail the same conditional independence relations under the Markov condition. We also make the Faithfulness assumption, that all conditional independence relations are consequences of the Markov condition, although in some cases it is not necessary.

We assume that all causal relations are linear: as an effect, every variable is a linear combination of influences from its direct causes, including unmeasured “disturbances” of each variable that are independent of its measured causes. We also assume that no measured variable is a deterministic function of any set of other measured variables. Formally, if we use a linear coefficient a to denote the strength of the influence from one direct cause, , to an effect , and assume that is also affected by unobserved noise independent of the direct cause , the relation between and is:

If has more than one direct cause, the relation between them is:
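Written out in a notation assumed here purely for illustration (the symbols $X$, $X_i$, $Y$, $a$, $a_i$ and $\varepsilon_Y$ are not taken from the original text), the two linear relations described above would take roughly the following form:

```latex
\begin{align*}
  % one direct cause X of the effect Y
  Y &= a\,X + \varepsilon_Y,
      & \varepsilon_Y &\perp\!\!\!\perp X, \\
  % several direct causes X_1, \dots, X_k of the effect Y
  Y &= \sum_{i=1}^{k} a_i\,X_i + \varepsilon_Y,
      & \varepsilon_Y &\perp\!\!\!\perp (X_1,\dots,X_k).
\end{align*}
```

Here $\varepsilon_Y$ stands for the unmeasured disturbance of $Y$, which the text assumes to be independent of the measured direct causes.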

The TREK rule[6] can be derived from these assumptions. A trek between two variables, and , is defined as either a directed path starting from one variable that ends at another, or two directed paths starting from a common third variable , the two paths intersecting only at , with one path ending at and another at . The following figure shows these two types of treks. In the top trek, is a remote cause of ; in the bottom trek, and are indirect effects, i.e. descendants, of a common cause .

Figure 3:

The TREK rule says: the correlation between any two standardized variables is the sum, over all treks between them, of the product of the linear coefficients along each trek. “Standardized” means that all variables are rescaled to have mean 0 and variance 1. For example, in Figure 3, the correlation () between (standardized) and is:

We now prove the TREK rule for linear, acyclic systems of standardized variables with independent disturbance terms.
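As a compact restatement of the claim, in a notation assumed here for illustration only ($\mathcal{T}(X,Y)$ for the set of treks between standardized variables $X$ and $Y$, and $a_e$ for the coefficient on edge $e$), the rule can be written as follows. The second line applies it to a hypothetical three-variable graph $C \to X$, $C \to Y$, $X \to Y$ with coefficients $a$, $b$ and $c$, which is not one of the figures in the text.

```latex
\begin{align*}
  \rho_{XY} &= \sum_{\tau \in \mathcal{T}(X,Y)} \; \prod_{e \in \tau} a_e \\[4pt]
  % hypothetical example: treks X -> Y and X <- C -> Y
  \rho_{XY} &= c + a\,b
\end{align*}
```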



Proof sketch. is the standardization of iff . The mean of any standardized variable, , is 0 and the variance, , is 1. The correlation of two standardized variables, , , is the expectation of their product, .

A trek is a pair of directed paths terminating in two distinct variables, , and intersecting at a single variable, , the source of trek . Or, a single directed path from into , in which case .

Notation: , , , etc. denote the coefficient for the ith edge in trek starting from the source.

Remark 1: denotes . If is the graph, the correlation of , is , because and are uncorrelated. Suppose for every causal graph of length , the correlation of the terminal variables, , is . Let be a chain graph of length by adding one edge . Then . Using an induction argument, we conclude that the correlation of a causal chain of any length is given by the product of the edge coefficients.

Remark 2: is the graph of the linear system . since . Applying Remark 1 to each side of , for any pair of directed paths and, with respective edge coefficients and , .

Consider the graph again. If we add an additional trek between and , such that the length of the path between the source and either or does not exceed 1, it is easy to see that where is the coefficient of the edge connecting and or the correlation between and . By induction, for and connected by any two treks and with coefficients and , By induction, in general, , which is the TREK rule. ∎
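The rule can also be checked numerically on a small example. The sketch below uses a hypothetical standardized model (C -> X, C -> Y, X -> Y, with made-up coefficients; none of this is taken from the figures) and compares two computations: the sum over treks of coefficient products, and the correlations implied directly by the structural equations. All function and variable names are illustrative.

```python
# Hypothetical standardized linear model: C -> X, C -> Y, X -> Y.
# Coefficients are assumed small enough that unit variances are attainable.
EDGES = {("C", "X"): 0.5, ("C", "Y"): 0.4, ("X", "Y"): 0.3}
ORDER = ["C", "X", "Y"]  # a topological order of the nodes


def directed_paths(src, dst):
    """Return (node_list, coefficient_product) for each directed path src -> dst."""
    if src == dst:
        return [([src], 1.0)]  # trivial path of length zero
    found = []
    for (parent, child), coef in EDGES.items():
        if parent == src:
            for nodes, prod in directed_paths(child, dst):
                found.append(([src] + nodes, coef * prod))
    return found


def trek_rule_correlation(x, y):
    """Sum, over all treks between x and y, of the product of edge coefficients."""
    total = 0.0
    for source in ORDER:
        for nodes_x, prod_x in directed_paths(source, x):
            for nodes_y, prod_y in directed_paths(source, y):
                # The two directed paths of a trek may share only the source.
                if set(nodes_x) & set(nodes_y) == {source}:
                    total += prod_x * prod_y
    return total


def model_correlations():
    """Correlations implied by the structural equations, in topological order."""
    corr = {}
    for i, node in enumerate(ORDER):
        corr[(node, node)] = 1.0  # standardized: unit variance
        parents = {p: c for (p, ch), c in EDGES.items() if ch == node}
        for earlier in ORDER[:i]:
            # The disturbance of `node` is independent of every earlier variable,
            # so only the parent terms contribute to the covariance.
            value = sum(c * corr[(p, earlier)] for p, c in parents.items())
            corr[(node, earlier)] = corr[(earlier, node)] = value
    return corr


if __name__ == "__main__":
    implied = model_correlations()
    for pair in [("C", "X"), ("C", "Y"), ("X", "Y")]:
        print(pair, trek_rule_correlation(*pair), implied[pair])
```

For this toy graph the two computations agree for every pair; for instance, X and Y are joined by the treks X -> Y and X <- C -> Y, giving 0.3 + 0.5 × 0.4.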

3 Estimating Causal Connections Using the TREK Rule: Examples

In this section we give three examples to show how the TREK rule can be used either to estimate unified causal graphs formed by variables measured in marginal datasets or to reveal information that is not explicit when analyzing causal connections based only on conditional independence. Each example starts with a true causal graph and marginal datasets. Assuming faithfulness, linear relations and Gaussian distributions, we examine what dependence and independence relations can be obtained from these datasets. Based on the obtained (in)dependence relations and the correlations measured from these datasets, the TREK rule helps to estimate causal connections and narrow down the range of possible unified causal graphs.

3.1 Case One[3]

The true causal graph is:

Figure 4: True Graph in Case One

with marginal datasets:

From the true causal graph we know that the independence relations we get from the three datasets above are:

Based on faithfulness we can tell that is a collider with and on each side, because and become dependent conditioning on . We also know there is no trek connecting and because they are marginally independent. Similarly, there is no trek connecting and . All the graphs below agree with these dependence and independence relations:

Figure 5: Possible Graphs

To rule out some of these candidates, we have to use more than independence and conditional independence information. One choice is correlation. The non-zero correlations we know are , , and . By comparing some of these correlations, we can rule out any causal graph that, if it were true, would violate the TREK rule. For instance, if the true graph is 2) or 4), by the TREK rule:

Since , we have .

That is to say, if is smaller than , the true graph cannot be 2) or 4). Similarly, comparing the absolute value between and may rule out 3) and 4). The effect of applying TREK rules in this way is summarized in the table below (“X” means “the condition in the row rules out the model in the column”):

Figure 6:

If we can measure either or , we can compare and or and and rule out more cases. For instance, if we know that is greater than , we can rule out graph 1) and 2).

3.2 Case Two

Figure 7: True Graph for Case Two

The measured datasets are:

The independence relations we can get from those datasets are:

From these conditional independences, we know that every trek connecting and contains and . We can also determine the relative position of and in the trek: the correlation with and the variable closer to has a larger absolute value. Similar to Case 1, the TREK rule yields:

if is between and


if is between and

Since the absolute value of a correlation is between 0 and 1, comparing absolute values of correlations can reveal the causal connection between these three variables. Since and are not independent conditioning on , we know that unlike and , is not in every trek connecting and . Therefore, may be a collider. Note that from the dataset , we should find that the marginal correlation between any pair of variables is different from the conditional correlation (i.e., , , ). This means that each variable in this dataset is either in a trek connecting the other two variables, or a collider or descendant of a collider in a path connecting the two variables. If is a collider, this could only happen when neither nor is in the path. Therefore, if is a collider, we end up with figure 8:

Figure 8: Possible Graph for Case Two

In this case, we can recover the full causal graph except for the directions of the edges between , , and . Comparing this case with case 1, we see that they have the same marginal datasets and the only difference between them is that in case 2 and are connected. This extra edge reduces the number of candidates for the true causal graph (up to Markov indistinguishability) from five to one. That is because the direct connection between and enables and to be connected by a trek, the longest trek in the true graph. From the marginal datasets we see that every variable in this trek is measured together with the endpoints ( and ) of this trek, which enables us to determine the exact structure of this trek. If is not a collider, is in at least one trek connecting and . As stated above, and should also be in the trek that contains . Since the causal graph is assumed to be a DAG, for the set either or has to be a descendant of a collider in a path connecting the other two variables. The only two situations compatible with the “, , ” information are figure 9 (i) and (ii). (Footnote 1: If directly connects to and , either or is violated.)

Figure 9: Possible Graph for Case Two

Therefore, in this case, we can narrow down the possible true graph into three situations (figure 8 and 9) using the inequality between different correlations entailed by the TREK rule.
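The ordering argument used in this case can be sketched in a few lines. The sketch assumes three standardized variables lying on a single trek with no other trek connecting them, so that the correlation of the two outer variables is the product of the two inner correlations and is therefore the smallest in absolute value. The labels A, B, C, the numbers, and the function names are hypothetical and do not refer to the variables of Case Two.

```python
def middle_variable(corr):
    """Return the variable lying between the other two on a single trek.

    `corr` maps frozenset({u, v}) to the correlation of u and v for exactly
    three variables. The pair with the weakest absolute correlation is taken
    to be the two endpoints, so the remaining variable is the middle one.
    """
    weakest_pair = min(corr, key=lambda pair: abs(corr[pair]))
    variables = frozenset().union(*corr)
    (middle,) = variables - weakest_pair
    return middle


def product_gap(corr, middle):
    """Absolute violation of rho(end1, end2) = rho(end1, mid) * rho(mid, end2)."""
    variables = frozenset().union(*corr)
    end1, end2 = sorted(variables - {middle})
    predicted = corr[frozenset({end1, middle})] * corr[frozenset({middle, end2})]
    return abs(corr[frozenset({end1, end2})] - predicted)


# Hypothetical population correlations for three variables on one trek.
EXAMPLE = {
    frozenset({"A", "B"}): 0.6,
    frozenset({"B", "C"}): 0.5,
    frozenset({"A", "C"}): 0.3,
}

if __name__ == "__main__":
    mid = middle_variable(EXAMPLE)
    print("middle variable:", mid)
    print("gap if", mid, "is in the middle:", product_gap(EXAMPLE, mid))
    print("gap if A were in the middle:", product_gap(EXAMPLE, "A"))
```

With sample correlations the gap would not be exactly zero, so in practice some statistical tolerance would be needed; the sketch leaves that choice open.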

3.3 Case Three

This case shows that the TREK rule can be used to check for the existence of a latent variable or a directed edge. Only one graph is used here for illustration, but the method is, at least in theory, applicable to graphs containing this 4(5)-member subgraph. Suppose it is known that there is no direct connection between , and the true, unknown model is as shown in figure 10:

Figure 10: True Graph for Case Three

In figure 10, denotes a latent variable. If all four observed variables, to , can be measured together, whether exists or not will result in different conditional independence relations, in which case the method introduced here is redundant. If, however, only some of those variables can be measured together, for instance {X1, X2, X4} and {X1, X3, X4}, then whether L exists is no longer obvious. The method introduced here can be used to check the existence of L in some of the possible graphs. If the latent variable does not exist, the true graph could be:

Figure 11: Possible Graph for Case Three

By the TREK rule, we have:

  1. ,

Since all the correlations needed are contained in the two marginal datasets, we can get and by solving 1), 2) and 3); then by 1), we can check whether the equation 4) holds or not. If the equation holds, the latent variable does not exist. If the equation does not hold, then there should be extra connection between and , which could be a latent common cause of and or a direct connection between them.
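The solve-then-check procedure described above can be illustrated on a hypothetical "diamond" graph A -> B -> D, A -> C -> D, observed through the two marginal datasets {A, B, D} and {A, C, D}. This graph, its labels and its coefficients are assumptions made for the sketch and are not claimed to match Figure 11. Under the trek rule the diamond implies rho_AB = a, rho_AC = c, rho_BD = b + a·c·d, rho_CD = d + a·c·b and rho_AD = a·b + c·d; the first four determine the coefficients, and the last serves as the check.

```python
def check_diamond(r_ab, r_ac, r_bd, r_cd, r_ad, tol=1e-9):
    """Solve the trek equations of a hypothetical diamond and test the last one.

    Edges and coefficients: A -> B (a), A -> C (c), B -> D (b), C -> D (d).
    B and C are never measured together, so rho_BC is not an input.
    Returns (b, d, predicted_r_ad, consistent).
    """
    a, c = r_ab, r_ac  # single-edge treks: coefficient equals correlation
    k = a * c          # implied correlation of B and C via B <- A -> C
    det = 1.0 - k * k  # nonzero as long as |a * c| < 1
    # Solve the linear system  b + k*d = r_bd,  k*b + d = r_cd.
    b = (r_bd - k * r_cd) / det
    d = (r_cd - k * r_bd) / det
    predicted = a * b + c * d  # treks A -> B -> D and A -> C -> D
    return b, d, predicted, abs(predicted - r_ad) <= tol


if __name__ == "__main__":
    # Correlations generated by a = 0.5, c = 0.4, b = 0.3, d = 0.6 (made up).
    print(check_diamond(0.5, 0.4, 0.42, 0.66, 0.39))
    # Same inputs, but an observed rho_AD that the diamond cannot reproduce.
    print(check_diamond(0.5, 0.4, 0.42, 0.66, 0.50))
```

When the final equation fails, the hypothesized graph is missing some connection, which could be a latent common cause or a direct edge. The tolerance `tol` is a placeholder; with sampled correlations it would have to reflect estimation error.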

4 Case with Non-Gaussianity

If Gaussian distributions are assumed, causal relations can only be estimated up to the Markov equivalence class and we may not know the direction of many edges in a causal graph. However, if we assume non-Gaussian distributions, we can apply algorithms, such as LiNGAM, to each marginal dataset, which return partial graphs where each edge has a direction unless a latent common cause exists [5]. In the non-Gaussian case, we can determine whether a path between two variables is a trek. If directions of edges tell us that a path is a trek, we can apply the TREK rule directly to estimate the connection between the terminal variables of the trek. Consider a case where the causal graph is figure 12:

Figure 12: True Graph with non-Gaussianity

If the datasets are and , there is no information about conditional independence available to use. Assuming Gaussian distributions, the unified graph we get from those two marginal datasets is figure 13:

Figure 13: Unified Graph with Gaussianity

Namely, each marginal dataset tells us that every pair of variables is dependent, so what we get are two triangles. To estimate a unified causal graph, we can only put the two triangles together.

However, if we assume non-Gaussian distributions, we can run causal discovery algorithms designed for non-Gaussian distributions, such as LiNGAM, on each dataset. Such algorithms will estimate the direction of influence between each pair of variables. For figure 2 with marginal datasets and , we get the two graphs below:

Figure 14: Two Possible Graphs with non-Gaussianity

Note that now since we know the direction of each edge, an undirected trek (including and ) between and can be identified. If there exist other treks connecting and , by the TREK rule, we should have:

Since every variable is standardized, from the two triangles above we know that the coefficient on each edge equals the correlation between two variables connected by that edge (if there are no other treks between those variables). In order to check whether the inequality holds, we just need to plug the corresponding correlations into the formula. If instead of an inequality, what we get is an equality:

We can conclude that there is no other trek connecting and and get the true causal graph by removing the edge between them.

5 General Principles

The examples above show that using the TREK rule to estimate causal connections follows these principles:

  1. Possible treks can be identified by conditional independence;

  2. Comparing the absolute value of correlation between variables in the same trek rules out candidate causal graphs;

  3. Calculating correlations by the TREK rule on possible treks rules out redundant connections between variables.

The three principles above describe in general terms how the TREK rule works. In order to apply the rule, the first step is to determine which two or more variables are potentially connected by treks and what variables are contained in each trek, which is principle 1. After identifying a potential trek and its components, we can compare the absolute values of correlations between variables in the same trek and rule out all the causal graphs that violate the TREK rule based on the result of the comparison, which is principle 2. Furthermore, if the available correlations allow, we can calculate the theoretical correlation between variables connected by treks and use it to assess the existence of a latent variable or an omitted direct connection, which is principle 3 and is used in case three.

6 TREK Rules Can Inform the Choice of Further Experiments

All those cases provided in the last three sections show that instead of just enumerating all possible unified causal graphs consistent with the conditional independence revealed by marginal datasets, for linear systems we can make more specific estimations about connections between variables by applying the TREK rule, such as removing redundant edges or determining relative positions of variables in a path. Furthermore, the motivation of applying the TREK rule can guide researchers to make future measurements more efficient. The idea is illustrated in the case below:

Figure 15: True Graph

Consider a situation where the true graph is figure 15. Suppose that for all these variables that researchers are interested in, only a few of them can be measured together each time. Now consider that currently available marginal datasets are:

; ; ; ; ;

The information about conditional independence and dependence we can get from these datasets is limited: from and , we know that B and F are colliders between and ; from other datasets we only get marginal dependence and independence relations between X or Y and other variables (for instance, from we get and ). Even if we assume non-Gaussian distributions and know the direction of edges between variables dependent on each other, such as and , the TREK rule cannot be usefully applied; there are too many candidate unified causal graphs that satisfy these dependence and independence relations. However, we can observe that

Potentially there is a trek connecting X and F that contains and . The existence of such a trek cannot be verified directly because we do not know and , but if future measurements are possible, we can make one more measurement: . In this way, we can get three more correlations: , and . Now we can test whether and are in the trek connecting and . If such a trek exists and there are no other treks connecting and , then we should have:


Footnote 2: Here we use (correlation) and a (coefficient) interchangeably, because we are testing whether and are in the unique trek connecting and . If and are in the unique trek connecting and , then we should have , , .

Therefore, if we find

then it is likely that and are in the trek connecting and .

Moreover, we can also find:

From the equation above we can conclude that and are connected by two treks: one of which contains , , and , and can be fully determined; the other contains . We can also observe that:

It is possible that and are in the trek connecting and which has . We can estimate whether:

By the TREK rule, if this equation holds, then it is likely that is in the other trek connecting and . Based on all these conclusions, we nearly recover the true graph.

Notice that here the trek connecting and is the longest trek that could exist given the initial pack of marginal datasets. Measuring B and F together with an additional variable could enable us to use the TREK rule involving more variables and get much more information than measuring other variables together.

From this case, we can see that although most of the time the TREK rule cannot identify a unique unified causal graph (which depends heavily on what marginal datasets are available), it can be helpful as a criterion for planning future measurements.

7 Discussion

The limitation of the TREK rule to linear systems is less stringent than it may appear. Non-linear systems can be transformed into linear systems in several ways that preserve the graphical causal structure. One long-standing method is domain-specific transformation of individual variables. Econometric models, for example, commonly express prices as logarithms, presumably because economists decided long ago that the log of prices has a Normal (Gaussian) distribution. But there are more general, domain-independent transformations. For any of a large family of probability distributions (roughly, those whose cumulative distribution function has a smooth, monotonic map to the cumulative distribution of the Gaussian), a nonparanormal transformation yields a joint Gaussian distribution [4]. The relations among the variables can be expressed as linear regressions in these transformed variables, with additive disturbances. The regression coefficients obey the TREK rules when the transformed variables are standardized.
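One simple way to picture such a transformation is a rank-based normal-scores map applied to each variable separately. The sketch below is a simplified stand-in and not the nonparanormal estimator itself, which handles the empirical distribution function more carefully; it also assumes there are no tied values. The function name and the sample data are hypothetical.

```python
from statistics import NormalDist


def normal_scores(values):
    """Map a sample to approximately standard-normal scores via its ranks.

    Each observation is replaced by the standard normal quantile of
    rank / (n + 1). The map is monotone, so the ordering of the data is
    preserved while the marginal distribution becomes roughly Gaussian.
    Assumes no ties among the input values.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    scores = [0.0] * n
    for rank, index in enumerate(order, start=1):
        scores[index] = std_normal.inv_cdf(rank / (n + 1))
    return scores


if __name__ == "__main__":
    # Heavily skewed hypothetical data, e.g. prices spanning several magnitudes.
    prices = [1.0, 10.0, 100.0, 1000.0, 10000.0]
    print(normal_scores(prices))
```

After transforming every variable in this way and standardizing, correlations would be computed on the transformed columns before applying the TREK rule.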

However, the TREK rule is not practical for dense graphs in which a pair of variables is connected by several treks. When the graph is dense the choices of marginal data sets will interact with trek rules in complex ways that may prevent obtaining useful information from trek constraints.


  • [1] David Danks. Scientific Coherence and the Fusion of Experimental Results. The British Journal for the Philosophy of Science, 56(4):791–807, 2005.
  • [2] David Danks, Clark Glymour, and Robert E. Tillman. Integrating locally learned causal structures with overlapping variables. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1665–1672. Curran Associates, Inc., 2009.
  • [3] David Danks and Sergey M. Plis. Amalgamating evidence of dynamics. Synthese, pages 1–18, 2017.
  • [4] Han Liu, John Lafferty, and Larry Wasserman. The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. J. Mach. Learn. Res., 10:2295–2328, December 2009.
  • [5] Shohei Shimizu, Patrik O. Hoyer, Aapo Hyvärinen, and Antti Kerminen. A linear non-gaussian acyclic model for causal discovery. J. Mach. Learn. Res., 7:2003–2030, December 2006.
  • [6] Peter Spirtes, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search, volume 81. 1993.
  • [7] Ioannis Tsamardinos, Sofia Triantafillou, and Vincenzo Lagani. Towards integrative causal analysis of heterogeneous data sets and studies. J. Mach. Learn. Res., 13:1097–1157, 2012.