1 Introduction
Causal discovery is crucial for understanding the actual mechanisms underlying events in fields such as neuroscience Sanchez-Romero et al. (2019), biology Sachs et al. (2005), and social networks Cai et al. (2016). In such areas, the aim of the inquiry is to discover causal relations among variables that are measured only indirectly. Unmeasured variables and their influence on measured variables are unknown prior to the inquiry. Various methods for discovering the causal structure from observed samples have been proposed. However, most of them assume that the system of variables is causally sufficient, meaning that no pair of variables has an unmeasured common cause (also called a latent confounder) Spirtes et al. (2001). Real applications typically violate this assumption: some variables might not be measured because of limitations in data collection, and other variables may not even be considered in the data collection design. Without accounting for the presence of latent confounders, these algorithms return some false causal relations. Thus, developing a causal discovery method that works in the presence of latent confounders is an important research topic.
Methods for finding latent confounders and their relationships began early in the 20th century in factor analysis and its applications. For continuous variables, linear relationships among variables are widely used as the data-generation assumption in searches for structural equation models (SEMs). Recently, SEMs have begun to employ non-Gaussian additive (unmeasured) disturbances for each variable. The LvLiNGAM algorithm Hoyer et al. (2008), which uses overcomplete Independent Component Analysis (ICA) Eriksson and Koivunen (2004); Lewicki and Sejnowski (2000), has been proposed to estimate the causal relations among measured variables in systems with linearly related variables. Given the number of latent confounders and appropriate data, it can in principle identify the measured variables sharing a common cause or causes, as well as the causal relations between measured variables, but it requires the latent confounders to be mutually independent. This independence is impractical when the number of variables is large. Moreover, the algorithm easily falls into local optima, which produces estimation errors aggravated by high-dimensional data. The ParceLiNGAM Tashiro et al. (2014) and PairwiseLvLiNGAM Entner and Hoyer (2010) methods have been proposed for the same model class, but these methods fail to identify the causal structure given in Fig. 1. In general, existing independent noise-based methods have a high computational load and do not fully identify the causal structure. Constraint-based methods, such as the Fast Causal Inference (FCI) algorithm Spirtes et al. (2001), are another family of methods for recovering causal structures. The results of the FCI algorithm are statistically consistent but provide limited information. For example, even when no confounders exist, FCI usually provides too few directed, unconfounded causal relationships; on the other hand, for a small number of variable pairs, hidden variables usually cannot be found. As a specific example, consider the data generated according to the Directed Acyclic Graph (DAG) shown in Figure 2(a). The FCI output, called a Partial Ancestral Graph (PAG), is given in Figure 2(b). The adjacencies and arrowheads in Figure 2(b) are mostly correct, but some undetermined tails of edges remain.
Motivated by these observations, we propose a hybrid method, assuming linearity and non-Gaussianity, that takes advantage of both constraint-based methods and independent noise-based methods to handle both confounded and unconfounded situations. Designing such a solution is a nontrivial task due to two specific challenges raised by the high dimensionality of the measured variables and the latent confounders. The first is how to efficiently decompose a large global graph into small local structures without introducing new latent confounders. The second is how to recover local structures accurately in the presence of latent confounders. To address these challenges, we first employ FCI to remove the edges between (conditionally) independent variables. This output is not complete, in the sense that it contains many undetermined causal edges even where no latent confounder exists. We further refine this output to identify unconfounded causal edges and locate the latent confounders by applying an independent noise-based method only to those adjacent pairs indicated by the FCI result. The Triad condition Cai et al. (2019) then identifies some shared latent confounders and the causal relations between the measured variables. If some causal directions are still undetermined, we apply overcomplete ICA locally to refine the causal structure.
We summarize our contributions as follows:

We propose a hybrid framework to reconstruct the entire causal structure from measured data, handling both confounded and unconfounded situations.

We show a completeness result for our proposed method, demonstrating its correctness on the theoretical side.

We verify the correctness and effectiveness of our method on simulated and real-world data, showing results that are mostly consistent with the background knowledge.
2 Graphical Models
We employ two types of graphical representations of causal relations: Directed Acyclic Graphs (DAGs) and Partial Ancestral Graphs (PAGs).
2.1 DAG Description
A DAG can be used to represent both causal and independence relationships. A DAG $G$ contains a set of vertices and a set of directed edges ($\to$), where each vertex represents one random variable.
An edge $X \to Y$ means that $X$ is a "direct" cause (or parent) of $Y$; that is, $Y$ is a direct effect (or child) of $X$. Figure 2(a) shows an example of a DAG, in which each directed edge makes its tail vertex a parent of its head vertex. Two vertices $X$ and $Y$ are adjacent if there is a directed edge $X \to Y$ or $Y \to X$. A directed path from $X$ to $Y$ is a sequence of vertices beginning with $X$ and ending with $Y$ such that each vertex in the sequence is a child of its predecessor in the sequence. Any sequence of vertices in which each vertex is adjacent to its predecessor is an undirected path. A vertex $V$ on a path is a collider if $V$ is a child of both its predecessor and its successor on the path.

d-separation Pearl (1988). Let $\mathbf{Z}$ be a set of variables in DAG $G$ that has neither $X$ nor $Y$ as a member. $X$ and $Y$ are d-separated given $\mathbf{Z}$ if and only if there exists no undirected path $U$ between $X$ and $Y$ such that both of the following conditions hold:
(i) every collider on $U$ has a descendant (or is itself) in $\mathbf{Z}$;
(ii) no variable on $U$ that is not a collider is in $\mathbf{Z}$.
Two variables that are not d-separated by $\mathbf{Z}$ are said to be d-connected given $\mathbf{Z}$.
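The d-separation criterion can also be checked algorithmically. Below is a minimal illustrative sketch (not part of the original text) using the equivalent ancestral moral graph formulation: restrict the DAG to the ancestors of the query variables, connect co-parents, drop edge directions, delete the conditioning set, and test connectivity. The dictionary-based graph encoding is our own assumption.

```python
from itertools import combinations

def ancestors(parents, nodes):
    """All ancestors of `nodes` (including the nodes themselves)."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(parents, x, y, z):
    """d-separation of x and y given the list z, for a DAG encoded as a
    child -> list-of-parents mapping, via the ancestral moral graph."""
    keep = ancestors(parents, {x, y} | set(z))
    adj = {v: set() for v in keep}
    for c in keep:
        ps = [p for p in parents.get(c, ()) if p in keep]
        for p in ps:                       # undirected version of each edge
            adj[c].add(p); adj[p].add(c)
        for p, q in combinations(ps, 2):   # "marry" co-parents
            adj[p].add(q); adj[q].add(p)
    blocked = set(z)                       # the conditioning set blocks paths
    stack, seen = [x], {x}
    while stack:
        v = stack.pop()
        if v == y:
            return False                   # reached y: d-connected
        for w in adj[v]:
            if w not in seen and w not in blocked:
                seen.add(w); stack.append(w)
    return True

# collider x -> m <- y: marginally d-separated, but d-connected given m
print(d_separated({"m": ["x", "y"]}, "x", "y", []))     # True
print(d_separated({"m": ["x", "y"]}, "x", "y", ["m"]))  # False
```

Conditioning on the collider `m` marries its parents in the moral graph, which is exactly why the second query reports d-connection.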
2.2 PAG Description
A PAG contains four different types of edges between two variables: directed edges ($A \to B$), bidirected edges ($A \leftrightarrow B$), partially directed edges ($A \,\circ\!\!\to B$), and nondirected edges ($A \,\circ\!-\!\circ\, B$). A directed edge $A \to B$ means that $A$ is a cause of $B$. A bidirected edge $A \leftrightarrow B$ indicates that a latent confounder is a common cause of $A$ and $B$. A partially directed edge $A \,\circ\!\!\to B$ indicates that either $A$ is a cause of $B$, or there is an unmeasured variable influencing $A$ and $B$, or both. A nondirected edge $A \,\circ\!-\!\circ\, B$ means that exactly one of the following holds: (a) $A$ is a cause of $B$; (b) $B$ is a cause of $A$; (c) there is an unmeasured variable influencing $A$ and $B$; (d) both (a) and (c); or (e) both (b) and (c). In a PAG, the end marks of some edges may be undetermined; we call any edge other than a directed edge an undetermined edge.
Figure 2(b) shows a PAG representing the set of all DAGs that imply the same conditional independence relations among the measured variables as does the DAG in Figure 2(a). For example, a bidirected edge between two variables in Figure 2(b) means that a latent confounder influences both of them, while a nondirected edge between two variables represents a class of causal relations between its endpoints: either variable may cause the other, or a latent confounder may influence both.
3 Problem Definition
To delimit the scope of our solution, we assume that we observe infinitely many independent and identically distributed samples following the same joint probability distribution $P$. Further, we make some or all of the following assumptions according to context.
A1 (Causal Markov Assumption). Two variables $X$ and $Y$ are independent given a subset $\mathbf{Z}$ of variables not containing $X$ and $Y$ if $X$ and $Y$ are d-separated given $\mathbf{Z}$.
A2 (Causal Faithfulness Assumption). If $X$ and $Y$ are independent conditional on $\mathbf{Z}$ in $P$, then $X$ is d-separated from $Y$ conditional on $\mathbf{Z}$ in the causal DAG $G$.
We assume the target to be discovered is a DAG, represented as a linear non-Gaussian model with latent confounders (LvLiNGAM), as defined by Hoyer et al. Hoyer et al. (2008), in which each measured variable is generated from its parents, including measured variables and latent confounders, with an additive noise term. The matrix form of LvLiNGAM can be formalized as

$$\mathbf{x} = \mathbf{B}\mathbf{x} + \boldsymbol{\Lambda}\mathbf{f} + \mathbf{e}, \qquad (1)$$

where $\mathbf{B}$ is the matrix of causal strengths among the measured variables $\mathbf{x}$, $\boldsymbol{\Lambda}$ is the matrix of causal influences of the latent confounders $\mathbf{f}$ on the measured variables, and the noise terms, as components of $\mathbf{e}$, are mutually independent and non-Gaussian. According to LvLiNGAM Hoyer et al. (2008), $\mathbf{B}$ can be permuted to a strictly lower triangular matrix, and Equation 1 can be rewritten as

$$\mathbf{x} = \mathbf{A}(\boldsymbol{\Lambda}\mathbf{f} + \mathbf{e}), \qquad (2)$$

where $\mathbf{A} = (\mathbf{I} - \mathbf{B})^{-1}$ and $\mathbf{I}$ denotes the identity matrix.
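As a concrete illustration of Equations 1 and 2, the sketch below simulates data from an LvLiNGAM with hypothetical causal strengths, uniform (hence non-Gaussian) disturbances, and a single latent confounder, solving $\mathbf{x} = (\mathbf{I}-\mathbf{B})^{-1}(\boldsymbol{\Lambda}\mathbf{f}+\mathbf{e})$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 10000, 4, 1          # samples, measured variables, latent confounders

# hypothetical strictly lower triangular B and confounder loadings Lam
B = np.zeros((p, p))
B[1, 0], B[2, 1], B[3, 2] = 0.8, 0.6, 0.7
Lam = np.zeros((p, q))
Lam[0, 0], Lam[1, 0] = 0.5, 0.9     # one confounder shared by x1 and x2

# mutually independent, non-Gaussian (uniform) disturbances
f = rng.uniform(-1, 1, size=(q, n))
e = rng.uniform(-1, 1, size=(p, n))

# Equation 2: x = (I - B)^{-1} (Lam f + e)
x = np.linalg.solve(np.eye(p) - B, Lam @ f + e)

# sanity check: the structural equations of Equation 1 hold exactly
print(np.allclose(x, B @ x + Lam @ f + e))   # True
```

Because $\mathbf{B}$ is strictly lower triangular, $\mathbf{I} - \mathbf{B}$ is always invertible, so the solve step never fails for an acyclic model.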
Based on Equation 1, we make the following further assumptions:
A3 (Linear Acyclic Non-Gaussianity Assumption). The causal graph over all variables, including the latent variables, is a directed acyclic graph (DAG); the causal relations among the variables are linear, and all noise terms are non-Gaussian and mutually independent.
A4 (One Latent Confounder Assumption). All latent confounders are independent of each other, and each pair of observed variables is directly influenced by at most one latent confounder.
Based on the above assumptions, we define our problem as follows.
Definition 1.
(Problem Definition) Given observational data generated by the causal model in Equation 1, reconstruct the causal graph over the measured variables and the latent confounders.
4 A Hybrid Method for Causal Discovery in the Presence of Latent Confounders
In this section, we describe our approach in detail and explain how it recovers the true graph shown in Figure 3(a), which represents causal model (1). The proposed framework is given in Figure 3.
The idea is as follows. After running the FCI algorithm to obtain a PAG, we further try to orient edges by regression and subsequent independence testing, extrapolating directions with the well-known Meek rules. From regression residuals, we then determine local causal structures for pairs of variables that are adjacent via an undetermined edge in the PAG. We next introduce a constraint condition on triples of variables to detect and combine some latent confounders. Finally, under the further assumption that the latent confounders are independent, we use overcomplete ICA to determine the remaining edges when needed. The pseudocode of this framework (named FRITL) is given in Algorithm 1. These four stages can be selected according to the purpose at hand.
4.1 Stage I: Constructing PAG Using FCI
We begin by supposing that the data generated by causal model (1) satisfy assumptions A1 and A2. The FCI algorithm outputs a PAG, which represents estimated features of the true causal DAG according to the following theorem Spirtes et al. (2001) and lemma.
Theorem 1.
Given assumptions A1 and A2, the FCI algorithm outputs a PAG that represents a class of graphs including the true causal DAG.
Lemma 1.
Given assumptions A1 and A2, if FCI converges to a PAG with a directed edge between $X$ and $Y$, then there is a directed edge between $X$ and $Y$ in the true DAG.
4.2 Stage II: Inferring Local Structures Using the Independent Noise Condition
After running stage I, we obtain a PAG (given in Figure 3(b)) that carries (asymptotically) correct information about the causal structure but usually determines few direct influences. Although we could apply overcomplete ICA to estimate the true causal graph, the result may suffer from local optima, especially when the number of measured variables is larger than four. In contrast, a "divide-and-conquer" strategy provides more causal information about the undetermined edges in the graph and only requires performing overcomplete ICA on a small number of variables to estimate the local causal structures. This second stage thus produces correct, informative causal discovery results with relatively low computational complexity. We note that in the linear non-Gaussian case, unconfounded causal relations can always be determined by regression and independence testing Shimizu et al. (2011). Inspired by this, we generalize regression and independence testing from the global causal structure to local causal structures, even when there are latent confounders.
4.2.1 Identification of causal direction between unconfounded pairs of variables
We first provide a lemma to identify the causal direction between variables that are not influenced by confounders. From the definition of a PAG, the variables connected to a measured variable $x_j$ through a directed, nondirected, or partially directed edge in the FCI output are the potential parents of $x_j$. For example, $x_i$ is a potential parent of $x_j$ if $x_i \to x_j$, $x_i \,\circ\!-\!\circ\, x_j$, or $x_i \,\circ\!\!\to x_j$. Let $PPa(x_j)$ denote the set of potential parents of $x_j$.
If there are no latent or observed confounders of $x_j$ and any member of $PPa(x_j)$, we can generalize Lemma 1 of Shimizu et al. Shimizu et al. (2011) to determine local causal structures. We first introduce the Darmois-Skitovitch theorem Darmois (1953); Skitovitch (1953), which lets us determine whether each potential parent is an actual parent of $x_j$.
Theorem 2.
(Darmois-Skitovitch Theorem). Define two random variables $y_1$ and $y_2$ as linear combinations of independent random variables $s_i$, $i = 1, \dots, n$:

$$y_1 = \sum_{i=1}^{n} \alpha_i s_i, \qquad y_2 = \sum_{i=1}^{n} \beta_i s_i. \qquad (3)$$

If $y_1$ and $y_2$ are statistically independent, then all variables $s_i$ for which $\alpha_i \beta_i \neq 0$ are Gaussian.
In other words, if the random variables $s_i$ are independent, and $y_1$ and $y_2$ as defined above are independent, then for any $s_i$ that is non-Gaussian, at most one of $\alpha_i$ and $\beta_i$ can be nonzero.
Lemma 2.
Suppose that the data over the variables $\mathbf{x}$ are generated by (1) and that assumptions A1-A3 hold. Assume there is no latent or observed confounder relative to $x_i$ and $x_j$ in the underlying true causal graph over all given variables, where $x_i$ is one of the potential parents of $x_j$ in the FCI output. Let $r_j^{(i)}$ be the residual of the regression of $x_j$ on $x_i$, and $r_i^{(j)}$ the residual of the regression of $x_i$ on $x_j$. Then, in the limit of infinite data, $x_i$ is an unconfounded ancestor of $x_j$ if and only if $r_j^{(i)}$ is independent of $x_i$ and $r_i^{(j)}$ is dependent on $x_j$.
Proof.
Without loss of generality, all variables are normalized to have zero mean and unit variance.
1. Assume that $x_i$ is an ancestor of $x_j$ and that $x_i$ is an exogenous variable, which means that there are no parents or latent confounders of $x_i$. Since $x_i$ and $x_j$ are generated by (1), this leads to

$$x_j = \beta_{ji} x_i + \tilde{e}_j, \qquad (4)$$

where $\beta_{ji}$ is the total causal effect of $x_i$ on $x_j$, and $x_i$ and $\tilde{e}_j$ are independent.
(1) The residual of regressing $x_j$ on $x_i$ will be

$$r_j^{(i)} = x_j - \beta_{ji} x_i = \tilde{e}_j. \qquad (5)$$

Thus, the residual $r_j^{(i)}$ is independent of $x_i$ because $\tilde{e}_j$ is independent of $x_i$.
(2) If instead we regress $x_i$ on $x_j$, the residual will be

$$r_i^{(j)} = x_i - \mathrm{cov}(x_i, x_j)\, x_j = \left(1 - \beta_{ji}\,\mathrm{cov}(x_i, x_j)\right) x_i - \mathrm{cov}(x_i, x_j)\, \tilde{e}_j. \qquad (6)$$

Each parent of $x_j$ is a linear mixture of error terms including $e_i$, where all the error terms are mutually independent and non-Gaussian according to assumption A3. Thus, the residual $r_i^{(j)}$ is a mixture of non-Gaussian error terms including $e_i$. From Equations (4) and (6), the coefficient of $e_i$ is nonzero in both $x_j$ and $r_i^{(j)}$, which implies that $r_i^{(j)}$ is dependent on $x_j$ according to Theorem 2. Thus, if $x_i$ is an ancestor of $x_j$, then $r_j^{(i)}$ is independent of $x_i$ and $r_i^{(j)}$ is dependent on $x_j$.
2. Assume that $x_i$ and $x_j$ have at least one common ancestor $x_c$ (observed or latent) with error term $e_c$. Let $Pa(x_j)$ denote all parents of $x_j$. Then we have

$$x_j = \sum_{x_k \in Pa(x_j)} b_{jk} x_k + e_j. \qquad (7)$$

If we regress $x_j$ on $x_i$, the residual will be

$$r_j^{(i)} = x_j - \mathrm{cov}(x_i, x_j)\, x_i. \qquad (8)$$

Each parent of $x_j$ is a linear mixture of error terms other than $e_j$, with all the error terms mutually independent and non-Gaussian according to assumption A3. Thus, both $x_i$ and the residual $r_j^{(i)}$ can be written as linear mixtures of error terms including $e_c$, and from Equations (7) and (8) the coefficient of $e_c$ is nonzero in both, which implies that $r_j^{(i)}$ is dependent on $x_i$ according to Theorem 2. ∎
Lemma 2 provides a principle for determining the causal direction between a pair of measured variables. If there is no latent or observed confounder for $x_j$ and the other variables, we can find the ancestors and children of $x_j$. In detail, for each variable $x_i$ in $PPa(x_j)$, we regress $x_j$ on $x_i$ and test whether the residual is independent of $x_i$. At the same time, we regress $x_i$ on $x_j$ and test the independence between the corresponding residual and $x_j$. Then, according to Lemma 2, we can determine whether $x_i$ is an ancestor or child of $x_j$, or whether there is a confounder for them.
Once some parents or children of a measured variable have been determined, we can remove the common causes of two measured variables that are adjacent to the determined causal relationship by regression Shimizu et al. (2011), and then repeat the step above. This determines most of the undetermined causal relations that are not influenced by confounders.
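The pairwise procedure can be sketched numerically as follows. Note that the dependence score used here (absolute correlations of the variables and of their squares) is a crude stand-in for a proper independence test such as HSIC, and all coefficients are hypothetical:

```python
import numpy as np

def residual(target, pred):
    """Residual of the least-squares regression of `target` on `pred`."""
    b = np.cov(pred, target, bias=True)[0, 1] / pred.var()
    return target - b * pred

def dep(a, b):
    """Crude dependence score: |corr| of the variables and of their squares.
    (A stand-in for a proper independence test such as HSIC.)"""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return abs(np.corrcoef(a, b)[0, 1]) + abs(np.corrcoef(a**2, b**2)[0, 1])

def direction(x, y, tol=0.05):
    """Lemma 2 style decision: regress both ways and check whether the
    residual is independent of the predictor."""
    ind_xy = dep(x, residual(y, x)) < tol   # residual of y on x, vs x
    ind_yx = dep(y, residual(x, y)) < tol   # residual of x on y, vs y
    if ind_xy and not ind_yx:
        return "x->y"
    if ind_yx and not ind_xy:
        return "y->x"
    return "confounded/undetermined"

rng = np.random.default_rng(1)
n = 100000
x = rng.uniform(-1, 1, n)
y = 0.8 * x + rng.uniform(-1, 1, n)          # true model: x -> y
print(direction(x, y))                       # x->y

L = rng.uniform(-1, 1, n)                    # latent confounder, no direct edge
u = L + rng.uniform(-1, 1, n)
v = L + rng.uniform(-1, 1, n)
print(direction(u, v))                       # confounded/undetermined
```

Non-Gaussianity is essential here: with Gaussian data both residuals would be independent of their predictors and neither direction could be preferred.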
4.2.2 Identification of causal direction between variables not directly influenced by the same confounder
After identifying the unconfounded ancestors, there remain cases in which the causal structure between the measured variables cannot be identified because of latent confounders that act indirectly. These comprise two cases:

The parent and the child of a measured variable are directly influenced by the same latent confounder $f$, while the variable itself is not adjacent to (or equivalently, not directly influenced by) $f$;

Two or more parents of a measured variable are influenced by the same latent confounder $f$, while the variable itself is not adjacent to $f$.
Case 1: For the first case, using Figure 4(a) as an example, suppose $x_1 \to x_2 \to x_3$, where $x_1$ and $x_3$ are directly influenced by the hidden common cause $f$, but $x_2$ is not. The PAG obtained by stage I is shown in Figure 4(b). For each of the three pairs among $x_1$, $x_2$, and $x_3$, regression is performed, and the independence of the residual and the predictor variable is tested. However, we can only determine $x_1 \to x_2$ and cannot identify $x_2 \to x_3$. If we can remove the indirect cause of $x_2$, then $x_2 \to x_3$ can be determined. After determining $x_1 \to x_2$, we regress $x_2$ on $x_1$ and replace $x_2$ with its corresponding residual $r_2$. If the causal relationship between $r_2$ and $x_3$ also satisfies model (1), we can use Lemma 2 to determine $x_2 \to x_3$. Next, we generalize Lemma 2 of Shimizu et al. Shimizu et al. (2011) to the latent confounder case and call the result Lemma 3.
Lemma 3.
Assume that the data over the measured variables follow Model (1). Let $Pa(x_j)$ denote the set of all found parents of $x_j$, and let $r_j$ be the result of replacing $x_j$ with its residual from regressing $x_j$ on $Pa(x_j)$. Then an analog of Model (1) holds for the residuals: $\mathbf{r} = \mathbf{B}'\mathbf{r} + \boldsymbol{\Lambda}'\mathbf{f} + \mathbf{e}$, where $\mathbf{B}'$ is a matrix of causal strengths among the residuals that correspond to the measured variables, $\boldsymbol{\Lambda}'$ is a matrix of causal influences of the latent confounders on the residuals, and the noise terms in $\mathbf{e}$ are mutually independent and non-Gaussian.
Proof.
Without loss of generality, we assume that $\mathbf{B}$ in Equation 2 can be permuted to a strictly lower triangular matrix. Therefore, $\mathbf{A} = (\mathbf{I} - \mathbf{B})^{-1}$ in Equation 2 is a lower triangular matrix with unit diagonal entries. Since $x_i$ is a parent of $x_j$ for each $x_i \in Pa(x_j)$, the corresponding entry of $\mathbf{A}$ equals the regression coefficient obtained by the linear regression of $x_j$ on $x_i$. Therefore, through the linear regression, the causal effect of $x_i$ on $x_j$ is removed from $x_j$; that is, each such entry becomes 0, and $x_i$ does not influence the residual $r_j$. Hence the matrix $\mathbf{A}'$ corresponding to the residuals is still lower triangular with unit diagonal, and $\mathbf{B}' = \mathbf{I} - (\mathbf{A}')^{-1}$ is still strictly lower triangular. Therefore, the analog of Model (1) holds. ∎
Thus, for each variable $x_j$, after removing the effects of all determined parents of $x_j$ by regressions, we can find the remaining parents and children of $x_j$ by further regressions and independence tests. The details of the procedure are as follows.
First, for each pair of measured variables $x_i$ and $x_j$ with an undetermined edge, we perform a linear regression of $x_j$ on $x_i$ and test whether the corresponding residual is independent of $x_i$. If it is, we orient $x_i \to x_j$. Otherwise, we test whether the reverse causal direction is accepted. If neither direction is accepted, there may be at least one latent confounder or a common ancestor influencing them. After refining some edges, we remove the effects of the determined parents by regressing each variable on its determined parents and replacing the variable with the corresponding residual. This works because if $x_i$ and $x_j$ are unconfounded, then after we remove the information in $x_i$ and $x_j$ that can be explained by their common ancestors, the residuals are unconfounded and admit the same causal direction as that between $x_i$ and $x_j$. We then iterate the first step for the variables that still have an undetermined edge between them, to determine more edges, until no further independence between a potential parent of a variable and the corresponding residual is accepted.
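A numerical sketch of this iteration on a hypothetical Case 1 structure (a chain $x_1 \to x_2 \to x_3$ whose endpoints share a latent confounder, in the spirit of Figure 4) is given below; the dependence score is again a crude moment-based stand-in for a proper independence test, and all coefficients are made up for illustration:

```python
import numpy as np

def residual(target, pred):
    """Residual of the least-squares regression of `target` on `pred`."""
    b = np.cov(pred, target, bias=True)[0, 1] / pred.var()
    return target - b * pred

def dep(a, b):
    """Crude dependence score (stand-in for a proper independence test)."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return abs(np.corrcoef(a, b)[0, 1]) + abs(np.corrcoef(a**2, b**2)[0, 1])

rng = np.random.default_rng(4)
n = 100000
L  = rng.uniform(-1, 1, n)                        # latent confounder of x1, x3
x1 = 0.9 * L + rng.uniform(-1, 1, n)
x2 = 0.8 * x1 + rng.uniform(-1, 1, n)             # x2 is not influenced by L
x3 = 0.7 * x2 + 1.0 * L + rng.uniform(-1, 1, n)

tol = 0.05
# x2 -> x3 is not directly decidable: residuals stay dependent both ways
print(dep(x2, residual(x3, x2)) > tol, dep(x3, residual(x2, x3)) > tol)  # True True

# after orienting x1 -> x2, replace x2 by its residual on x1 (Lemma 3) ...
r2 = residual(x2, x1)
# ... and r2 -> x3, hence x2 -> x3, becomes decidable by Lemma 2
print(dep(r2, residual(x3, r2)) < tol, dep(x3, residual(r2, x3)) > tol)  # True True
```

Replacing $x_2$ by its residual removes the indirect path through $x_1$, so the remaining pair behaves like an unconfounded cause-effect pair.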
Case 2: We now consider the second case, in which a measured variable $x_j$ has parents $x_1$ and $x_2$ that are directly influenced by the same latent confounder $f$; we still cannot determine the causal relationship between $x_1$ and $x_j$ or between $x_2$ and $x_j$. That is, $x_1$ and $x_2$ are mediating variables on the paths from $f$ to $x_j$, so that $f$ acts as a common cause of $x_1$ and $x_j$, and of $x_2$ and $x_j$. Using $\{x_1, x_2\}$ to "block" these paths removes the influence of $f$ on $x_j$. This inspired us to apply multiple regressions to address the problem, with the following lemma confirming its correctness.
Lemma 4.
Suppose that the data over the variables were generated by Equation 1 and that assumptions A1-A4 hold. Let $PPa(x_j)$ denote the set of measured variables that are potential parents of $x_j$, and for a subset $\mathbf{S} \subseteq PPa(x_j)$, let $r_j^{(\mathbf{S})}$ be the residual of regressing $x_j$ on $\mathbf{S}$. In the limit of infinite data, $x_i$ is an unconfounded parent of $x_j$ if and only if there exists a subset $\mathbf{S} \subseteq PPa(x_j)$ with $x_i \in \mathbf{S}$ such that $r_j^{(\mathbf{S})}$ is independent of $x_i$.
Proof.
Recall that $PPa(x_j)$ denotes all potential parents of $x_j$ and that $\mathbf{S}$ is a subset of $PPa(x_j)$. If a variable $x_i$ is in $PPa(x_j)$, then $x_i$ might be a (confounded) parent or child of $x_j$, or there might be a latent confounder between $x_i$ and $x_j$ without a directed edge.
1. Consider the case where $x_i$ is a parent of $x_j$ and there is no latent confounder between $x_i$ and $x_j$. First, we can rewrite Equation 1 as
$$\mathbf{x} = (\mathbf{I} - \mathbf{B})^{-1}(\boldsymbol{\Lambda}\mathbf{f} + \mathbf{e}),$$
where $\mathbf{I} - \mathbf{B}$ is invertible. The inverse of $\mathbf{I} - \mathbf{B}$ can be written as
$$(\mathbf{I} - \mathbf{B})^{-1} = \mathbf{I} + \mathbf{B} + \mathbf{B}^2 + \cdots, \qquad (9)$$
where the series has finitely many nonzero terms because $\mathbf{B}$ can be permuted to a strictly lower triangular matrix. Thus, each variable is a linear mixture of its own noise term and of the noise terms and latent confounders of its ancestors.
Then, regressing $x_j$ on $\mathbf{S}$, we have
$$r_j^{(\mathbf{S})} = x_j - \sum_{x_k \in \mathbf{S}} \gamma_k x_k,$$
where the $\gamma_k$ are the multiple regression coefficients. Thus, the residual will be a linear mixture of the latent confounders, the noise term of $x_j$, and the noise terms of the variables in $\mathbf{S}$ and their ancestors. If the linear contributions of all variables in $\mathbf{S}$ to $x_j$ have been partialed out, we obtain
$$r_j^{(\mathbf{S})} = e_j + c, \qquad (10)$$
where $c$ collects the contributions of latent confounders of $x_j$ that do not influence any variable in $\mathbf{S}$. Because there is no latent confounder between $x_i$ and $x_j$, the coefficient of every non-Gaussian component of $x_i$ in Equation 10 is zero. Thus, $r_j^{(\mathbf{S})}$ is independent of $x_i$ due to Theorem 2.
2. Consider the case where $x_i$ is a confounded parent or confounded child of $x_j$, or where there is a latent confounder between them without a directed edge. The effect of the latent confounder cannot be made to vanish by multiple regression on any measured variables. So the residual of regressing $x_j$ on any $\mathbf{S}$ with $x_i \in \mathbf{S}$ is dependent on $x_i$.
3. Consider the case where $x_i$ is a child of $x_j$ and there is no latent confounder between $x_i$ and $x_j$. If we regress $x_j$ on any $\mathbf{S}$ that contains $x_i$, the residual will be a linear mixture of noise terms including $e_j$. According to Equation 1, $x_i$ is also a linear mixture of noise terms including $e_j$. Thus, the residual is dependent on $x_i$ due to Theorem 2. ∎
Lemma 4 inspires a method for identifying the local structure of the measured variables in the second case by analyzing the PAG. According to Lemma 4, we start by performing a multiple regression of each variable with undetermined edges on every two-variable subset of its potential parents, testing whether there exist two variables such that the corresponding residual is independent of both of them. If the independence holds for a variable and the residual, then that variable is a parent. Similarly, if undetermined edges remain, we perform multiple regressions on the subsets of the potential parents containing three variables, then four variables, and so on, to find the variables in the subset of potential parents that are unconfounded parents (according to the independence tests), until no subset whose residual is independent of the predictors can be found.
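The subset search can be sketched as follows, on a hypothetical Case 2 structure in which two parents of the target share a latent confounder that does not directly influence the target; the dependence proxy and the coefficients are again our own illustrative choices:

```python
import numpy as np
from itertools import combinations

def ols_residual(y, preds):
    """Residual of the multiple least-squares regression of y on `preds`."""
    A = np.column_stack([np.ones(len(y)), *preds])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ beta

def dep(a, b):
    """Crude dependence score (stand-in for a proper independence test)."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return abs(np.corrcoef(a, b)[0, 1]) + abs(np.corrcoef(a**2, b**2)[0, 1])

def unconfounded_parents(y, candidates, tol=0.05):
    """Search subsets S of the potential parents, smallest first; accept the
    first S whose regression residual looks independent of every member of S
    (the criterion of Lemma 4)."""
    names = list(candidates)
    for k in range(1, len(names) + 1):
        for S in combinations(names, k):
            r = ols_residual(y, [candidates[s] for s in S])
            if all(dep(candidates[s], r) < tol for s in S):
                return set(S)
    return set()

rng = np.random.default_rng(2)
n = 100000
L  = rng.uniform(-1, 1, n)                  # latent confounder of z1 and z2
z1 = L + rng.uniform(-1, 1, n)
z2 = L + rng.uniform(-1, 1, n)
y  = 1.2 * z1 + 2.0 * z2 + rng.uniform(-1, 1, n)   # y itself is unconfounded

# a single regressor leaves a confounded residual; only {z1, z2} blocks L
print(sorted(unconfounded_parents(y, {"z1": z1, "z2": z2})))  # ['z1', 'z2']
```

Regressing on either parent alone leaves the other parent's confounded signal in the residual, so only the joint regression passes the independence check.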
Using these methods, we find local causal structures over the measured variables that are adjacent via an undetermined edge in the PAG. In this stage, whenever an edge is reoriented, we apply the FCI orientation rules Zhang (2008) to orient other undetermined edges and update the corresponding potential parent sets. Using these orientation rules saves a number of regressions and independence tests.
As an example, starting from the causal graph in Figure 3(b), we obtain the output of this stage by (multiple) regressions and independence tests. By applying the FCI orientation rules Zhang (2008), we reorient one further edge in accordance with assumption A3. The final graph produced by this stage is shown in Figure 3(c).
The following theorem summarizes the identifiability achieved by the stage II process.
Theorem 3.
Suppose that the data over the variables were generated by model (1) and that assumptions A1-A4 hold. Let $\mathcal{P}$ denote the output of stage I. The pairs of variables that have an undetermined edge in between in $\mathcal{P}$ and that are not actually directly influenced by the same latent confounder are identified by stage II of FRITL.
Proof.
Under the assumptions of the theorem, stage I removes the edges between (conditionally) independent variables, which provides stage II with the required (conditional) independence information. With the help of Lemmas 2 and 3, we can determine the direction of the causal relationship between variables that are not directly influenced by the same latent confounder. Lemma 4 provides the identifiability conditions for the causal structure between observed variables whose potential parents are influenced by the same latent confounder while the variables themselves are not. ∎
As a consequence, what remains to be identified is the causal structure between variables directly influenced by the same latent confounders.
4.3 Stage III: Detecting Shared Latent Confounders Using the Triad Condition
The procedure so far determines whether latent confounders exist in many cases, but some graphs corresponding to the PAG shown in Figure 5 remain indistinguishable. Stage II considers only two variables at a time. Thus, it gives no details about the causal relationship (e.g., whether there is a direct causal relation and which way the causal influence goes) between two variables that are directly influenced by the same latent confounder, because both variables contain the information of the latent confounder. Suppose that assumptions A1-A3 hold. Interestingly, if we consider another measured variable at the same time, we can treat this third variable as a "conditional variable" or an "instrumental variable" and use it to help remove the indirect causal relationship (due to the existence of latent confounders) through the paths containing latent confounders. The Triad condition Cai et al. (2019), which the proposed procedure makes use of, is described as follows.
Definition 2.
(Triad condition) Suppose assumptions A1-A3 hold. Consider a triple of measured variables $(x_i, x_j, x_k)$ generated by (1), and define the reference variable $E_{ij \cdot k} = x_i - \frac{\mathrm{cov}(x_i, x_k)}{\mathrm{cov}(x_j, x_k)}\, x_j$. The pair $(x_i, x_j)$ is Triad conditional on (or given) $x_k$ when the residual $E_{ij \cdot k}$ is independent of $x_k$. If the Triad condition is satisfied, we denote it by $(x_i, x_j) \perp_{\mathrm{Triad}} x_k$.
It is easy to establish that the Triad condition is symmetric in its first two arguments; that is, $(x_i, x_j) \perp_{\mathrm{Triad}} x_k$ if and only if $(x_j, x_i) \perp_{\mathrm{Triad}} x_k$.
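The Triad condition, and the use Theorem 4 makes of it, can be sketched numerically as follows; the loadings are hypothetical and the moment-based dependence proxy stands in for a proper independence test:

```python
import numpy as np

def triad_residual(xi, xj, xk):
    """Triad reference variable E(i,j|k) = xi - cov(xi,xk)/cov(xj,xk) * xj."""
    c_ik = np.cov(xi, xk, bias=True)[0, 1]
    c_jk = np.cov(xj, xk, bias=True)[0, 1]
    return xi - (c_ik / c_jk) * xj

def dep(a, b):
    """Crude dependence score (stand-in for a proper independence test)."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return abs(np.corrcoef(a, b)[0, 1]) + abs(np.corrcoef(a**2, b**2)[0, 1])

def all_triads_hold(x1, x2, x3, tol=0.03):
    """True iff all three Triad conditions hold, i.e. (by Theorem 4) the
    triple shares one latent confounder and has no direct edges."""
    trips = [(x1, x2, x3), (x2, x3, x1), (x3, x1, x2)]
    return all(dep(triad_residual(a, b, c), c) < tol for a, b, c in trips)

rng = np.random.default_rng(3)
n = 100000
L = rng.uniform(-1, 1, n)                        # shared latent confounder
e = rng.uniform(-1, 1, (3, n))
x1, x2, x3 = 0.9 * L + e[0], 0.8 * L + e[1], 0.7 * L + e[2]

print(all_triads_hold(x1, x2, x3))              # True: confounder, no edges
print(all_triads_hold(x1, 2 * x1 + x2, x3))     # False: direct edge added
```

In the first call, every Triad residual cancels the confounder and reduces to a mixture of noise terms; adding a direct edge breaks at least one of the three conditions.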
The three possible causal graphs over three measured variables given in Figures 5(a)-(c) correspond to the PAG in Figure 5(d) produced by stage I. None of the three undetermined edges can be reoriented by stage II. Based on the Triad condition, we detect whether three variables share a latent confounder via the following Theorem 4.
Theorem 4.
Suppose that the data over the variables were generated according to Equation 1 and that assumptions A1-A4 hold. Let $\mathcal{P}'$ denote the output of stage II of FRITL. For three observed variables $x_1$, $x_2$, $x_3$ with an undetermined edge between each pair of them in $\mathcal{P}'$, the three variables are directly influenced by the same latent confounder and no pair of them is directly connected if and only if all three Triad conditions hold among $x_1$, $x_2$, $x_3$.
Proof.
Suppose that the data over the variables were generated by Equation 1. Without loss of generality, we assume that the three variables $x_1$, $x_2$, and $x_3$ are standardized (zero mean and unit variance) and may have causal relations among them in addition to the influences of the latent confounder $f$; if a coefficient below is zero, the corresponding edge vanishes. Then we have

$$x_1 = \lambda_1 f + e_1, \quad x_2 = b_{21} x_1 + \lambda_2 f + e_2, \quad x_3 = b_{31} x_1 + b_{32} x_2 + \lambda_3 f + e_3. \qquad (11)$$

Three kinds of Triad conditions might hold among the three variables: $(x_2, x_3) \perp_{\mathrm{Triad}} x_1$, $(x_3, x_1) \perp_{\mathrm{Triad}} x_2$, and $(x_1, x_2) \perp_{\mathrm{Triad}} x_3$. We consider conditioning on each of the three variables in turn.

1. Considering the Triad condition conditioning on $x_1$, we obtain the reference variable
$$E_{23 \cdot 1} = x_2 - \frac{\mathrm{cov}(x_2, x_1)}{\mathrm{cov}(x_3, x_1)}\, x_3,$$
which is a linear mixture of the independent variables $f$, $e_1$, $e_2$, and $e_3$. As we know, $x_1$ is a mixture of the independent variables $f$ and $e_1$. If the parameters in this model are not zero, $E_{23 \cdot 1}$ is dependent on $x_1$ because of Theorem 2. Next, suppose that $(x_2, x_3) \perp_{\mathrm{Triad}} x_1$ holds, i.e., $E_{23 \cdot 1}$ is independent of $x_1$. According to Theorem 2, the coefficients of the common non-Gaussian components $f$ and $e_1$ in $E_{23 \cdot 1}$ must vanish. Because $\lambda_1$, $\lambda_2$, and $\lambda_3$ are nonzero, this entails $b_{21} = 0$ and $b_{31} = 0$. Then $E_{23 \cdot 1}$ becomes a linear mixture of $e_2$ and $e_3$ and is independent of $x_1$. Thus, there is no edge between $x_1$ and $x_2$, and none between $x_1$ and $x_3$.

2. Considering the Triad condition conditioning on $x_2$, we obtain the reference variable
$$E_{31 \cdot 2} = x_3 - \frac{\mathrm{cov}(x_3, x_2)}{\mathrm{cov}(x_1, x_2)}\, x_1,$$
which is a linear mixture of the four independent variables $f$, $e_1$, $e_2$, and $e_3$. We can see that
$$x_2 = (b_{21}\lambda_1 + \lambda_2) f + b_{21} e_1 + e_2, \qquad (12)$$
which is a mixture of the three independent variables $f$, $e_1$, and $e_2$. If all parameters in this model are nonzero, $E_{31 \cdot 2}$ is dependent on $x_2$ because of Theorem 2. If all three variables are directly influenced by the same latent confounder and $(x_3, x_1) \perp_{\mathrm{Triad}} x_2$ holds, i.e., $E_{31 \cdot 2}$ is independent of $x_2$, then according to Theorem 2 the coefficients of the common components must vanish. In particular, the coefficient of $e_2$ in $E_{31 \cdot 2}$, which is $b_{32}$, must be zero, and then we can see that $b_{21}$ must be zero, too. Then $E_{31 \cdot 2}$ becomes a linear mixture of $e_1$ and $e_3$ and is independent of $x_2$. This also shows that the graphs in which there is at most one directed edge between two measured variables and one latent confounder influences them at the same time are distinguishable by the Triad condition.

3. Considering the Triad condition conditioning on $x_3$, similarly to the two cases above, if there is only one edge, between $x_1$ and $x_2$ (i.e., $b_{31} = b_{32} = 0$), then the graph implies the Triad condition $(x_1, x_2) \perp_{\mathrm{Triad}} x_3$.

In conclusion, if there is no directed edge between any pair of measured variables, that is, $b_{21} = b_{31} = b_{32} = 0$, then the corresponding causal graph implies the three Triad conditions $(x_2, x_3) \perp_{\mathrm{Triad}} x_1$, $(x_3, x_1) \perp_{\mathrm{Triad}} x_2$, and $(x_1, x_2) \perp_{\mathrm{Triad}} x_3$. Conversely, according to assumption A3, $(x_i, x_j) \perp_{\mathrm{Triad}} x_k$ entails that there is no direct edge between $x_k$ and either of the other two observed variables. If at least two of the causal strengths in (11) are zero, then the causal structure over