FRITL: A Hybrid Method for Causal Discovery in the Presence of Latent Confounders

03/26/2021
by   Wei Chen, et al.
Carnegie Mellon University

We consider the problem of estimating a particular type of linear non-Gaussian model. Without resorting to the overcomplete Independent Component Analysis (ICA), we show that under some mild assumptions, the model is uniquely identified by a hybrid method. Our method leverages the advantages of constraint-based methods and independent noise-based methods to handle both confounded and unconfounded situations. The first step of our method uses the FCI procedure, which allows confounders and is able to produce asymptotically correct results. The results, unfortunately, usually determine very few unconfounded direct causal relations, because whenever it is possible to have a confounder, it will indicate it. The second step of our procedure finds the unconfounded causal edges between observed variables among only those adjacent pairs informed by the FCI results. By making use of the so-called Triad condition, the third step is able to find confounders and their causal relations with other variables. Afterward, we apply ICA on a notably smaller set of graphs to identify remaining causal relationships if needed. Extensive experiments on simulated data and real-world data validate the correctness and effectiveness of the proposed method.


1 Introduction

Causal discovery is crucial for understanding the actual mechanisms underlying events in fields such as neuroscience Sanchez-Romero et al. (2019), biology Sachs et al. (2005) and social networks Cai et al. (2016). In such areas, the aim of the inquiry is to discover causal relations among variables that are measured only indirectly. Unmeasured variables and their influence on measured variables are unknown prior to the inquiry. Various methods for discovering the causal structure from observed samples have been proposed. However, most of them assume that the system of variables is causally sufficient, meaning that no pair of variables has an unmeasured common cause (also called a latent confounder) Spirtes et al. (2001). Real applications typically violate this assumption. For example, some variables might not be measured because of limitations in data collection, and other variables may not even be considered in the data collection design. Without accounting for the presence of latent confounders, these algorithms return some false causal relations. Thus, developing a causal discovery method in the presence of latent confounders is an important research topic.

Methods for finding latent confounders and their relationships began early in the 20th century in factor analysis and its applications. In the case of continuous variables, linear relationships among variables are widely used as the data-generation assumption in searches for structural equation models (SEMs). Recently, SEMs have begun to employ non-Gaussian additive (unmeasured) disturbances for each variable. The LvLiNGAM algorithm Hoyer et al. (2008), which uses overcomplete Independent Component Analysis (ICA) Eriksson and Koivunen (2004); Lewicki and Sejnowski (2000), has been proposed to estimate the causal relations among measured variables in systems with linearly related variables. Given the number of latent confounders and appropriate data, it can in principle identify the measured variables sharing a common cause or causes, as well as the causal relations between measured variables, but it requires the latent confounders to be mutually independent. This independence assumption is impractical when the number of variables is large. The algorithm also easily falls into local optima, producing estimation errors that are aggravated by high-dimensional data. The ParceLiNGAM Tashiro et al. (2014) and PairwiseLvLiNGAM Entner and Hoyer (2010) methods have been proposed for the same model class, but they fail to identify the causal structure given in Fig. 1. In short, existing independent noise-based methods have a high computational load and do not fully identify the causal structure.

Figure 1: An example of a causal graph, where , and are observed variables, and is a latent confounder.

Constraint-based methods, such as the Fast Causal Inference (FCI) algorithm Spirtes et al. (2001), are another class of methods for recovering causal structures. The results of the FCI algorithm are statistically consistent but provide limited information. For example, even when no confounders exist, FCI usually determines too few directed, unconfounded causal relationships; on the other hand, for a small number of variable pairs, hidden variables usually cannot be found. As a specific example, consider data generated according to the Directed Acyclic Graph (DAG) shown in Figure 2(a). The FCI output, called a Partial Ancestral Graph (PAG), is given in Figure 2(b). The adjacencies and arrowheads in Figure 2(b) are mostly correct, but some undetermined tails of edges remain.

From these observations, we propose a hybrid method, assuming linearity and non-Gaussianity, that takes advantage of both constraint-based methods and independent noise-based methods to handle both confounded and unconfounded situations. However, designing such a solution is non-trivial because of two specific challenges raised by the high dimensionality of the measured variables and the latent confounders. The first is how to efficiently decompose a large global graph into small local structures without introducing new latent confounders. The second is how to recover local structures accurately in the presence of latent confounders. To address these challenges, we first employ FCI to remove edges between (conditionally) independent variables. This output is not complete, in the sense that it contains many undetermined causal edges even where no latent confounder exists. We further refine this output to identify unconfounded causal edges and locate the latent confounders by applying an independent noise-based method to only those adjacent pairs informed by the FCI result. The Triad condition Cai et al. (2019) identifies some shared latent confounders and the causal relations between measured variables. If some causal directions are still undetermined, we apply overcomplete ICA locally to refine the causal structure.

Figure 2: An example for graphs of observed variables and latent confounders , and : (a) the original directed acyclic graph (DAG), and (b) the corresponding PAG produced by FCI.

We summarize our contributions as follows:

  • We propose a hybrid framework to reconstruct the entire causal structure from measured data, handling both confounded and unconfounded situations.

  • We establish the completeness of the proposed method, demonstrating its correctness on the theoretical side.

  • We verify the correctness and effectiveness of our method on simulated and real-world data, with results mostly consistent with background knowledge.

2 Graphical Models

We employ two types of graphical representations of causal relations: Directed Acyclic Graphs (DAGs) and Partial Ancestral Graphs (PAGs).

2.1 DAG Description

A DAG can be used to represent both causal and independence relationships. A DAG contains a set of vertices and a set of directed edges, where each vertex represents one random variable. A directed edge X → Y means that X is a “direct” cause (or parent) of Y, that is, Y is a direct effect (or child) of X. Figure 2(a) shows an example of a DAG; there, a directed edge from one vertex to another indicates that the former is a parent of the latter and the latter is a child of the former. Two vertices X and Y are adjacent if there is a directed edge X → Y or Y → X. A directed path from X to Y is a sequence of vertices beginning with X and ending with Y such that each vertex in the sequence is a child of its predecessor in the sequence. Any sequence of vertices in which each vertex is adjacent to its predecessor is an undirected path. A vertex on a path is a collider if it is a child of both its predecessor and its successor on the path.

d-separation Pearl (1988). Let Z be a set of variables in a DAG G that contains neither X nor Y. X and Y are d-separated given Z if and only if there exists no undirected path U between X and Y such that both of the following conditions hold:

(i) every collider on U is in Z or has a descendant in Z;

(ii) no variable on U that is not a collider is in Z.

Two variables that are not d-separated by Z are said to be d-connected given Z.
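For readers who want to check such statements programmatically, the following is a minimal sketch using NetworkX; the routine is called d_separated in older releases and is_d_separator in newer ones, and the graph and variable names below are purely illustrative.

```python
# A minimal, illustrative d-separation check with NetworkX. The function name
# differs across NetworkX versions (d_separated vs. is_d_separator); adjust as needed.
import networkx as nx

# Hypothetical DAG: L is a latent confounder of X2 and X3, and X1 -> X2 -> X4.
G = nx.DiGraph([("X1", "X2"), ("X2", "X4"), ("L", "X2"), ("L", "X3")])

# The path X1 -> X2 <- L -> X3 is blocked at the collider X2 given the empty set,
# so X1 and X3 are d-separated; conditioning on X2 opens the path.
print(nx.d_separated(G, {"X1"}, {"X3"}, set()))   # True
print(nx.d_separated(G, {"X1"}, {"X3"}, {"X2"}))  # False
```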

2.2 PAG Description

A PAG contains four different types of edges between two variables: directed edges (X → Y), bidirected edges (X ↔ Y), partially directed edges (X o→ Y), and nondirected edges (X o-o Y). A directed edge X → Y means that X is a cause of Y. A bidirected edge X ↔ Y indicates that there is a latent confounder that is a common cause of X and Y. A partially directed edge X o→ Y indicates that either X is a cause of Y, or there is an unmeasured variable influencing X and Y, or both. A nondirected edge X o-o Y means that exactly one of the following holds: (a) X is a cause of Y; (b) Y is a cause of X; (c) there is an unmeasured variable influencing X and Y; (d) both (a) and (c); or (e) both (b) and (c). In a PAG, the end marks of some edges may be undetermined; we refer to any edge other than a directed edge as an undetermined edge.

Figure 2(b) shows a PAG representing the set of all DAGs that imply the same conditional independence relations among the measured variables as does the DAG (Figure 2(a)). For example, the bidirected edge between and means that there is a latent confounder influencing and . The non-directed edge between and shows a class of causal relation between and , that is, this edge might be: , , .

A PAG can be estimated by the FCI algorithm Spirtes et al. (2001) or one of its variants, such as RFCI Colombo et al. (2012), FCI+ Claassen et al. (2013), FCI-stable Colombo and Maathuis (2014), Conservative FCI (CFCI) Ramsey et al. (2006), or Greedy FCI (GFCI) Ogarrio et al. (2016).

3 Problem Definition

To help define the scope of our solution, we assume that the sample size is infinite and that samples are independently drawn from the same joint probability distribution. Further, we make some or all of the following assumptions according to context.

A1. Causal Markov Assumption. Two variables X and Y are independent given a subset Z of variables not containing X and Y, if X and Y are d-separated given Z.

A2. Causal Faithfulness Assumption. If X and Y are independent conditional on Z in the joint distribution, then X is d-separated from Y conditional on Z in the causal DAG.

We assume the target to be discovered is a DAG, represented as a linear non-Gaussian model with latent confounders (LvLiNGAM), as defined by Hoyer et al. (2008), in which each measured variable is generated from its parents, including measured variables and latent confounders, together with an additive noise term. The matrix form of LvLiNGAM can then be formalized as

x = Bx + Λf + e,   (1)

where B is the matrix of causal strengths among the measured variables x, Λ is the matrix of causal influences of the latent confounders f on the measured variables, and the noise terms, as components of e, are mutually independent and non-Gaussian. Following the LvLiNGAM model of Hoyer et al. (2008), B can be permuted to a strictly lower triangular matrix, and Equation 1 can be rewritten as

x = A(Λf + e),   (2)

where A = (I − B)^{-1} and I denotes the identity matrix.
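As a concrete illustration of Equations (1)-(2), the following sketch generates data from a small LvLiNGAM instance; the particular coefficient matrices, the uniform disturbances, and all variable names are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: sample from x = Bx + Λf + e, i.e., x = (I - B)^{-1}(Λf + e),
# with non-Gaussian disturbances. All numbers here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 10000                      # sample size
p, q = 3, 1                    # measured variables, latent confounders

B = np.array([[0.0, 0.0, 0.0],   # causal strengths among measured variables
              [0.8, 0.0, 0.0],   # x2 <- x1
              [0.0, 0.5, 0.0]])  # x3 <- x2
Lam = np.array([[0.0],           # influence of the latent confounder f
                [0.7],           # f -> x2
                [0.9]])          # f -> x3

e = rng.uniform(-1, 1, size=(p, n))   # mutually independent, non-Gaussian noise
f = rng.uniform(-1, 1, size=(q, n))   # latent confounder, also non-Gaussian

X = np.linalg.solve(np.eye(p) - B, Lam @ f + e)   # measured data, shape (p, n)
```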

Based on Equation 1, we make the following further assumptions:

A3. Linear Acyclic Non-Gaussianity Assumption. The causal graph over all variables, including the latent variables, is a directed acyclic graph (DAG), representing a model in which the causal relations among the variables are linear and all noise terms are non-Gaussian and mutually independent.

A4. One Latent Confounder Assumption. All latent confounders are independent of each other, and each pair of observed variables is directly influenced by at most one latent confounder.

Based on the above assumptions, we define our problem as follows.

Definition 1.

(Problem Definition) Given observational data generated by the causal model in Equation 1, reconstruct the causal graph over the measured variables and latent confounders.

4 A Hybrid Method for Causal Discovery in the Presence of Latent Confounders

In this section, we describe our approach in detail and explain how it recovers the true graph shown in Figure 3(a), which represents causal model (1). The proposed framework is given in Figure 3.

Figure 3: The proposed framework. In these graphs, represent the measured variables and and represent the latent confounders. The red line in (c) means the edge can be updated by FCI orientation rule . Blue lines in (c) and (d) indicate the edges for which end marks are not determined.

The idea is as follows. After running the FCI algorithm to obtain a PAG, we further try to orient edges by regression and subsequent independence testing, propagating directions with the well-known Meek rules. From regression residuals, we further determine local causal structures for pairs of variables that are joined by an undetermined edge in the PAG. We then introduce a constraint on triples of variables to detect and merge some latent confounders. Finally, under the further assumption that the latent confounders are independent, we use overcomplete ICA to determine the remaining edges when needed. The pseudo-code of this framework (named FRITL) is given in Algorithm 1. Which of the four stages to use can be selected according to the purpose.

Input: Data , threshold for independence test
Output: Causal graph over measured variables and latent confounders

  Stage I: Construct a PAG by running the FCI algorithm on ;
  Stage II: Infer local causal structures by using an independence noise condition for undetermined adjacent pairs of variables in ; update to ;
  Stage III: Detect shared latent confounders by using Triad conditions and update to ;
  Stage IV: Estimate remaining undetermined local causal structures in using overcomplete ICA. Update to .
Algorithm 1 FRITL Algorithm

4.1 Stage I: Constructing PAG Using FCI

We begin by supposing that the data generated by causal model (1) satisfies assumptions A1-A2. The FCI algorithm outputs a PAG, which represents estimated features of the true causal DAG according to the following theorem Spirtes et al. (2001) and lemma.

Theorem 1.

Given the assumptions A1-A2, the FCI algorithm outputs a PAG that represents a class of graphs including the true causal DAG.

Lemma 1.

Given the assumptions A1-A2, if FCI converges to a PAG with a directed edge between and , then there is a directed edge between and in the true DAG.

We first apply FCI to the data to obtain a PAG. For example, using the graph representing an LvLiNGAM model in Figure 3(a), the FCI algorithm outputs the PAG shown in Figure 3(b).

4.2 Stage II: Inferring Local Structures Using Independence Noise Condition

After running stage I, we obtain the PAG (given in Figure 3(b)), which provides (asymptotically) correct information about the causal structure but usually determines few direct influences. Although we could apply overcomplete ICA to estimate the true causal graph, the result may suffer from local optima, especially when the number of measured variables is larger than four. In contrast, a “divide-and-conquer” strategy provides more causal information about the undetermined edges in the graph and only requires performing overcomplete ICA on a small number of variables to estimate the local causal structures. This second stage thus produces correct, informative causal discovery results with relatively low computational complexity. We note that in the linear non-Gaussian case, unconfounded causal relations can always be determined by regression and independence testing Shimizu et al. (2011). Inspired by this, we generalize regression and independence testing from global causal structure to local causal structure, even when there are latent confounders.

4.2.1 Identification of causal direction between unconfounded pairs of variables

We first provide a lemma to identify the causal direction of variables that are not influenced by confounders. Let denote the PAG obtained by FCI. From the definition of a PAG, variables connected to the measured variable through a directed, nondirected, or partially directed edge are the potential parents of . For example, is a potential parent of if , , or . Let denote the potential parents of in .

If there are no latent or observed confounders of  and any of , we can generalize Lemma 1 of Shimizu et al. (2011) to determine local causal structures. We first introduce the Darmois-Skitovitch Theorem Darmois (1953); Skitovitch (1953), which is used to determine whether each potential parent is an actual parent of .

Theorem 2.

(Darmois-Skitovitch Theorem). Define two random variables y₁ and y₂ as linear combinations of independent random variables sᵢ, i = 1, …, q:

y₁ = Σᵢ aᵢ sᵢ,   y₂ = Σᵢ bᵢ sᵢ.   (3)

If y₁ and y₂ are statistically independent, then all variables sᵢ for which aᵢ bᵢ ≠ 0 are Gaussian.

In other words, if the random variables sᵢ are independent and y₁ is independent of y₂, then for any sᵢ that is non-Gaussian, at most one of aᵢ and bᵢ can be nonzero.
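A quick numerical illustration of this statement (an informal check, not part of the paper's method): two mixtures that both load on the same non-Gaussian sources are dependent even when they are uncorrelated, which is visible, for example, in the correlation of their squares.

```python
# Illustrative check of Darmois-Skitovitch with uniform (non-Gaussian) sources:
# y1 and y2 both load on s1 and s2, so they cannot be independent. With Gaussian
# sources the same construction would yield independent y1, y2.
import numpy as np

rng = np.random.default_rng(0)
s1, s2 = rng.uniform(-1, 1, 100000), rng.uniform(-1, 1, 100000)
y1, y2 = s1 + s2, s1 - s2

print(np.corrcoef(y1, y2)[0, 1])        # ~0: uncorrelated ...
print(np.corrcoef(y1**2, y2**2)[0, 1])  # clearly nonzero: ... but not independent
```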

Lemma 2.

Suppose that the data over variables are generated by (1) and that assumptions A1-A3 hold. Assume there is no latent or observed confounder relative to and in the underlying true causal graph over all given variables, where is one of the potential parents of in the FCI output. Let be the residual of the regression of on . Then in the limit of infinite data, is an unconfounded ancestor of if and only if and .

Proof.

Without loss of generality, all these data are normalized to have zero mean and unit variance.

1. Assume that  is an ancestor of  and that  is an exogenous variable, which means that there are no parents or latent confounders for  and .  and  are generated by (1). This leads to

(4)

where and are independent.

(1) The residual of regressing on will be

(5)

Thus, the residual is independent of because is independent of .

(2) If instead we regress on , the residual will be

(6)

Each parent of  is a linear mixture of error terms including , where all the error terms are mutually independent and non-Gaussian according to assumption A3. Thus, the residual is a mixture of , , and , where each is non-Gaussian. From Equations (4) and (6), the coefficient of  is non-zero, which implies that  is dependent on  according to Theorem 2. Thus, if  is an ancestor of , then  is dependent on  and  is dependent on .

2. Assume that and have at least one common ancestor. Let denote all parents of , and be an actual parent of . Then we have

(7)

If we regress on , the residual will be

(8)

Each parent of  is a linear mixture of error terms other than , with all the error terms mutually independent and non-Gaussian according to assumption A3. Thus, the residual can be written as a linear mixture of error terms including . We can see that the coefficient of  in Equations (7) and (8) is nonzero due to , which implies that  is dependent on  according to Theorem 2. ∎

Lemma 2 provides a principle to determine the causal direction between a pair of measured variables. If there is no latent or observed confounder for and other variables, we can find the ancestors and children of . In detail, for each variable in , we regress on and test whether the residual is independent of . At the same time, we regress on and test the independence between the residual and . Then according to Lemma 2, we can determine whether is an ancestor or child of , or whether there is a confounder for them.

If we have determined some parents or children of a measured variable, we can remove, by regression, the effect of the determined common causes of two adjacent measured variables Shimizu et al. (2011), and then repeat the step above. This determines most of the undetermined causal relations that are not influenced by confounders.
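To make the pairwise step concrete, the following is a minimal sketch of the regression-plus-independence-test check described above, under assumptions A1-A3. The HSIC statistic, the kernel bandwidth, and the decision rule (comparing dependence in the two directions instead of thresholding at a level alpha) are simplifications of ours, not the paper's exact procedure.

```python
# Sketch of the Stage II pairwise check: regress each variable on the other and
# ask which residual is (closer to) independent of its regressor. A practical
# implementation would use a calibrated independence test with threshold alpha.
import numpy as np

def hsic(a, b, sigma=1.0):
    """Biased HSIC estimate between 1-D samples a and b with Gaussian kernels.
    O(n^2) memory, so subsample large datasets before calling."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    n = len(a)
    K = np.exp(-0.5 * ((a[:, None] - a[None, :]) / sigma) ** 2)
    L = np.exp(-0.5 * ((b[:, None] - b[None, :]) / sigma) ** 2)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(H @ K @ H @ L) / (n - 1) ** 2

def residual(y, x):
    """Residual of the least-squares regression of y on x (both 1-D arrays)."""
    xc, yc = x - x.mean(), y - y.mean()
    return yc - (xc @ yc) / (xc @ xc) * xc

def orient_pair(x, y):
    """Illustrative decision rule for one undetermined pair."""
    return "x->y" if hsic(x, residual(y, x)) < hsic(y, residual(x, y)) else "y->x"
```

In the FRITL setting, such a check would be applied only to the adjacent pairs left undetermined by the PAG from Stage I.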

4.2.2 Identification of causal direction between variables not directly influenced by the same confounder

After identifying the unconfounded ancestors, there remain cases in which the causal structure between measured variables cannot be identified because of indirect latent confounders. These fall into two cases:

  1. The parent and children of the measured variable are directly influenced by the same latent confounder , while is not adjacent to (or equivalently, not directly influenced by) ;

  2. Two or more parents of the measured variable are influenced by the same latent confounder , while is not adjacent to .

Case 1: For the first case, and using Figure 4(a) as an example,  and  are directly influenced by the hidden common cause , but  is not. The PAG obtained by Stage I is shown in Figure 4(b). For each of the three pairs of the three variables , , and , a regression is performed and the independence of the residual and the predictor variable is tested, but we can only determine , and cannot identify . If we can remove the indirect cause of , then  can be determined. After determining that , we regress  on  and replace  with its corresponding residual . If the causal relationship between  and  also satisfies model (1), we can use Lemma 2 to determine . Next, we generalize Lemma 2 of Shimizu et al. (2011) to the latent confounder case and call the result Lemma 3.

Figure 4: An example that cannot be identified by Lemma 2: (a) a true causal graph; (b) the PAG estimated by Stage I using the data generated according to (a). In these graphs, , and represent the measured variables while represents a latent confounder.
Lemma 3.

Assume that the data over measured variables follows Model (1). Let denote a set of all found parents of () and be the result of replacing each with its residual from regressing on . Then, an analog of Model (1) holds as follows: , where is a matrix of causal strengths among the residuals that corresponds to the measured variables, is a matrix of causal influences of the latent confounders on measured variables, and the noise terms in are mutually independent and non-Gaussian.

Proof.

Without loss of generality, we assume that  in Equation 2 can be permuted to a strictly lower triangular matrix. Therefore,  of Equation 2 is also a lower triangular matrix with unit diagonal entries. Since  is a parent of  for each ,  is equal to the regression coefficient obtained by linearly regressing  on . Therefore, through linear regression, the causal effect of  on  is removed from , that is, each  in  is 0, and  does not influence the residual . Therefore, for , the corresponding  is still a strictly lower triangular matrix (i.e.,  is also a strictly lower triangular matrix). Therefore,  holds. ∎

Thus, for each variable , after removing the effect of all determined parents of by regressions and independence tests we can find the parents and children of . The details of the procedure are as follows.

First, for each pair of measured variables and , we perform a linear regression of on , and test whether the corresponding residual is independent of . If it is, we orient . Otherwise, we test whether the reverse causal direction is accepted. If neither of them is accepted, there may be at least one latent confounder or a common ancestor influencing them. After refining some edges, we remove the effects of parents by regressing the variable on its determined parents and using the corresponding residuals to replace the variables. This is because if and are unconfounded, then after we remove the information in and that can be explained by their common ancestors, the residuals in and are unconfounded and they admit the same causal direction as that between and . Then, we iterate the first step for the variables with an undetermined edge between them to determine more edges, until no independence between a potential parent of variable and the corresponding residual is accepted.
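A small sketch of the residual-replacement step used in this iteration (in the spirit of Lemma 3): once some parents of a variable have been determined, the variable is replaced by the residual of regressing it on those parents before re-running the pairwise checks. The helper below, its name, and the least-squares call are illustrative assumptions of ours.

```python
# Replace a variable by its residual after regressing out its determined parents
# (illustrative helper; a full implementation would loop this with the pairwise
# orientation step until no further edge can be determined).
import numpy as np

def regress_out(y, parents):
    """OLS residual of y (shape (n,)) on parents (shape (n, k)), with intercept."""
    P = np.column_stack([np.ones(len(y)), parents])
    coef, *_ = np.linalg.lstsq(P, y, rcond=None)
    return y - P @ coef
```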

Case 2: We then consider the second case, for  and its parents  and ; we still cannot determine the causal relationship between  and  or between  and . That is to say,  and  are mediating variables on the path from  to , so that  is a common cause of  and , and of  and . Using  to “block” this path removes the influence from  to . This inspires us to apply multiple regression to address the problem, with the following lemma confirming its correctness.

Lemma 4.

Suppose that the data over the variables were generated by Equation 1 and the assumptions hold. Let  denote a set of measured variables that are potential parents of  and . Let  be the residual of regressing  on . In the limit of infinite data,  is an unconfounded parent of  if and only if there exists a subset , as defined above, such that  is independent of .

Proof.

Recall that  denotes all potential parents of  and  is a subset of . If a variable  is in , then  might be a (confounded) parent or child of , or there might be a latent confounder between  and  without a directed edge between them.

1. Consider that is a parent of and there is no latent confounder between and . First, we can rewrite Equation 1 as

where . The inverse of can be written as

(9)

where . Thus, .

Then, regressing on , we have

where .

Thus, the residual will be a linear mixture of latent confounders, the noise terms of and all variables in . If the linear contributions of all variables in to the influence of on have been partialed out, that is, , then we can obtain

(10)

Because there is no latent confounder between and , the coefficient of on is zero. Thus, from Equation 10, is independent of due to Theorem 2.

2. Consider that  is a confounded parent or confounded child of , or that there is a latent confounder between them without a directed edge. The effect of the latent confounder does not vanish under multiple regression on any measured variables, so the residual of regressing  on () is dependent on .

3. Consider that  is a child of  and there is no latent confounder between  and . If we regress  on every  which contains , the residual will be a linear mixture of the noise term of  and others. According to Equation 1,  is a linear mixture of the noise term of , and others. Thus,  is dependent on . ∎

Lemma 4 inspires a method of identifying the local structure of measured variables for the second case by analyzing the PAG. According to Lemma 4, we start by performing a multiple regression of the undetermined variable on every two-variable subset of its potential parents, testing whether there exist two variables  and  such that the corresponding residual is independent of both of them. If the independence holds for a variable and the residual, then that variable is a parent of . Similarly, if undetermined edges remain, we perform multiple regressions on subsets of the potential parents containing three variables, then four variables, and so on, to find variables among the potential parents that are unconfounded parents (according to independence tests), until no subset whose residual is independent of the predictor(s) can be found.
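The following sketch mirrors this subset search; the independence proxy `dep` (correlation between the residual and a nonlinear transform of a regressor), the tolerance, and all function names are our own simplifications rather than the paper's procedure.

```python
# Sketch of the Lemma 4-style search: regress the undetermined variable on
# growing subsets of its potential parents and accept a subset whose residual
# looks independent of every regressor in it (crude proxy test only).
import numpy as np
from itertools import combinations

def dep(a, b):
    """Crude dependence proxy: |corr(a, tanh(b))|; replace with a real test (e.g. HSIC)."""
    return abs(np.corrcoef(a, np.tanh(b))[0, 1])

def find_unconfounded_parents(y, candidates, tol=0.02):
    """y: (n,) target; candidates: (n, k) potential parents. Returns column indices."""
    n, k = candidates.shape
    for size in range(2, k + 1):                      # subsets of size 2, 3, ...
        for idx in combinations(range(k), size):
            cols = list(idx)
            P = np.column_stack([np.ones(n), candidates[:, cols]])
            coef, *_ = np.linalg.lstsq(P, y, rcond=None)
            r = y - P @ coef
            if all(dep(r, candidates[:, j]) < tol for j in cols):
                return cols                            # accepted unconfounded parents
    return []
```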

Using these methods, we find local causal structures over measured variables that are adjacent to an undetermined edge in . In this stage, when an edge is reoriented, we apply FCI orientation rules Zhang (2008) to orient other undetermined edges and update the corresponding potential parent sets. Using these orientation rules saves a number of regressions and independence tests.

As an example, using the causal graph from Figure 3(b), we obtain the output by (multiple) regressions and independence tests. By applying the FCI orientation rule Zhang (2008), we reorient the edge between and according to assumption A3. The final graph produced by this stage is shown in Figure 3(c).

The following theorem summarizes the identifiability achieved by the stage II process.

Theorem 3.

Suppose that the data over the variables were generated by model (1) and the assumptions hold. Let  denote the output of stage I. For pairs of variables with an undirected edge between them in  that are not actually directly influenced by the same latent confounder, the causal relations are identified by stage II of FRITL.

Proof.

Under the assumptions of the theorem, stage I removes most of the edges between (conditionally) independent variables, which provides stage II with (conditional) independence information. With the help of Lemmas 2 and 3, we can determine the direction of the causal relationship between variables that are not directly influenced by the same latent confounder. Lemma 4 provides the identifiability conditions for the causal structure between observed variables that are not influenced by the same latent confounder. ∎

As a consequence, what remains to be identified is the causal structure between variables directly influenced by the same latent confounders.

4.3 Stage III: Detecting Shared Latent Confounders Using the Triad Condition

Figure 5: The three causal graphs in (a), (b), and (c) all correspond to the completed nondirected PAG obtained by FCI shown in (d). They cannot be distinguished by stage II of our method.

The procedure so far determines whether latent confounders exist in many cases, but some graphs corresponding to the PAG shown in Figure 5 remain indistinguishable. Stage II only considers two variables at a time. Thus, it provides no details about the causal relationship (e.g., whether there is a direct causal relation and which way the causal influence goes) between two variables that are directly influenced by the same latent confounder, because both variables contain information about the latent confounder. Suppose that assumption A3 holds. Interestingly, if we consider another measured variable at the same time, we can treat this third variable as a “conditional variable” or an “instrumental variable” and use it to help remove the indirect causal relationship (due to the existence of latent confounders) through the path containing latent confounders. The Triad condition Cai et al. (2019), which the proposed procedure makes use of, is described as follows.

Definition 2.

(Triad condition) Suppose assumptions A1-A3 hold. For a triple of measured variables generated by (1),  and  are Triad conditional on  (or given ) when the residual  is independent of , that is, . If the Triad condition is satisfied, we denote it by .

It is easy to establish the property that the Triad condition is symmetric, that is, if and only if .
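A minimal sketch of this check is below. It assumes the Triad residual is formed as in Cai et al. (2019), i.e., as xi − (cov(xi, xk) / cov(xj, xk)) · xj, and it reuses the same crude dependence proxy as before; both choices are ours, for illustration only.

```python
# Sketch of a Triad-condition check: build the Triad residual of (xi, xj) given
# the reference variable xk and test its (approximate) independence from xk.
import numpy as np

def triad_residual(xi, xj, xk):
    """Residual of (xi, xj) given reference xk, assuming the Cai et al. (2019) form."""
    w = np.cov(xi, xk, bias=True)[0, 1] / np.cov(xj, xk, bias=True)[0, 1]
    return xi - w * xj

def triad_holds(xi, xj, xk, tol=0.02):
    """True if the Triad condition of (xi, xj) given xk looks satisfied (proxy test)."""
    e = triad_residual(xi, xj, xk)
    return abs(np.corrcoef(e, np.tanh(xk))[0, 1]) < tol
```

Consistent with the symmetry property noted above, swapping xi and xj only rescales the residual under this formulation, so the check gives the same answer.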

The three possible causal graphs given in Figures 5(a)-(c) over three measured variables correspond to the PAG in Figure 5(d) produced by stage I. None of the three undirected edges can be reoriented by stage II. Based on the Triad condition, we detect whether three variables share a latent confounder via the following Theorem 4.

Theorem 4.

Suppose that the data over the variables were generated according to Equation 1 and the assumptions hold. Let  denote the output of stage II of FRITL. For three observed variables with an undetermined edge between each pair of them in , the three variables are directly influenced by the same latent confounder, with no pair of them directly connected, if and only if the three Triad conditions hold among them.

Proof.

Suppose that the data over the variables were generated by Equation 1. Without loss of generality, we assume the three variables , , and  are standardized (they have zero mean and unit variance) and have causal relations between them, in addition to the influences of latent confounders. Note that if a coefficient is zero, then the corresponding edge vanishes. Then we have

(11)

Three kinds of Triad conditions might hold among three variables: , and . So we consider three cases conditioning on different variables as follows.

1. Considering a Triad condition conditioning on , we can obtain the following reference variable

which is a linear mixture of independent variables, namely, , , , and . As we know,  is a mixture of the independent variables  and . If the parameters in this model are not zero, it is dependent on  because of Theorem 2. Next, suppose the Triad condition  is satisfied, i.e.,  is independent of . According to Theorem 2, at most one of the coefficients on their common parameters,  and , can be nonzero. Therefore,  and  must equal zero, i.e., , because ,  and  are nonzero. Then,  becomes a linear mixture of ,  and  and is independent of . Thus, there are no edges between  and , and between  and .

2. Considering a Triad condition conditioning on , we obtain the following reference variable

which is a linear mixture of four independent variables, namely, , , , and . We can see that

(12)

which is a mixture of three independent variables ,  and . If all parameters in this model are non-zero,  is dependent on  because of Theorem 2.

If all three variables are directly influenced by the same latent confounder and  satisfies , i.e.,  is independent of , then according to Theorem 2, at most one of the coefficients on their common parameters, , and , can be nonzero. Therefore,  would be zero, and then we can see that  would be zero, too. Then,  becomes a linear mixture of  and , and is independent of . This also shows that graphs in which there is at most one directed edge between two measured variables and one latent confounder influences them at the same time are distinguishable by the Triad condition.

3. Considering a Triad condition conditioning on , similar to the two cases above, we can conclude that if there is only one edge between  and , i.e., , then the graph implies the Triad condition .

In conclusion, if there is not a directed edge between any pair of measured variables, that is, , then the corresponding causal graph implies three Triad conditions, which are , , . According to assumption A3, means that there is no direct edge between observed variables and . If at least two causal strengths in are zero, then the causal structure over