1 Introduction
Many researchers, particularly in economics, psychology, epidemiology, and the social sciences, use linear structural equation models (SEMs) to describe the causal and statistical relationships between a set of variables, to predict the effects of interventions and policies, and to estimate parameters of interest
[bollen:pea393]. A linear SEM consists of a set of equations of the form
X = Λ^T X + U,
where X is a vector containing the model variables, Λ is a matrix containing the coefficients of the model, which convey the strength of the causal relationships, and U is a vector of error terms, which represent omitted or latent variables and are assumed to be normally distributed. The matrix Λ contains zeroes on the diagonal, and Λ_ij = 0 whenever X_i is not a cause of X_j. The covariance matrix of X will be denoted by Σ and the covariance matrix over the error terms, U, by Ω. The entries of Λ and Ω are the model parameters. In this paper, we will restrict our attention to semi-Markovian models [pearl:2k], models where the rows of Λ can be arranged so that it is lower triangular.

When the coefficients are known, total effects, direct effects, and counterfactuals can be computed from them directly [pearl:09, chen:pea14]. However, in order to be able to compute these coefficients, we must utilize domain knowledge in the form of exclusion and independence restrictions [pearl:95, p. 704]. Exclusion restrictions represent assumptions that a given variable is not a direct cause of another, while independence restrictions represent assumptions that no latent confounders exist between two variables. Algebraically, these assumptions translate into restrictions of entries in the coefficient matrix, Λ, and error term covariance matrix, Ω, to zero.
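Because Λ can be arranged to be lower triangular, the implied covariance matrix has a closed form, Σ = (I − Λ^T)^{-1} Ω (I − Λ)^{-1}. The following minimal sketch computes it for a hypothetical three-variable chain; all variable names and coefficient values are illustrative, not taken from the paper:

```python
import numpy as np

# Hypothetical recursive model: Z -> X (coefficient a), X -> Y (coefficient b),
# with independent errors. Given X = Lambda^T X + U, the implied covariance is
#   Sigma = (I - Lambda^T)^{-1} Omega (I - Lambda)^{-1}.

def implied_covariance(Lam, Omega):
    """Covariance matrix over the model variables implied by (Lambda, Omega)."""
    n = Lam.shape[0]
    inv = np.linalg.inv(np.eye(n) - Lam.T)   # expresses variables in terms of U
    return inv @ Omega @ inv.T

a, b = 0.7, 0.5
Lam = np.zeros((3, 3))          # variable order: Z, X, Y
Lam[0, 1] = a                   # Z -> X
Lam[1, 2] = b                   # X -> Y
Omega = np.eye(3)               # diagonal Omega: no latent confounding

Sigma = implied_covariance(Lam, Omega)
# cov(Z, Y) equals the product of coefficients along the lone path Z -> X -> Y
```

Identification, discussed next, asks when this map from (Λ, Ω) to Σ can be inverted for the parameters of interest.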
Determining whether model parameters can be expressed in terms of the probability distribution, which is necessary to be able to estimate them from data, is the problem of identification. In linear systems, this generally takes the form of expressing a parameter in terms of the covariance matrix over the observable variables. When it is not possible to uniquely express the value of a model parameter in terms of the probability distribution, we will say that the parameter is not identifiable. (We will also use the term "identifiable" with respect to the model as a whole: when the model contains an unidentified coefficient, the model is not identified.)

To our knowledge, the most general method for determining model identification is the half-trek criterion [foygel:12]. Identifying individual structural coefficients can be accomplished using the single-door criterion (i.e. identification using regression) [pearl:09, chen:pea14], instrumental variables [wright:25, wright:28] (see [brito:pea02a], [pearl:09], or [chen:pea14] for a graphical characterization), instrumental sets [brito:pea02a], and the general half-trek criterion [chen:15], which generalizes the half-trek criterion to individual coefficients rather than entire models. Finally, d-separation [pearl:09] and overidentification [pearl:04, chen:etal14] provide the means to enumerate testable implications of the model, which can be used to test it against data.
Each of these methods only utilizes restrictions of entries of the coefficient and error covariance matrices to zero. In this paper, we introduce auxiliary variables, which can be used to incorporate knowledge of non-zero coefficient values into existing methods of identification and model testing. The intuition behind auxiliary variables is simple: if the coefficient c from a variable W to a variable Y is known, then we would like to remove the direct effect of W on Y by subtracting cW from Y. We do this by creating a variable Y* = Y - cW and using it as a proxy for Y. In some cases, Y* may allow the identification of parameters or testable implications using the aforementioned methods when Y could not.
While intuitively simple, auxiliary variables greatly increase the power of existing identification methods, even without external knowledge of coefficient values. We propose a bootstrapping procedure whereby coefficients are iteratively identified using simple instrumental sets and then used to generate auxiliary variables, which in turn enable the identification of previously unidentifiable coefficients. We prove that this method enhances the instrumental set method to the extent that it subsumes the relatively more complex general half-trek criterion (henceforth, gHTC).
The notion of “subtracting out a direct effect” in order to turn a variable into an instrument was first noted by [shardell:15] when attempting to identify a total effect. It was noticed that in certain cases, the violation of the independence restriction of a potential instrument (i.e. the instrument is not independent of the error term of the outcome) could be remedied by identifying, using ordinary least squares regression, and then subtracting out the necessary direct effects on the outcome. In this paper, we generalize and operationalize this notion so that it can be used on arbitrary sets of known coefficient values and utilized in conjunction with graphical methods for identification and enumeration of testable implications.

The paper is organized as follows: Sec. 2 reviews notation and graphical notions that will be used in the paper. In Sec. 3, we introduce and formalize auxiliary variables and auxiliary instrumental sets. Additionally, we give a sufficient graphical condition for the identification of a set of coefficients using auxiliary instrumental sets. In Sec. 4, we show that auxiliary instrumental sets subsume the gHTC. Finally, in Sec. 5, we discuss additional applications of auxiliary variables, including identifying testable implications and z-identification [bareinboim:pea12].
2 Preliminaries
The causal graph or path diagram of a SEM is a graph, G = (V, D, B), where V are nodes or vertices, D directed edges, and B bidirected edges. The nodes represent model variables. Directed edges encode the direction of causality, and for each non-zero coefficient from a variable X to a variable Y, an edge is drawn from X to Y. Each directed edge, therefore, is associated with a coefficient in the SEM, which we will often refer to as its structural coefficient. The error terms are not shown explicitly in the graph. However, a bidirected edge between two nodes indicates that their corresponding error terms may be statistically dependent, while the lack of a bidirected edge indicates that the error terms are independent.
If a directed edge, e, exists from X to Y then X is a parent of Y. The set of parents of Y is denoted Pa(Y). Additionally, we call Y the head of e and X the tail. The set of tails for a set of directed edges, E, is denoted Ta(E), while the set of heads is denoted He(E). For a node, Y, the set of edges whose head is Y is denoted Inc(Y). Finally, the nodes connected to Y by a bidirected arc are called the siblings of Y, denoted Sib(Y).
A path from X to Y is a sequence of edges connecting the two nodes. A path may go either along or against the direction of the edges. A non-endpoint node Z on a path is said to be a collider if the edges preceding and following Z both point to Z.
A path between X and Y is said to be unblocked given a set Z, with X, Y ∉ Z, if every non-collider on the path is not in Z and every collider on the path is in An(Z) [pearl:09], where An(Z) denotes the ancestors of Z. Unblocked paths whose edges all point from X towards Y, or all from Y towards X, are directed paths. Any unblocked path that is not a directed path is a divergent path.
σ_XY denotes the covariance between two random variables, X and Y, and σ_XY^M is the covariance between X and Y induced by the model M. X ⊥ Y denotes that X is independent of Y, and similarly, (X ⊥ Y)_M denotes that X is independent of Y according to the model, M. We will assume without loss of generality that the model variables have been standardized to mean 0 and variance 1.
We will also utilize a number of definitions concerning half-treks [foygel:12].
Definition 1.
A half-trek, π, from X to Y is an unblocked path from X to Y that either begins with a bidirected arc and then continues with directed edges towards Y, or is simply a directed path from X to Y.
We will denote the set of nodes connected to a node, X, via half-treks htr(X). For example, in Figure 2(a), the two depicted paths are both half-treks. However, the path in Figure 2(b) is not a half-trek because it begins with an arrow pointing into its source node.
Definition 2.
For a given path, π, from X to Y, Left(π) is the set of nodes, if any, that have a directed edge leaving them in the direction of X, in addition to X itself. Right(π) is the set of nodes, if any, that have a directed edge leaving them in the direction of Y, in addition to Y itself.
For example, consider a path whose edges diverge from a single top node. In this case, the nodes on the source side belong to Left(π), the nodes on the sink side belong to Right(π), and the top node is a member of both Right(π) and Left(π).
Definition 3.
A set of paths, Π, has no sided intersection if for all π1, π2 ∈ Π such that π1 ≠ π2, Left(π1) ∩ Left(π2) = Right(π1) ∩ Right(π2) = ∅.
Consider a set of two paths, {π1, π2}. This set has no sided intersection, even if both paths contain a common node, so long as Left(π1) ∩ Left(π2) = ∅ and Right(π1) ∩ Right(π2) = ∅. In contrast, a set in which some node appears in both Right(π1) and Right(π2) does have a sided intersection.
Wright’s rules [wright:21] allow us to equate the model-implied covariance, σ_XY, between any pair of variables, X and Y, to the sum of products of parameters along unblocked paths between X and Y. (Wright’s rules characterize the relationship between the covariance matrix and the model parameters. Therefore, any question about identification using the covariance matrix can be decided by studying the solutions of this system of equations. However, since these equations are polynomial rather than linear, it can be very difficult to analyze identification of models using Wright’s rules [brito:04].) Let Π = {π1, ..., πk} denote the unblocked paths between X and Y, and let p(πi) be the product of structural coefficients along path πi. Then the covariance between variables X and Y is σ_XY = Σ_{i=1}^{k} p(πi). We will denote the expression that Wright’s rules give for σ_XY in graph G by σ_XY(G).
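As a sanity check, the path-product sums given by Wright's rules can be compared against the matrix-implied covariance for a small hypothetical graph. The names, topology, and values below are illustrative; since the variables in this sketch are not standardized, variance terms appear where a path turns around at a node:

```python
import numpy as np

# Hypothetical model: Z -> X (a), X -> Y (b), plus a bidirected arc
# X <-> Y with error covariance w. Variable order: Z, X, Y.
a, b, w = 0.7, 0.5, 0.3
Lam = np.zeros((3, 3)); Lam[0, 1] = a; Lam[1, 2] = b
Omega = np.eye(3); Omega[1, 2] = Omega[2, 1] = w

inv = np.linalg.inv(np.eye(3) - Lam.T)
Sigma = inv @ Omega @ inv.T      # model-implied covariance matrix

# Wright's rules, path by path:
cov_zx = a                       # single path Z -> X
var_x  = a**2 + 1                # path through Z, plus the error of X
cov_zy = a * b                   # Z -> X -> Y; Z -> X <-> Y is blocked (X is a collider)
cov_xy = b * var_x + w           # all treks through X, plus the bidirected arc
```

Each hand-computed path sum matches the corresponding entry of Sigma, which is how Wright's rules will be used throughout the examples below.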
Instrumental variables (IVs) is one of the most common methods of identifying parameters in linear models. The ability to use an instrumental set to identify a set of parameters when none of those parameters are identifiable individually using IVs was first proposed by [brito:pea02a].
Definition 4 (Simple Instrumental Set).
Z = {z1, ..., zk} is a simple instrumental set for the coefficients associated with the edges x1 → y, ..., xk → y if the following conditions are satisfied:

(i) …,

(ii) let G' be the graph obtained from G by deleting the edges x1 → y, ..., xk → y; then (zi ⊥ y)_{G'} for all i (this condition can also be satisfied by conditioning on a set of covariates without changing the results below, but for simplicity we will not consider this case; when conditioning on a set of covariates, Z is called a generalized instrumental set), and

(iii) there exist unblocked paths π1, ..., πk such that πi is an unblocked path from zi to xi and {π1, ..., πk} has no sided intersection.
If Z is a simple instrumental set for the coefficients of x1 → y, ..., xk → y, then we can use Wright’s rules to obtain a set of linearly independent equations in terms of the coefficients, enabling us to solve for them [brito:pea02a].
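The single-instrument case can be sketched on simulated data. In the hypothetical model below (all names and values illustrative), a latent confounder between X and Y biases the regression (OLS) estimate, while the instrumental-variable ratio cov(Z,Y)/cov(Z,X) recovers the structural coefficient:

```python
import numpy as np

# Hypothetical simulation: Z -> X -> Y with a latent confounder u
# inducing the bidirected arc X <-> Y.
rng = np.random.default_rng(0)
n = 200_000
lam = 0.5                            # structural coefficient X -> Y (target)

z = rng.standard_normal(n)
u = rng.standard_normal(n)           # latent confounder
x = 0.8 * z + u + rng.standard_normal(n)
y = lam * x + u + rng.standard_normal(n)

ols = np.cov(x, y)[0, 1] / np.var(x)           # biased upward by the confounder
iv  = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]  # consistent for lam
```

The IV estimate works because Z has no unblocked path to Y except through X, so cov(Z,Y) = lam·cov(Z,X); auxiliary variables, introduced next, restore exactly this property when it fails.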
3 Auxiliary Variables
We start this section by motivating auxiliary variables through an example. Consider the structural system depicted in Figure 0(a). In this system, the structural coefficient of interest is not identifiable using instrumental variables or instrumental sets. To witness, note that each candidate instrument fails to qualify due to a spurious path that cannot be blocked. (Even if we consider conditional instruments [brito:pea02a], these paths cannot be blocked, and identification is not possible.) If the coefficient on the spoiling edge is known (it may be available through different means, for instance, from a smaller randomized experiment, pilot study, or substantive knowledge, just to cite a few; in this specific case, it can also be identified directly, without invoking external information, by using an appropriate instrument), we can add an auxiliary variable to the model. Subtracting the known effect cancels the corresponding direct influence, so the spoiling edge no longer contributes to the covariance with the auxiliary variable, and the candidate variable becomes an instrument for the target coefficient: the sum of products of parameters along back-door paths from the instrument to the auxiliary variable is equal to 0.
Surprisingly, auxiliary variables can even be used to generate instruments from effects of the treatment and outcome. For example, consider Figures 1(a) and 1(b). In both examples, the candidate variable is clearly not an instrument for the target coefficient. However, in both cases, an adjacent coefficient is identifiable using it as an instrument, allowing us to construct an auxiliary variable that does qualify as an instrument for the target coefficient (see Theorem 1 below).
The following definition establishes the augmented model, which incorporates the auxiliary variables into the model. ([chan:kuroki10] also gave a graphical criterion for identification of a coefficient using descendants of the treatment; the coefficient in Figure 1(a) can also be identified using their method.)
Definition 5.
Let M be a structural causal model with associated graph G and E_K a set of directed edges such that their coefficient values are known. The augmented model, M^aux, includes all variables and structural equations of M in addition to new auxiliary variables, one for each variable y that is the head of some edge in E_K, such that the structural equation for the auxiliary variable y* is y* = y - Σ λ_xy x, where the sum ranges over all edges x → y in E_K. The corresponding graph is denoted G^aux.
For example, let M and G be the model and graph depicted in Figure 0(a). The augmented model is obtained by adding a new auxiliary variable to M. The corresponding graph, G^aux, is shown in Figure 0(b). The following lemma establishes that the covariance between any two variables in V ∪ V* can be obtained using Wright’s rules on G^aux, where V is the set of variables in M and V* is the set of variables added to the augmented model. (Note that auxiliary variables may not have a variance of 1. We will see that this does not affect the results of the paper since the covariance between model variables implied by the graph is correct, even after the addition of auxiliary variables.)
Lemma 1.
Given a linear structural model, M, with induced graph G, and a set of directed edges E_K with known coefficient values, σ_XY^{M^aux} = σ_XY(G^aux) for all X, Y ∈ V ∪ V*, where V is the set of variables in M and V* is the set of auxiliary variables. (See Appendix for proofs of all lemmas.)
The above lemma guarantees that the covariance between variables implied by the augmented graph is correct, and Wright’s rules can be used to identify coefficients in the model M. For example, applying Wright’s rules to G^aux, depicted in Figure 0(b), yields expressions for the covariances involving the auxiliary variable, from which the target coefficient can be solved. As a result, the candidate variable can be used as an instrumental variable in the augmented graph when it clearly could not in the original.
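The construction can be verified at the population level on a hypothetical graph in the same spirit (the topology, names, and values here are assumptions for illustration, not the paper's figure): a second path Z → W → Y spoils Z as an instrument for the coefficient lam on X → Y, and subtracting the known coefficient c on W → Y restores it.

```python
import numpy as np

# Hypothetical model, order (Z, W, X, Y):
#   Z -> X (a), Z -> W (d), W -> Y (c), X -> Y (lam), plus X <-> Y (w).
a, d, c, lam, w = 0.8, 0.6, 0.5, 0.4, 0.3
Z, W, X, Y = range(4)
Lam = np.zeros((4, 4))
Lam[Z, X], Lam[Z, W], Lam[W, Y], Lam[X, Y] = a, d, c, lam
Omega = np.eye(4)
Omega[X, Y] = Omega[Y, X] = w            # latent confounder X <-> Y

inv = np.linalg.inv(np.eye(4) - Lam.T)
S = inv @ Omega @ inv.T                  # implied covariance matrix

naive = S[Z, Y] / S[Z, X]                # biased: picks up Z -> W -> Y
aux = (S[Z, Y] - c * S[Z, W]) / S[Z, X]  # cov(Z, Y - c*W) / cov(Z, X)
```

Subtracting c·cov(Z,W) cancels the contribution of the path through W exactly, so the auxiliary ratio returns lam while the naive ratio does not.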
Definition 6 (Auxiliary Instrumental Set).
Given a semi-Markovian linear SEM with graph G and a set of directed edges E_K whose coefficient values are known, we will say that a set of nodes, Z, in G is an auxiliary instrumental set, or auxIS, for a set of edges E if Z*, obtained from Z by replacing each variable that has an auxiliary variable in G^aux with its auxiliary counterpart, is an instrumental set for E in G^aux.
The following lemma characterizes when an auxiliary variable will be independent of a model variable and is used to prove Theorem 1.
Lemma 2.
Given a semi-Markovian linear SEM with graph , if and only if is d-separated from in , where and is the graph obtained when is removed from .
The following theorem provides a simple method for recognizing auxiliary instrumental sets directly from the graph.
Theorem 1.
Let E_K be a set of directed edges whose coefficient values are known. A set of directed edges, E = {x1 → y, ..., xk → y}, in a graph, G, is identified if there exists a set of nodes Z = {z1, ..., zk} such that:

(i) …,

(ii) for all i, (zi ⊥ y)_{G'}, where G' is the graph obtained by removing the edges in E from G^aux, and

(iii) there exist unblocked paths π1, ..., πk such that πi is a path from zi to xi and {π1, ..., πk} has no sided intersection.

If the above conditions are satisfied, then Z is an auxiliary instrumental set for E.
Proof.
We will show that is an instrumental set in . First, note that if , then is an instrumental set in and we are done. We now consider the case when . Since , , IS(i) is satisfied. Now, we show that IS(iii) is satisfied. For each , let be the path in from to . Now, for each , let be the concatenation of path with . It should be clear that satisfies IS(iii) in . Lastly, we need to show that IS(ii) is also satisfied.
First, if , then . It follows that since no new paths from to can be generated by adding the auxiliary nodes (see Lemma 8 in Appendix). Now, we know that from (ii) and Lemma 2. Finally, since adding auxiliary variables cannot generate new paths between the existing nodes, we know that , and we are done.
for all follows from (ii), Lemma 2, and the fact that no new paths from to can be generated by adding auxiliary nodes, proving the theorem.∎
To see how Theorem 1 can be used to identify auxiliary instrumental sets, consider Figure 2(a). Using instrumental sets, we are able to identify one coefficient, but no others. Once it is identified, the next coefficient can be identified using the corresponding auxiliary variable as an instrument, since that variable qualifies as an instrument when the known edge is removed (see Figure 2(b)). (Note that if no coefficients are known, then the conditions of Theorem 1 are satisfied exactly when Z is an instrumental set in G.) The identification of this coefficient, in turn, allows us to identify the remaining coefficients, since the updated auxiliary variable is an instrument for them when the corresponding known edge is removed (see Figure 2(c)).
The above example also demonstrates that certain coefficients are identified only after using auxiliary instrumental sets iteratively. We now define auxIS identifiability, which characterizes when a set of coefficients is identifiable using auxiliary instrumental sets.
Definition 7 (AuxIS Identifiability).
Given a graph G, a set of directed edges E is auxIS identifiable if there exists a sequence of sets of directed edges E_1, ..., E_m such that:

(i) E_1 is identified using instrumental sets in G,

(ii) E_i is identified using auxiliary instrumental sets in G^aux with E_K = E_1 ∪ ... ∪ E_{i-1}, for all 1 < i ≤ m, and

(iii) E is identified using auxiliary instrumental sets in G^aux, where E_K = E_1 ∪ ... ∪ E_m.
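The iterative procedure can be sketched end to end at the population level on a small hypothetical model (topology, names, and values are all illustrative): one coefficient is identified by an ordinary instrument, and the resulting auxiliary variable turns a previously invalid candidate into an instrument for a second coefficient.

```python
import numpy as np

# Hypothetical model, order (Z1, Z2, W, X, Y):
#   Z1 -> X (a), Z1 -> W (h), Z2 -> W (g), W -> Y (c), X -> Y (lam),
# with latent confounding W <-> Y and X <-> Y. Z1 alone fails as an
# instrument for lam because of the path Z1 -> W -> Y.
a, h, g, c, lam = 0.8, 0.6, 0.7, 0.5, 0.4
Z1, Z2, W, X, Y = range(5)
Lam = np.zeros((5, 5))
Lam[Z1, X], Lam[Z1, W], Lam[Z2, W] = a, h, g
Lam[W, Y], Lam[X, Y] = c, lam
Omega = np.eye(5)
Omega[W, Y] = Omega[Y, W] = 0.3          # W <-> Y
Omega[X, Y] = Omega[Y, X] = 0.3          # X <-> Y

inv = np.linalg.inv(np.eye(5) - Lam.T)
S = inv @ Omega @ inv.T                  # implied covariance matrix

c_hat = S[Z2, Y] / S[Z2, W]              # step 1: Z2 is an instrument for c
lam_hat = (S[Z1, Y] - c_hat * S[Z1, W]) / S[Z1, X]   # step 2: auxiliary Y - c*W
naive = S[Z1, Y] / S[Z1, X]              # Z1 without the auxiliary step
```

Step 1 identifies c because the only unblocked path from Z2 to Y passes through W; step 2 then subtracts the identified effect, exactly the bootstrapping pattern of Definition 7.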
4 Auxiliary Instrumental Sets and the HalfTrek Criterion
In this section, we explore the power of auxiliary instrumental sets, ultimately showing that they are at least as powerful as the gHTC. Having defined auxiliary instrumental sets, we now briefly describe the gHTC. The gHTC is a generalization of the half-trek criterion that allows the identification of arbitrary coefficients rather than the whole model [chen:15]. (If any coefficient is not identified, then the half-trek criterion algorithm will simply output that the model is not identified.) First, we give the definition of the general half-trek criterion, then we discuss how it can be used to identify coefficients before showing that any gHTC-identifiable coefficient is also auxIS identifiable.
Definition 8 (General HalfTrek Criterion).
Let E be a set of directed edges sharing a single head y. A set of variables Z satisfies the general half-trek criterion with respect to E if

(i) |Z| = |E|,

(ii) Z ∩ ({y} ∪ Sib(y)) = ∅,

(iii) there is a system of half-treks with no sided intersection from Z to Ta(E), and

(iv) ….
A set of directed edges, E, sharing a head y is identifiable if there exists a set, Z, that satisfies the general half-trek criterion (gHTC) with respect to E and consists only of “allowed” nodes. Intuitively, a node z is allowed if the set of edges E_z is identified or empty, where E_z is the set of edges that

(i) lie on half-treks from z to y, or

(ii) lie on paths between z and Ta(E).
We will continue to use the notation E_z and allow z to be a set of nodes. When Z satisfies the gHTC and consists only of allowed nodes for E, we say that Z is a gHT-admissible set for E. If a gHT-admissible set exists for E, then the coefficients in E are gHT identifiable. The lemma below characterizes when a set of parameters is gHT identifiable. This characterization parallels Definition 7 and will prove useful in the proofs to follow.
[Figure 3: (a) A model in which the target coefficient is not identifiable in nonparametric models, even with experiments over the surrogate variables. (b) The augmented graph, in which the coefficient is identified using a quasi-instrument, if we assume linearity. (c) A graph in which the relevant set is an instrumental set.]

Lemma 3.
If a set of directed edges, E, is gHT identifiable, then there exist sequences of sets of nodes, Z_1, ..., Z_m, and sets of edges, E_1, ..., E_m, such that

(i) Z_i satisfies the gHTC with respect to E_i for all i,

(ii) E ⊆ E_1 ∪ ... ∪ E_m, where E_i is identified using Z_i for all i, and

(iii) Z_i consists of allowed nodes given E_1 ∪ ... ∪ E_{i-1} for all i.
Proof.
The lemma follows from Theorem 1 in [chen:15]. ∎
To see how the gHTC can be used to identify coefficients, consider again Figure 2(a). Initially, only one coefficient is identifiable: either of two nodes serves as a gHT-admissible set for it, since their associated edge sets are empty, while all other nodes are half-trek reachable from the head and their edges on half-treks from it are not identified. Once the first coefficient is identified, the corresponding node becomes a gHT-admissible set for the next coefficient, and, similarly, its identification in turn yields gHT-admissible sets for the remaining coefficients.
The following lemma connects gHT-admissibility with auxiliary instrumental sets.
Lemma 4.
If Z is a gHT-admissible set for a set of directed edges E with head y, then E is identified using instrumental sets in G^aux.
Now, we are ready to show that auxIS identifiability subsumes gHT identifiability.
Theorem 2.
Given a semi-Markovian linear SEM with graph G, if a set of edges, E, with head y, is gHTC identifiable, then it is auxIS identifiable.
Proof.
Since E is gHTC identifiable, by Lemma 3 there exist sequences of sets of nodes, Z_1, ..., Z_m, and sets of edges, E_1, ..., E_m, such that

(i) Z_i satisfies the gHTC with respect to E_i for all i,

(ii) E ⊆ E_1 ∪ ... ∪ E_m, where E_i is identified using Z_i for all i, and

(iii) Z_i consists of allowed nodes given E_1 ∪ ... ∪ E_{i-1} for all i.

Now, using Lemma 4, we see that there is an instrumental set for E_1 in G, and E_i is identified using instrumental sets in the augmented graph with E_K = E_1 ∪ ... ∪ E_{i-1} for all i. As a result, E is auxIS identifiable. ∎
5 Further Applications
We have formalized auxiliary variables and demonstrated their ability to increase the identification power of instrumental sets. In this section, we discuss additional applications of auxiliary variables, as alluded to in the introduction, namely incorporating external knowledge of coefficient values and deriving new constraints over the covariance matrix.
When the causal effect of X on Y is not identifiable and performing randomized experiments on X is not possible (due to cost, ethical, or other considerations), we may nevertheless be able to identify the causal effect of X on Y using knowledge gained from experiments on another set of variables Z. The task of determining whether causal effects can be computed using surrogate experiments generalizes the problem of identification and was named z-identification in [bareinboim:pea12]. They provided necessary and sufficient conditions for this task in the nonparametric setting. Considering Figure 3(a), one can immediately see that the effect of X on Y is not identifiable, given the unblockable back-door path. Additionally, using Bareinboim and Pearl’s z-identification condition, one can see that the effect of X on Y is not identifiable, even with experiments over Z. Remarkably, if one is willing to assume that the system is linear, more can be said. The experiment over Z would yield the corresponding coefficient, allowing us to create an auxiliary variable, as represented in Figure 3(b). Now, the target coefficient can easily be identified using auxiliary variables: the relevant covariances, computed via Wright’s rules on the augmented graph, yield the coefficient directly.
While the variable is not technically an instrument for the coefficient in G, it behaves like one. When a variable Z allows the identification of a coefficient by using an auxiliary variable, we will call Z a quasi-instrument. The question naturally arises whether we can improve auxIS identifiability (Def. 7) by using quasi-instruments. However, auxIS identifiability requires that we learn the value of the subtracted coefficient from the model, not externally. In order to identify it from the model, we would require an instrument. If such an instrument existed, as in Figure 3(c), then both coefficients could have been identified together using an instrumental set. As a result, quasi-instruments are not necessary in that case. However, if the coefficient could only be evaluated externally, then quasi-instruments are necessary for identification.
In some cases, the cancellation of paths due to auxiliary variables may generate new vanishing correlation constraints. For example, in Figure 0(b), a new vanishing covariance involving the auxiliary variable arises. Thus, we see that auxiliary variables allow us to identify additional testable implications of the model. Moreover, if certain coefficients are evaluated externally, that information can also be used to generate testable implications. Lemma 2 can be used to identify independences involving auxiliary variables from the graph G^aux.
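Such a constraint can be tested directly on data. In the hypothetical model below (names and values illustrative), the only unblocked paths from Z to Y pass through W → Y with known coefficient c, so cov(Z, Y − cW) = 0 is a testable implication; a misspecified model with an extra direct edge Z → Y violates it:

```python
import numpy as np

# Hypothetical vanishing-covariance test: Z -> W (d), W -> Y (c, known).
# Under the model, Y - c*W is uncorrelated with Z; a direct edge Z -> Y
# of strength beta breaks the constraint.
rng = np.random.default_rng(1)
n = 100_000
d, c = 0.7, 0.5

def residual_cov(beta):
    z = rng.standard_normal(n)
    w = d * z + rng.standard_normal(n)
    y = c * w + beta * z + rng.standard_normal(n)
    return np.cov(z, y - c * w)[0, 1]

resid_ok = residual_cov(beta=0.0)    # model holds: near zero
resid_bad = residual_cov(beta=0.4)   # misspecified: clearly non-zero
```

In practice the sample covariance would be compared against its sampling distribution; the sketch simply contrasts the two regimes.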
Besides z-identification and model testing, these new constraints can also be used to prune the space of compatible models in the task of structure learning. Additionally, it is natural to envision that auxiliary variables can be useful for answering evaluation questions in different, but somewhat related, domains, such as the transportability problem [pearl:bar11r372a] or, more broadly, the data-fusion problem [bareinboim:pea15r450], where datasets collected under heterogeneous conditions need to be combined to answer a query in a target domain.
6 Conclusion
In this paper, we tackle the fundamental problem of identification in linear systems as articulated by [fisher:66]. We move towards a general solution of the problem, enriching graph-based identification and model testing methods by introducing auxiliary variables. Auxiliary variables allow existing identification and model testing methods to incorporate knowledge of non-zero parameter values. We proved independence properties of auxiliary variables and demonstrated that, by iteratively identifying parameters using auxiliary instrumental sets, we are able to greatly increase the power of instrumental sets, to the extent that they subsume the most general criterion for identification of linear SEMs known to date. We further discussed how auxiliary variables can be useful for the general tasks of testing and identification.
7 Acknowledgments
This research was supported in parts by grants from NSF #IIS-1302448 and #IIS-1527490 and ONR #N00014-13-1-0153.
8 Appendix
Lemma 1.
Given a linear structural model, M, with induced graph G, and a set of directed edges E_K with known coefficient values, σ_XY^{M^aux} = σ_XY(G^aux) for all X, Y ∈ V ∪ V*, where V is the set of variables in M and V* is the set of auxiliary variables.
Proof.
First, we consider the case where neither nor are in . In this case, it should be clear that . Now, since and are the same variables in as they are in , we have that Wright’s rules hold for in using .
Next, we consider the case when one of or equals for some . Without loss of generality, let us assume that . First note that so
where and . Additionally, note that the only edges which are connected to are the directed edges, with coefficients , and with the coefficient 1. As a result,
Finally, we consider the case where and for some . Here,
Similarly,
∎
Lemma 2.
Given a semi-Markovian linear SEM with graph , if and only if is d-separated from in , where and is the graph obtained when is removed from .
Proof.
First, we show the sufficiency of Lemma 2. Suppose that . Then . First, since , there are no unblocked paths between and that do not include the edge for some . As a result, there are no blocked paths between and that do not either include the edge or . Now, Lemma 6 tells us that for each path, , beginning , there is a corresponding path beginning for which . As a result, for there must be an unblocked path, from to beginning . Lemma 7 tells us that such paths must include as an internal node. Since , and there are no unblocked paths between and that do not include the edge , must be of the form . However, this path visits twice and is therefore not an unblocked path. As a result, , and we have a contradiction.
Now, we show the necessity of Lemma 2. If is not d-separated from in , then there exists some path, , between and that does not include an edge in . As a result, there is a path in , from to , which is the concatenation of with . Clearly, this path is not cancelled out in the way paths that go through are. As a result, we have that . ∎
Theorem 1.
Let E_K be a set of directed edges whose coefficient values are known. A set of directed edges, E = {x1 → y, ..., xk → y}, in a graph, G, is identified if there exists a set of nodes Z = {z1, ..., zk} such that:

(i) …,

(ii) for all i, (zi ⊥ y)_{G'}, where G' is the graph obtained by removing the edges in E from G^aux, and

(iii) there exist unblocked paths π1, ..., πk such that πi is a path from zi to xi and {π1, ..., πk} has no sided intersection.

If the above conditions are satisfied, then Z is an auxiliary instrumental set for E.
Proof.
First, note that if , then is an instrumental set in and we are done. We now consider the case when . Since , , IS(i) is satisfied. Now, we show that IS(iii) is satisfied. For each , let be the path in from to . Now, for each , let be the concatenation of path with . It should be clear that satisfies IS(iii) in . Lastly, we need to show that IS(ii) is also satisfied.
Lemma 4.
If Z is a gHT-admissible set for a set of directed edges E with head y, then E is identified using instrumental sets in G^aux.
Proof.
First, note that if , then is an instrumental set in and we are done. We now consider the case when .
Let . We will show that is an instrumental set in . From HT(i), we have that . Since , and IS(i) is satisfied. From HT(iii), we have that is a set of paths from to with no sided intersection. For each , let be the path in from to . Now, for each , let be the concatenation of path with . It should be clear that satisfies IS(iii) in . We need to show that IS(ii) is also satisfied.
Consider any such that . In order for to be a gHT-admissible set, any path, connecting with in must include an edge, in . Moreover, . As a result, . It follows from Lemma 2 that . As a result, is an instrumental set for in . ∎
Theorem 2.
Given a semi-Markovian linear SEM with graph , if a set of edges, , with head , is gHTC identifiable, then it is auxIS identifiable.
Proof.
Since E is gHTC identifiable, by Lemma 3 there exist sequences of sets of nodes, Z_1, ..., Z_m, and sets of edges, E_1, ..., E_m, satisfying the conditions of Lemma 3. Using Lemma 4, there is an instrumental set for E_1 in G, and E_i is identified using instrumental sets in the augmented graph with E_K = E_1 ∪ ... ∪ E_{i-1} for all i. As a result, E is auxIS identifiable. ∎