Nested Covariance Determinants and Restricted Trek Separation in Gaussian Graphical Models

07/19/2018 ∙ by Mathias Drton, et al. ∙ MIT, University of Washington

Directed graphical models specify noisy functional relationships among a collection of random variables. In the Gaussian case, each such model corresponds to a semi-algebraic set of positive definite covariance matrices. The set is given via parametrization, and much work has gone into obtaining an implicit description in terms of polynomial (in-)equalities. Implicit descriptions shed light on problems such as parameter identification, model equivalence, and constraint-based statistical inference. For models given by directed acyclic graphs, which represent settings where all relevant variables are observed, there is a complete theory: All conditional independence relations can be found via graphical d-separation and are sufficient for an implicit description. The situation is far more complicated, however, when some of the variables are hidden (or in other words, unobserved or latent). We consider models associated to mixed graphs that capture the effects of hidden variables through correlated error terms. The notion of trek separation explains when the covariance matrix in such a model has submatrices of low rank and generalizes d-separation. However, in many cases, such as the infamous Verma graph, the polynomials defining the graphical model are not determinantal, and hence cannot be explained by d-separation or trek-separation. In this paper, we show that these constraints often correspond to the vanishing of nested determinants and can be graphically explained by a notion of restricted trek separation.


1. Introduction

Let $G = (V, D)$ be a directed graph with finite vertex set $V$ and edge set $D \subseteq V \times V$. The edge set is always assumed to be free of self-loops, so $(v, v) \notin D$ for all $v \in V$. For each vertex $v$, define a set of parents $\mathrm{pa}(v) = \{u \in V : (u, v) \in D\}$. The graph $G$ induces a statistical model for the joint distribution of a collection of random variables $X_v$, $v \in V$, indexed by the graph’s vertices. The model hypothesizes that each variable is a function of the parent variables and an independent noise term. In this paper we consider the Gaussian case, in which the functional relationships are linear so that

$$X_v = \lambda_{0v} + \sum_{u \in \mathrm{pa}(v)} \lambda_{uv} X_u + \varepsilon_v, \qquad v \in V, \tag{1.1}$$

where the $\varepsilon_v$, $v \in V$, are independent and centered Gaussian random variables. The coefficients $\lambda_{0v}$ and $\lambda_{uv}$ are unknown real parameters that are assumed to be such that the system (1.1) admits a unique solution $X = (X_v : v \in V)$. Typically termed a system of structural equations, (1.1) specifies cause-effect relations whose straightforward interpretability is behind the widespread use of the models [SGS00, Pea09].

The random vector $X$ that solves (1.1) follows a Gaussian distribution whose mean vector may be arbitrary through the choice of the parameters $\lambda_{0v}$ but whose covariance matrix is highly structured. The model obtained from (1.1) thus naturally corresponds to the set of covariance matrices, which we denote by $\mathcal{M}(G)$. This set is given parametrically with each covariance being a rational or even polynomial function of the parameters $\lambda_{uv}$ and the variances of the errors $\varepsilon_v$, as we detail in Section 2.

While a parametrization is useful to specify a distribution and to optimize the likelihood function, many statistical problems can only be solved with some understanding of an implicit description. In our setting, an implicit description of the model amounts to a semi-algebraic description of the set of covariance matrices that belong to the model through polynomial equations and inequalities, and a combinatorial criterion on the graph which specifies how to obtain them. Specific problems that can be addressed through such an implicit description include model equivalence, parameter identification, and constraint-based statistical inference. We refer the reader to the recent work of [vOM17] and the reviews of [Drt18] and [DM17].

If the underlying graph $G$ is an acyclic digraph, also termed a directed acyclic graph (DAG), then probabilistic conditional independence yields an implicit description of the model's set of covariance matrices $\mathcal{M}(G)$ [Lau96, Stu05]. For a Gaussian joint distribution, conditional independence corresponds to the vanishing of special subdeterminants of the covariance matrix, namely, subdeterminants that are almost principal in the sense that the row and the column index sets agree in all but one element [LM07, DSS09, Chap. 3.1]. The conditional independences holding in all distributions in the given model can be found graphically using the concept of d-separation. It follows in particular that two DAGs $G$ and $G'$ give rise to the same model if and only if $G$ and $G'$ have the same d-separation relations. This combinatorial criterion can be simplified to yield an efficient algorithm: $\mathcal{M}(G) = \mathcal{M}(G')$ if and only if $G$ and $G'$ have the same skeleta and the same sets of unshielded colliders [Fry90, VP91].
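The skeleton-and-collider criterion can be checked mechanically. The following Python sketch is illustrative only: it assumes each DAG is supplied as a list of (parent, child) pairs over a common vertex set, and the function names are hypothetical rather than taken from the cited works.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of a DAG: every edge becomes an unordered pair."""
    return {frozenset(e) for e in edges}


def unshielded_colliders(edges):
    """Triples (a, c, b) with a -> c <- b where a and b are non-adjacent."""
    skel = skeleton(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    colliders = set()
    for c, pa in parents.items():
        for a, b in combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                colliders.add((a, c, b))
    return colliders


def markov_equivalent(edges1, edges2):
    """Same skeleton and same unshielded colliders (inputs assumed to be DAGs)."""
    return (skeleton(edges1) == skeleton(edges2)
            and unshielded_colliders(edges1) == unshielded_colliders(edges2))


chain = [(1, 2), (2, 3)]            # 1 -> 2 -> 3
reversed_chain = [(2, 1), (3, 2)]   # 1 <- 2 <- 3
collider = [(1, 2), (3, 2)]         # 1 -> 2 <- 3

print(markov_equivalent(chain, reversed_chain))  # True
print(markov_equivalent(chain, collider))        # False
```

The chain and its reversal share a skeleton and have no unshielded colliders, whereas the third graph has the unshielded collider at vertex 2 and therefore defines a different model.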

While DAG models are well-understood, they only pertain to problems where all relevant variables are observed. A long-standing program in the fields of graphical modeling and causal inference seeks to develop combinatorial solutions to problems such as model equivalence in settings with hidden/latent variables. Mathematically, if only the variables indexed by a set $A \subseteq V$ are observed while those indexed by $V \setminus A$ are hidden, then the covariance matrices in the set $\mathcal{M}(G)$ are to be projected on their $A \times A$ principal submatrix. It is well known that conditional independence is no longer sufficient for implicit model description after such a projection.

Figure 1.1. A DAG on five vertices. Vertex 5 indexes a hidden variable.
Example 1.1.

Let $G$ be the DAG in Figure 1.1, where vertex 5 indexes a hidden variable. Then no conditional independence involving only the observed $X_1$, $X_2$, $X_3$, and $X_4$ holds for all covariance matrices in $\mathcal{M}(G)$. Instead, a positive definite matrix is the projection of a matrix in $\mathcal{M}(G)$ if and only if

and implies for .

In the example just given the key constraint is a determinant of the covariance matrix that cannot be explained by d-separation. A major advance in this decade was the introduction of trek separation, which is a graphical criterion that can be used to decide the vanishing of any subdeterminant of the covariance matrix [DST13, STD10]. Although more work is required to fully exploit trek separation in model equivalence criteria, the notion has already seen application in parameter identification problems [WRD18].

While greatly generalizing Gaussian conditional independence, determinantal constraints are again not sufficient to describe the sets after projection to the covariance matrix of observed variables. The following example is due to Thomas Verma.

Figure 1.2. The Verma graph. Vertex 5 indexes a hidden variable.
Example 1.2.

Let $G$ be the graph from Figure 1.2. Then, as in the first example, no conditional independence that holds for $\mathcal{M}(G)$ involves only the observed variables $X_1$, $X_2$, $X_3$, and $X_4$. Instead, a positive definite matrix is the projection of a matrix in $\mathcal{M}(G)$ if and only if

(1.2)

(compare Example 3.3.14 in [DSS09]). The polynomial in (1.2) is not a subdeterminant of the covariance matrix $\Sigma$ and, therefore, is explained neither by d-separation nor by trek separation.

Another key advance in the area is a graph decomposition result of Jin Tian [TP02]; see also [Drt18, Sections 5-6]. This result allows one to derive constraints by applying d-separation in certain subgraphs. In particular, the vanishing polynomial from (1.2) can be shown to arise from the independence of variables and that holds for the subgraph obtained by removing the edges and from the Verma graph in Figure 1.2. For further details, we refer the reader to the review in [SERR14].

In the next example, however, neither Tian’s graph decomposition nor trek separation provides any insight.

Figure 1.3. Graph based on [vOM17, Fig. 1]. Vertices 5, 6 and 7 index hidden variables.
Example 1.3.

Let be the graph from Figure 1.3. There are four observed variables, and projecting gives a set of codimension one. As discussed in [vOM17], any covariance satisfies the constraint

(1.3)

The irreducible polynomial in (1.3) defines the hypersurface that contains the projection of .

A closer look at Examples 1.2 and 1.3 reveals some common structure. Both constraints are nested determinants, by which we mean determinants of a matrix whose entries are determinants themselves. This observation is the point of departure for our paper.

Example 1.4.

The Verma polynomial from Example 1.2 admits a compact representation through nested determinants, namely,

(1.4)

Such a representation is generally not unique. For instance,

(1.5)

The polynomial from Example 1.3 is also a nested determinant, namely,

(1.6)

We are not aware of any literature emphasizing these types of representations.

In this paper, we investigate combinatorial conditions on the graph that entail the vanishing of nested determinants. We give a rigorous definition of the models we study in Section 2, where we also provide background on the current knowledge of their description. In particular, we introduce mixed graph models that play an important role in model selection [DM17, Section 5.2]. Section 3 shows how nested determinants arise under conditions of ancestrality. In Theorem 3.7 we show that such determinants completely describe the model for a wide class of mixed graphs that are (nearly) ancestral. Section 4 describes our notion of restricted trek separation in the setting of arbitrary acyclic mixed graphs. In Section 5, we show how the vanishing of nested determinants can follow from restricted trek separation. The result we present also implies the vanishing of the constraints exhibited for (nearly) ancestral graphs in Section 3. In Section 6, we give examples that involve recursive nesting of determinants and are beyond the scope of our results. Nevertheless, we can relate these examples back to restricted trek separation. While our focus is on acyclic mixed graphs, our last example shows that a nested determinant may also arise for graphs containing directed cycles. We conclude with Section 7, where we discuss future work and open problems.

2. Background

2.1. Structural equation models

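Let $\varepsilon = (\varepsilon_v : v \in V)$ be the random error vector for the equation system in (1.1). As we are only concerned with the covariance structure, we disregard the offsets $\lambda_{0v}$. The system can then be written as

$$X = \Lambda^T X + \varepsilon, \tag{2.1}$$

where the matrix $\Lambda = (\lambda_{uv}) \in \mathbb{R}^{V \times V}$ holds the unknown coefficients. Let $\Omega = (\omega_{uv})$ be the covariance matrix of $\varepsilon$, which we assume positive definite. Assuming further that $I - \Lambda$ is invertible, the random vector $X = (I - \Lambda)^{-T} \varepsilon$ is the unique solution to the linear system in (2.1) and has covariance matrix

$$\Sigma = (I - \Lambda)^{-T} \, \Omega \, (I - \Lambda)^{-1}. \tag{2.2}$$

In the introduction we focused on the case where the individual error terms $\varepsilon_v$ are independent. Their covariance matrix $\Omega$ is then diagonal. In this case, a model postulating that some of the coefficients in $\Lambda$ are zero is conveniently represented by a directed graph, as was our setup in Section 1. Going forward, we also allow for dependence among the $\varepsilon_v$ and a possibly non-diagonal matrix $\Omega$. Nonzero off-diagonal terms of $\Omega$ are commonly represented by adding bidirected edges to the considered directed graph.

A mixed graph is a triple $G = (V, D, B)$, where $D \subseteq V \times V$ is the set of directed edges, and $B$ is the set of bidirected edges, which consists of unordered pairs of elements of $V$. We denote a directed edge from $u$ to $v$ by $u \to v$, and a bidirected edge by $u \leftrightarrow v$. Let $\mathbb{R}^D$ be the set of matrices $\Lambda = (\lambda_{uv})$ with support $D$, i.e.,

$$\mathbb{R}^D = \{\Lambda \in \mathbb{R}^{V \times V} : \lambda_{uv} = 0 \text{ if } u \to v \notin D\}.$$

Let $\mathbb{R}^D_{\mathrm{reg}}$ be the subset of matrices $\Lambda \in \mathbb{R}^D$ for which $I - \Lambda$ is invertible. Let $\mathit{PD}_V$ be the cone of positive definite $V \times V$ matrices, and define $\mathit{PD}(B)$ to be the subcone of matrices supported over $B$, i.e.,

$$\mathit{PD}(B) = \{\Omega \in \mathit{PD}_V : \omega_{uv} = 0 \text{ if } u \neq v \text{ and } u \leftrightarrow v \notin B\}.$$

The mixed graph $G$ is acyclic if its directed part $(V, D)$ does not contain any directed cycles. In this case, $V$ can be ordered such that all matrices $\Lambda \in \mathbb{R}^D$ are strictly upper-triangular. Thus, the determinant $\det(I - \Lambda) = 1$ and $\mathbb{R}^D_{\mathrm{reg}} = \mathbb{R}^D$. By Cramer’s rule, the covariances in $\Sigma$ in (2.2) are then polynomial functions of the entries of $\Lambda$ and $\Omega$.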
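To make the parametrization in (2.2) concrete, the following Python sketch evaluates the covariance matrix for a small acyclic example in exact arithmetic. It assumes the usual convention that the coefficient of the edge from $u$ to $v$ sits in row $u$, column $v$ of the coefficient matrix, and it uses the fact that this matrix is nilpotent for an acyclic graph, so the inverse is a finite sum of powers. The helper names and the numerical values are illustrative assumptions.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def inverse_of_i_minus(lam):
    """(I - Lambda)^{-1} = I + Lambda + ... + Lambda^(n-1), valid for nilpotent Lambda."""
    n = len(lam)
    total = identity(n)
    power = identity(n)
    for _ in range(n - 1):
        power = mat_mul(power, lam)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total


def covariance(lam, omega):
    """Sigma = (I - Lambda)^{-T} Omega (I - Lambda)^{-1}."""
    inv = inverse_of_i_minus(lam)
    return mat_mul(mat_mul(transpose(inv), omega), inv)


# Chain 1 -> 2 -> 3 with coefficients 2 and 3 and unit error variances.
F = Fraction
lam = [[F(0), F(2), F(0)],
       [F(0), F(0), F(3)],
       [F(0), F(0), F(0)]]
omega = identity(3)
sigma = covariance(lam, omega)
print(sigma)
```

Every entry of the result is a polynomial in the two coefficients and the error variances, in line with the acyclic case discussed above; for instance, the (1,3) entry equals the product of the two edge coefficients.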

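Taking the error $\varepsilon$ to be Gaussian, a given mixed graph $G$ induces the following statistical model for the joint distribution of $X$.

Definition 2.1.

The linear structural equation model given by a mixed graph $G = (V, D, B)$ is the family of all multivariate normal distributions on $\mathbb{R}^V$ with covariance matrix in the set

$$\mathcal{M}(G) = \{(I - \Lambda)^{-T} \, \Omega \, (I - \Lambda)^{-1} : \Lambda \in \mathbb{R}^D_{\mathrm{reg}}, \ \Omega \in \mathit{PD}(B)\}.$$

The set $\mathbb{R}^D_{\mathrm{reg}} \times \mathit{PD}(B)$ is semialgebraic. Since $\mathcal{M}(G)$ is the image of this set under a rational map, the Tarski-Seidenberg theorem yields that $\mathcal{M}(G)$ itself is a semialgebraic set and, thus, admits a polynomial description. In this paper, we are interested in studying polynomial equations that are satisfied by the matrices in $\mathcal{M}(G)$. With $\Sigma = (\sigma_{uv})$ interpreted as a symmetric matrix of indeterminates, define $\mathbb{R}[\Sigma]$ to be the ring of polynomials in the $\sigma_{uv}$. Then the polynomial relations we seek to understand make up the vanishing ideal

$$\mathcal{I}(G) = \{f \in \mathbb{R}[\Sigma] : f(\Sigma) = 0 \text{ for all } \Sigma \in \mathcal{M}(G)\}.$$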

Suppose a variable $X_v$, $v \in V$, is hidden. Then the remaining variables have their covariance matrix in the set obtained by projecting each matrix in $\mathcal{M}(G)$ onto its $(V \setminus \{v\}) \times (V \setminus \{v\})$ submatrix. Two comments are in order. First, we emphasize that for a fixed $v \in V$, the polynomials $f$ in the vanishing ideal $\mathcal{I}(G)$ that do not involve any of the indeterminates indexed by $v$, i.e., $f$ is free of $\sigma_{uv}$ for $u \in V$, give precisely the polynomial constraints holding for the model in which random variable $X_v$ is hidden. Second, the paradigm of mixed graphs allows one to directly capture relations after projection. Indeed, a graphical operation known as “latent projection” creates a new mixed graph over the observed variables that represents key relations among covariances of observed variables; see [Pea09, Section 2.6], [Kos02] or [Wer11]. For instance, the three examples from our introduction would be represented by the three mixed graphs in Figure 2.1. In these examples, the ideal $\mathcal{I}(G)$ of the given mixed graph $G$ coincides with the ideal of polynomial relations among the observed covariances in the hidden variable model given by the original DAG.

Figure 2.1. (a)-(c) Mixed graphs obtained by latent projection of the DAGs in Figures 1.1-1.3, respectively.

2.2. Trek rule

Again let $G = (V, D, B)$ be any mixed graph, possibly cyclic. The starting point for any combinatorial understanding of polynomials in the vanishing ideal $\mathcal{I}(G)$ is the trek rule. This rule specifies each entry of the covariance matrix $\Sigma$ in (2.2) as a sum of monomials associated with certain paths in the graph.

Definition 2.2.

A trek is a path of the form

  1. , or

  2. ,

for integers with . Here, a path may visit a vertex more than once. If , the trek is simply the directed path . Similarly, it is if . We call a trek from to or also a trek between and . The sets and are the left side and the right side of , respectively.

Let and . To any trek , specified as in Definition 2.2, associate a trek monomial

(2.3)

The trek rule now states that the covariance matrix has its entries given by

(2.4)

The rule, which originates in the work of [Wri34], is obtained by observing that . The right-hand side of (2.4) is a polynomial when is acyclic and a (formal) power series otherwise.
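The trek rule can be illustrated by summing trek monomials directly. The Python sketch below is a minimal illustration under stated assumptions: the graph is acyclic, directed-edge coefficients are stored in a dictionary keyed by (parent, child), and error variances and bidirected-edge covariances are stored in a dictionary keyed by vertex pairs. A trek between two vertices is split into a top pair, which is either a single source vertex or the two endpoints of a bidirected edge, together with one directed path down to each endpoint. All names and numbers are hypothetical.

```python
from fractions import Fraction


def directed_path_weights(lam, target):
    """Map each start vertex to the summed weight of directed paths start -> ... -> target.

    The trivial path at `target` has weight 1. Assumes an acyclic directed part,
    otherwise the recursion would not terminate.
    """
    weights = {target: Fraction(1)}
    for (u, v), coeff in lam.items():
        if v == target:
            for start, w in directed_path_weights(lam, u).items():
                weights[start] = weights.get(start, Fraction(0)) + w * coeff
    return weights


def trek_covariance(lam, omega, u, v):
    """Sum of trek monomials over all treks between u and v."""
    left = directed_path_weights(lam, u)    # left sides end at u
    right = directed_path_weights(lam, v)   # right sides end at v
    total = Fraction(0)
    for a, wa in left.items():
        for b, wb in right.items():
            key = (a, b) if (a, b) in omega else (b, a)
            total += wa * omega.get(key, Fraction(0)) * wb
    return total


# Edges 1 -> 2 -> 3 with coefficients 2 and 3, plus a bidirected edge 1 <-> 3.
lam = {(1, 2): Fraction(2), (2, 3): Fraction(3)}
omega = {(1, 1): Fraction(1), (2, 2): Fraction(1), (3, 3): Fraction(1),
         (1, 3): Fraction(1, 2)}

print(trek_covariance(lam, omega, 1, 3))  # 13/2
print(trek_covariance(lam, omega, 3, 3))  # 52
```

In this example the covariance of the first and third variables collects two treks, the directed path from 1 to 3 with monomial 6 and the bidirected edge with monomial 1/2.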

2.3. Conditional independence and subdeterminants

The notion of d-separation allows one to decide by inspection of paths in a mixed graph $G$ whether a conditional independence relation holds for all distributions in the model given by $G$; see e.g. [Drt18, Section 10]. In algebraic terms, for a Gaussian joint distribution, variables $X_u$ and $X_v$ are conditionally independent given a subvector $X_C$ with $C \subseteq V \setminus \{u, v\}$ if and only if the subdeterminant $\det(\Sigma_{uC, vC})$ is zero. Here, $uC$ denotes the union of the singleton set $\{u\}$ and the set $C$. Thus, d-separation gives a combinatorial characterization of when a subdeterminant of the form $\det(\Sigma_{uC, vC})$ belongs to the vanishing ideal $\mathcal{I}(G)$. If $G$ is a DAG, then the covariance model $\mathcal{M}(G)$ admits a semi-algebraic description by conditional independence. Indeed, $\mathcal{M}(G)$ is the set of positive definite matrices for which all conditional independence determinants associated with the graph vanish. This is also true for mixed graphs that are maximal ancestral [RS02], but false more generally as the examples in the introduction show.
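As a small numerical illustration of the correspondence between conditional independence and almost principal subdeterminants, the Python sketch below uses the covariance matrix of a chain 1 → 2 → 3 with edge coefficients 2 and 3 and unit error variances. The graph and the numbers are assumptions chosen for illustration; the helper names are hypothetical.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    if not m:
        return 1
    total = 0
    for j, entry in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(sub)
    return total


def minor(sigma, rows, cols):
    """Determinant of the submatrix with the given 1-based row and column indices."""
    return det([[sigma[r - 1][c - 1] for c in cols] for r in rows])


# Covariance matrix of the chain 1 -> 2 -> 3 described in the lead-in.
sigma = [[1, 2, 6],
         [2, 5, 15],
         [6, 15, 46]]

ci_minor = minor(sigma, rows=[1, 2], cols=[3, 2])  # X1 and X3 given X2
marginal = minor(sigma, rows=[1], cols=[3])         # X1 and X3 with empty conditioning set
print(ci_minor, marginal)  # 0 6
```

The row and column index sets {1, 2} and {3, 2} agree in all but one element, and the minor vanishes because vertex 2 d-separates 1 from 3 in the chain. The marginal covariance is nonzero, so the two variables are dependent without conditioning.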

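In seminal work, Sullivant, Talaska and Draisma [STD10] move beyond conditional independence determinants and give a combinatorial characterization of when an arbitrary subdeterminant $\det(\Sigma_{A,B})$ is in the vanishing ideal $\mathcal{I}(G)$. We briefly review their concept of trek-separation; see also [Drt18, Section 11].

Definition 2.3.

Two sets $A, B \subseteq V$ are trek-separated by the pair $(S_L, S_R)$, where $S_L, S_R \subseteq V$, if every trek between a vertex from $A$ and a vertex from $B$ intersects either $S_L$ on its left side or $S_R$ on its right side.

In the case $|A| = |B|$, the following theorem shows that $\det(\Sigma_{A,B}) \in \mathcal{I}(G)$ if and only if the sets of vertices $A$ and $B$ are trek-separated by a pair $(S_L, S_R)$ with $|S_L| + |S_R| < |A|$.

Theorem 2.4 (Thm. 2.17, [STD10]).

Let $A, B \subseteq V$. The submatrix $\Sigma_{A,B}$ has rank at most $r$ for all covariance matrices $\Sigma \in \mathcal{M}(G)$ if and only if there exist subsets $S_L, S_R \subseteq V$ such that $|S_L| + |S_R| \le r$ and $(S_L, S_R)$ trek-separates $A$ from $B$. For a generic choice of $\Sigma \in \mathcal{M}(G)$,

$$\operatorname{rank}(\Sigma_{A,B}) = \min\{|S_L| + |S_R| : (S_L, S_R) \text{ trek-separates } A \text{ from } B\}.$$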

While trek-separation greatly generalizes d-separation and can yield a generating set of the vanishing ideal $\mathcal{I}(G)$ for some mixed graphs [FRS16], it is in general not sufficient to understand the vanishing ideal, as we demonstrated in Examples 1.2 and 1.3.
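A classical instance of a rank constraint explained by trek separation but not by conditional independence is the tetrad in a one-factor model. The Python sketch below assumes a hypothetical star-shaped DAG in which a single hidden vertex is the only parent of four observed vertices; this graph is chosen for illustration and is not claimed to be one of the figures above. Every trek between {1, 2} and {3, 4} passes through the hidden vertex, so a separating pair of total size one exists and the corresponding 2 × 2 submatrix has rank at most one.

```python
from fractions import Fraction

# Hypothetical one-factor model: X_i = loadings[i] * H + error_i with Var(H) = 1.
loadings = [Fraction(2), Fraction(3), Fraction(5), Fraction(7)]
noise = [Fraction(1), Fraction(1), Fraction(2), Fraction(3)]

# Observed covariance: products of loadings, plus error variance on the diagonal.
sigma = [[loadings[i] * loadings[j] + (noise[i] if i == j else 0)
          for j in range(4)] for i in range(4)]


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def submatrix(sigma, rows, cols):
    """Submatrix with the given 1-based row and column indices."""
    return [[sigma[r - 1][c - 1] for c in cols] for r in rows]


tetrad = det2(submatrix(sigma, [1, 2], [3, 4]))
principal = det2(submatrix(sigma, [1, 2], [1, 2]))
print(tetrad, principal)  # 0 14
```

The tetrad vanishes for every choice of loadings and error variances, while the principal minor on {1, 2} is positive, as it must be for a positive definite matrix.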

3. Ancestral vertices and overdetermined linear systems

We now proceed to a first combinatorial condition (see Proposition 3.4) for the vanishing of very special nested determinants (Definition 3.1). Fix a mixed graph , and let . For a pair of matrices and , it holds that

In turn, if and only if

(3.1)

For some graphs it is known that all entries of can be recovered as rational expressions of , at least for generic choices of positive definite . For instance, the half-trek criterion [FDD12] and its extensions [Che16, DW16, WRD18] can be used to certify graphically that such rational identification of from is possible and to find rational expressions. If now both the th and the th column of are rationally identifiable from , then the left-hand side of the equation in (3.1) can be expressed as a rational function of . If and , then one finds a rational constraint on that after clearing denominators yields a polynomial in . This approach is used, for instance, in [vOM17].

In this section we follow a similar approach in which we substitute solutions for some of the entries of that appear in (3.1). However, we only linearize the equations and then observe that nested determinantal constraints arise from overdetermined linear equation systems. Specifically, we study the following constraints.

Definition 3.1.

Let be a vertex of the mixed graph , and let be a subset of vertices in . Define a matrix of polynomials of size as

(3.2)

The parentally nested determinants for the pair are the minors of order of the matrix . When is a singleton, there is only one parentally nested determinant

(3.3)

Here, index sets are treated as multisets with possibly repeated elements, and the determinants are formed according to a prespecified linear order for the vertex set . The symbol stands for the sum (or disjoint union) of multisets; e.g., .

Suppose . Then is repeated in the row index set for the matrix . In this case indexes two rows for a minor, which is then zero. In particular, if then . We may therefore always restrict the set to satisfy .

A repeated index may also arise for the column index sets of the matrices whose determinants yield the entries of . Indeed, if is also in for , then the entry of is zero.

Example 3.2.

It holds that in Example 1.2, and in Example 1.3.

In the remainder of this section, we identify conditions under which parentally nested determinants vanish.

Definition 3.3.

A vertex in the mixed graph is ancestral if (i) is not on any directed cycle, and (ii) no vertex has both and a directed path from to .

Let and , and define . If is ancestral, then all treks from a vertex to end with a directed edge pointing to . The trek rule from (2.4) then implies that

(3.4)

For our next result it is convenient to introduce the set of siblings of a vertex , which is , the set of neighbors of in the bidirected part of the graph.

Proposition 3.4.

Let be a vertex of a mixed graph such that

  1. ,

  2. all vertices in are ancestral, and

  3. the set of all ancestral vertices in is non-empty.

Then the parentally nested determinants for the pair are in the vanishing ideal .

Proof.

Let and , and define . Neither nor contains vertices in . Fixing , (3.1) implies that

(3.5)

This equation becomes

(3.6)

Since all vertices in are ancestral, we may use (3.4) to get the rational equation

(3.7)

Now observe that for any vertex ,

(3.8)

Hence, multiplying the equation in (3.7) by gives

(3.9)

With one equation for every , the system is overdetermined and admits a solution only if the matrix from Definition 3.1 has rank at most . This in turn implies the vanishing of its minors. Note that in the case that is not trek reachable from , the last equation is trivial and corresponds to a row of zeros in . ∎
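The last step of the proof rests on a general linear algebra fact: a linear system with more equations than unknowns is solvable only if the augmented matrix is rank deficient, so its maximal minors must vanish. The Python sketch below illustrates this with a generic system of three equations in two unknowns; the numbers are arbitrary and unrelated to any specific graph.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    if not m:
        return 1
    total = 0
    for j, entry in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(sub)
    return total


def augmented(a, b):
    """Append the right-hand side b as an extra column of the coefficient matrix a."""
    return [row + [rhs] for row, rhs in zip(a, b)]


# Three equations, two unknowns: x = 2, y = 3, x + y = rhs.
a = [[1, 0],
     [0, 1],
     [1, 1]]

consistent = augmented(a, [2, 3, 5])     # solvable by (x, y) = (2, 3)
inconsistent = augmented(a, [2, 3, 6])   # no solution

print(det(consistent), det(inconsistent))  # 0 1
```

In the proof the same reasoning is applied with polynomial entries: the existence of the coefficients solving the system for all model covariance matrices forces the relevant minors to vanish on the whole model.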

Figure 3.1. DAG on 4 vertices used to illustrate the parentally nested determinants.
Example 3.5.

The graph from Figure 3.1 is a DAG, and thus all its vertices are ancestral. As there are no bidirected edges, for all . As previously noted, for any graph if . Here, . Moreover, . The nonzero polynomials are

The irreducible polynomial corresponds to conditional independence of and given . The second irreducible polynomial encodes conditional independence of and given . It turns out that

In fact, the ideal differs from only through components that do not vanish at positive definite matrices [RP14, Example 2]. Specifically,

Here,

is generated by three subdeterminants, two of which are conditional independences.

Figure 3.2. An ancestral graph that is not maximal.
Example 3.6.

The mixed graph in Figure 3.2 is an ancestral graph, that is, all vertices are ancestral [RS02]. It is not maximal, that is, there are non-adjacent vertices, namely, and , that cannot be -separated. There is then no conditional independence constraint associated to the non-adjacency. Precisely two of the are nonzero, namely,

In fact, , and . We note that there is also the alternative representation of

We now give a model description for a class of graphs that includes all ancestral graphs. It also covers the two graphs from Figure 2.1(b)(c). Recall that a sink of a mixed graph is any vertex that is not a parent of any other vertex. A subgraph of is a mixed graph with , , and . We call globally identifiable if is acyclic and none of its subgraphs has both a connected bidirected part and a unique sink vertex in its directed part .

For any set of polynomials , we let be the algebraic subset it defines in the space of symmetric matrices.

Theorem 3.7.

Let be a globally identifiable mixed graph with vertex set enumerated in a topological order. Suppose all vertices in are ancestral. Let be the set of all parentally nested determinants obtained from the pairs for . Then

Proof.

The inclusion follows from application of Proposition 3.4.

To show the reverse inclusion, we proceed by induction on the number of vertices . The statement is trivial for . In the induction step, let . Let be the submatrix obtained by removing the th row and column. Let be the subgraph induced by . Now, . The induction hypothesis yields that . Let for and .

Consider the matrix from Definition 3.1 evaluated at the given matrix . For each , divide the corresponding row by . The resulting matrix has entries

(3.10)

for and ; recall (3.8). Using (3.4), we obtain that

(3.11)

Form the submatrix , that is, we omit the column indexed by . Then

Lemma 2 in [DFS11] yields that has full column rank.

Since , the matrix and thus also do not have full column rank. We conclude that the kernel of contains a vector for which the last coordinate . Dividing by gives a vector that solves the equation system in (3.9). Define a matrix by using to define its last column. Then solves (3.5) for and all and thus also (3.1). Therefore, . ∎

The above facts leverage existence of ancestral vertices. In the next sections, we seek to give a more general condition for the vanishing of nested determinants. The results on vanishing nested determinants from this section can be recovered as a special case; see Proposition 5.7.

4. Restricted Trek Separation

As we reviewed in Section 2.3, the notion of trek separation [STD10] provides a combinatorial characterization of when a subdeterminant of the covariance matrix vanishes over a model . Underlying the trek separation result we stated in Theorem 2.4 is the observation that determinants correspond to sums of certain products of trek monomials. In this section, we recall this observation and then introduce a notion of restricted trek separation, in which separation only needs to occur with respect to treks that avoid certain vertices on their left or right sides. This notion will be used in Section 5 to obtain conditions that imply the vanishing of nested determinants.

Let and be two subsets of the vertex set of a mixed graph , with . A system of treks from to is a set of treks that each are between a vertex in and a vertex in . Let be such a system. Then has no sided intersection if any two distinct treks in have disjoint left sides and disjoint right sides. In particular, each vertex in and each vertex in is on precisely one trek, so that induces a bijection between and . Fixing an ordering of the elements of and , the trek system induces a permutation of in which the th element of is mapped to the end point of the trek that starts at the th element of . Write for the sign of this permutation. Now define

(4.1)

with the summation being over all systems of treks from to with no sided intersection; recall the definition of trek monomials from (2.3).
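The two combinatorial ingredients of this sum, the absence of sided intersections and the sign of the induced permutation, are easy to check for an explicitly given trek system. The Python sketch below assumes each trek is represented as a pair (left side, right side) of vertex tuples; this representation, the function names, and the example treks are hypothetical.

```python
from itertools import combinations


def has_no_sided_intersection(treks):
    """True if any two distinct treks have disjoint left sides and disjoint right sides."""
    for (l1, r1), (l2, r2) in combinations(treks, 2):
        if set(l1) & set(l2) or set(r1) & set(r2):
            return False
    return True


def permutation_sign(perm):
    """Sign of a permutation given as a list of 0-based images, via inversion count."""
    sign = 1
    for i, j in combinations(range(len(perm)), 2):
        if perm[i] > perm[j]:
            sign = -sign
    return sign


# Two treks made of single bidirected edges, 1 <-> 3 and 2 <-> 4.
parallel = [((1,), (3,)), ((2,), (4,))]
# Two treks sharing the source 5, i.e. 1 <- 5 -> 3 and 2 <- 5 -> 4.
shared_source = [((5, 1), (5, 3)), ((5, 2), (5, 4))]

print(has_no_sided_intersection(parallel))       # True
print(has_no_sided_intersection(shared_source))  # False
print(permutation_sign([0, 1]), permutation_sign([1, 0]))  # 1 -1
```

With the start set ordered as (1, 2) and the end set as (3, 4), the first system induces the identity permutation and contributes with sign +1, whereas a system pairing 1 with 4 and 2 with 3 would contribute with sign −1.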

Theorem 4.1 ([DST13]).

Suppose the underlying graph is acyclic. Then the determinant of equals .

This result admits a generalization to the case where the graph contains directed cycles. Indeed, [DST13] give a rational expression for the determinant of in terms of self-avoiding trek flows, which reduce to trek systems without sided intersection in the acyclic case. As this generalization is more involved, we will not give any details here and focus instead on acyclic graphs only.

We now extend the combinatorial characterization of determinants and the trek separation result from Theorem 2.4 to allow for restricted treks.

Definition 4.2.

Let $A$, $B$, $L$, and $R$ be subsets of vertices of a mixed graph $G$. An $(L, R)$-restricted trek between $A$ and $B$ is a trek between a vertex in $A$ and a vertex in $B$ that has its left side in $L$ and its right side in $R$. Let $S_L$ and $S_R$ be two further subsets of vertices. Then $A$ and $B$ are $(L, R)$-restricted trek-separated by $(S_L, S_R)$ if every $(L, R)$-restricted trek between $A$ and $B$ intersects $S_L$ on its left side or $S_R$ on its right side.

Example 4.3.

Consider the Verma graph from Figure 2.1(b). Take , , , and . Then and are -restricted trek-separated by . Indeed, every trek between and that only uses on the left and only uses on the right has to go through on the right. Note, however, that this is not true if, for example, or if .
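The definition of restricted trek separation can be checked directly once the treks between the two vertex sets are listed. The Python sketch below is a minimal illustration under stated assumptions: treks are supplied by the caller as pairs (left side, right side) of vertex tuples, and the example treks belong to a hypothetical graph with directed paths 1 → 2 → 4 and 1 → 3 → 4, not to the Verma graph. All function and parameter names are made up for this sketch.

```python
def is_restricted(trek, allowed_left, allowed_right):
    """True if the left side lies in allowed_left and the right side in allowed_right."""
    left, right = trek
    return set(left) <= set(allowed_left) and set(right) <= set(allowed_right)


def restricted_trek_separated(treks, allowed_left, allowed_right, sep_left, sep_right):
    """Check separation only against treks that satisfy the side restrictions."""
    for left, right in treks:
        if not is_restricted((left, right), allowed_left, allowed_right):
            continue  # this trek is ignored by the restricted criterion
        if not (set(left) & set(sep_left) or set(right) & set(sep_right)):
            return False
    return True


# Treks between vertex 1 and vertex 4 in the hypothetical graph:
# 1 -> 2 -> 4 and 1 -> 3 -> 4, each with left side {1}.
treks = [((1,), (1, 2, 4)), ((1,), (1, 3, 4))]

# Right sides must avoid vertex 3, so only the first trek counts; it hits 2.
print(restricted_trek_separated(treks, {1}, {1, 2, 4}, set(), {2}))     # True
# Without the restriction the second trek avoids the separating set.
print(restricted_trek_separated(treks, {1}, {1, 2, 3, 4}, set(), {2}))  # False
```

As in the example above, enlarging the set of vertices allowed on a side can destroy the separation, because additional treks then have to be blocked.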

The main observation of this section is that restricted trek separation is equivalent to a rank constraint on a special matrix. Note also that part (ii) of the theorem is a direct generalization of Theorem 4.1 to the restricted case.

Theorem 4.4.

Let be an acyclic mixed graph, and let and . For , consider the matrix

and its submatrix