This article concerns a class of statistical models for discrete random variables that was introduced by Kirkup in(Kir, ) and studied further by Sullivant in (Sul, , Section 4.3.2). Let
be random variables, wherehas the finite state space
. A joint distribution for these random variables is a tensorof format whose entries are nonnegative real numbers that sum to
. These distributions are elements in the probability simplex, where . As is customary in algebraic statistics (AHT, ; BC, ; ST, ; Sul, ), we identify this simplex with the projective space . Furthermore, by a “model” we will mean either a subvariety of or its intersection with , depending on context. The latter is a semialgebraic set.
For any subset of , we write for the corresponding marginalization. Thus is a tensor with entries. These are obtained from the entries of by summing out the indices not in . For instance, if and then is the matrix whose entry in row and column equals .
Let be any simplicial complex with vertex set . We assume for all . The marginal independence model is the set of distributions such that has rank for all . The random variables are completely independent for . Since the rank constraint is the vanishing of the minors of all flattenings of , we see that is a variety defined by quadratic equations in the tensor space . The following model, pairwise independence for three variables, appears in (Sul, , Section 4.3.2).
Example 1 (-cycle).
Let and . The model comprises tensors whose three marginalizations are matrices of rank . This is an irreducible variety of dimension in the tensor space .
The natural constraints that describe this variety are the various determinants
These do not generate the prime ideal of . See (Kir, , Section 6) for extraneous components.
The binary model (all ), which serves as running example in (Kir, ), has dimension and degree in . Its ideal is generated by five quadrics and is Gorenstein. The ternary model (all ) has dimension and degree in . For this model, Kirkup concluded on (Kir, , page 453) that “a free resolution cannot be computed”. This is no longer true. Using the toric representation in Theorem 4, we easily find the Macaulay2 Betti diagram for :
0 1 2 3 4 5 6 7 8 9 10 11 12 total: 1 55 303 920 2309 4740 6700 6038 3348 1082 228 61 7 0: 1 . . . . . . . . . . . . 1: . 55 303 711 759 285 . . . . . . . 2: . . . 209 1550 4455 6699 6029 3276 887 30 . . 3: . . . . . . 1 9 72 195 198 61 7
Note that the ideal is perfect (Cohen-Macaulay), in accordance with (Kir, , Conjecture 28).
The toric representation is due to Sullivant (Sul, , Section 4.3.2), who proves it for and states that it “will also work for many other marginal independence models”. Our Sections 2 and 3 develop this in detail. We show that each model is a toric variety after a linear change of coordinates. This reaffirms that “the world is toric” (MS, , Section 8.3). The new coordinates were called Möbius parameters by Drton and Richardson (DR, ). These authors also treat marginal independence but their set-up is based on bidirected graphs and it differs from ours. Most of the models in (DR, ) are not in the class we study here; see Example 4.
This paper is organized as follows. In Section 2 we introduce Möbius coordinates , and we show that these have desirable properties. For instance, for tensors, they are . In these coordinates, the prime ideal for the binary 3-cycle in Example 1 is the toric ideal
In Section 3 we show that every marginal independence model is toric in the Möbius coordinates . In particular, we identify the associated polytope and model matrix . Section 4 is devoted to the minimal generators of the toric ideals . We start from the matroid case, where the simplicial complex consists of all independent sets. We are especially interested in graphic matroids, where is the set of all subforests in a graph with edges. This leads to a new class of random graph models, featured in Example 5.
In Section 5 we turn to the statistical task of parameter estimation. Given any empirical distribution , we seek to compute a distribution in the model that best explains the data . We study this from the perspectives of maximum likelihood (ML) and Euclidean distance (ED), with emphasis on exact solutions to the corresponding optimization problems.
In Section 6 we present a complete list of all small marginal independence models with binary states. This includes all simplicial complexes for and all matroids for .
2. Möbius Coordinates and the Segre Variety
Let be the space of real tensors of format and the corresponding projective space. Following (Sul, , Section 4.3.2), we define an -linear map . The entries of are called Möbius coordinates. Here, . The Möbius coordinate is simply the linear form in the probability coordinates. That linear form is an entry of the marginal tensor where .
Example 1 (, all ).
The tensor has entries indexed by , for instance
The Möbius coordinate is the sum of all probability coordinates . The map is represented by an upper-triangular matrix with entries in . All diagonal entries are . Precisely of the entries above the diagonal are . The ordering of the basis which gives the upper-triangular form can be found under the “ternary 3-cycle” example on this project’s MathRepo page referred to below.
The map is invertible because it is represented by an upper triangular matrix with all diagonal entries equal to one. Moreover, all entries in the inverse matrix are , or . This is precisely the process of Möbius inversion for a poset on elements, which is the direct product of the posets , where is below the other elements, which are incomparable.
A nonzero tensor in is said to have rank one if its entries factorize as follows:
Here is an additional homogenization parameter, and the other parameters satisfy
Let denote the set of rank one tensors in . The image of in is isomorphic to . The total number of free parameters in the parametrization (3) is
In what follows, we refer to the affine cone in the tensor space as the Segre variety.
The homogeneous prime ideal of the Segre variety is a toric ideal. It is known to be generated by quadratic binomials. Namely, these ideal generators are the minors of all the flattenings of the tensor . For more information see (MS, , Sections 8.2 and 9.2). The parametrization corresponds dually to the map of polynomial rings , where each unknown is mapped to the corresponding linear form . We further identify the two index sets and by simply sending to . We write for the toric ideal in that is obtained from the toric ideal in from the resulting identification of labels. With this notation, we have the following key lemma.
Lemma 2 ().
The inverse image of the toric ideal in under the ring map is the toric ideal in . In other words, the linear change of coordinates preserves the Segre variety . In particular, is still a toric variety, even when written in Möbius coordinates.
Example 3 ().
Consider the most simple case, namely matrices. The objects of interest are the principal ideals and . The image of under the ring map is generated by the quadratic form
This identity shows that , so the first assertion in Lemma 2 holds.
Proof of Lemma 2.
We consider the one-to-one parametrization of given in (3) and (4). The image of under the associated ring map equals . Here some of the indices take the value . This means that we are summing over all states in . The hypothesis (4) ensures that whenever . Hence, we conclude that
). The exponent vectors of these monomials are the vertices of a polytope affinely isomorphic to. Here we are using the identification of labels that was described prior to Lemma 2. Hence the kernel of (6) equals the toric ideal . ∎
Of special interest in statistics is the case of binary random variables, where . Here each Möbius coordinate is indexed by a string in . Following (DR, , Section 3), we represent this string by a subset of , namely the set of indices where . Thus, in the binary case, Möbius coordinates are indexed by subsets of .
Example 4 (Binary -cycle).
Let . The eight Möbius coordinates are indexed by subsets , with the extra convention . The toric ideal of the Segre variety is generated by nine quadratic binomials in these unknowns. Now consider the -cycle model in Example 1. We write the prime ideal of this model in Möbius coordinates, and we find that it is generated by the
subpfaffians of the skew-symmetricmatrix shown in (2). This is the toric ideal obtained from by eliminating .
Example 5 (Binary bowtie).
Let . The Möbius coordinates are indexed by subsets . Here is generated by quadratic binomials. Let be the graph with edges . As a simplicial complex, has faces, namely six edges, five vertices, and the empty set. We now eliminate all unknowns with from the Segre ideal . The resulting toric ideal in Möbius coordinates with defines . This ideal has codimension and degree . It is minimally generated by quadrics and one cubic, namely . This is the smallest marginal independence model whose ideal requires a cubic generator. In Section 4 we shall see binomials of higher degree.
3. Toric Representation
Fix positive integers , the Segre variety , and a simplicial complex on . The model consists of all probability tensors with marginals of rank one for all . Let denote the linear subspace of all tensors in whose marginals are zero for all . This subspace and our marginal independence model are related as follows:
Lemma 1 ().
A tensor lies in the model if and only if it is a sum of a rank one tensor and a tensor in . In symbols, we have the following rational parametrization of our variety:
This is (Kir, , Theorem 1), and the argument is as follows. Consider the -marginals . Their tensor product is a rank one tensor in . Assuming without loss of generality that the entries of sum to , this construction yields also for all . Assume now , so has rank one for . By independence this implies for all . Therefore, the difference lies in the linear space . ∎
We now connect the decomposition (7) to the Möbius coordinates from Section 2. The Möbius coordinate is said to be relevant if the set is a face of . Otherwise, is called irrelevant. In the binary case, the relevant Möbius coordinates are those indexed by the faces of , and the irrelevant ones are indexed by the nonfaces.
Lemma 2 ().
The linear space is the common zero set of all relevant Möbius coordinates . Hence its dimension equals the number of irrelevant Möbius coordinates, which is
This result appears in (HS, , Theorem 2.6), albeit in a slightly different context. For completeness, we offer a proof. We identify each Möbius coordinate with its image under , which is a sum of -coordinates. With this, each relevant Möbius coordinate is among the entries of the marginal tensor where . Since , the linear form vanishes on . Now, it remains to consider entries of whose indices involve for some . Each such entry is an alternating sum of relevant Möbius coordinates supported on faces contained in . In particular, vanishes whenever all relevant Möbius coordinates vanish. This proves the first assertion. The second assertion is obtained by counting irrelevant coordinates according to their support . ∎
Example 3 ().
We now present the Toric Representation Theorem for marginal independence models.
Theorem 4 ().
The variety is irreducible, and its prime ideal is toric in Möbius coordinates. It is obtained from the Segre ideal by eliminating all Möbius coordinates that are irrelevant. Viewed modulo the linear space , the model is the toric variety given by the monomial parametrization (6) where runs over all relevant Möbius coordinates.
This was shown for by Sullivant in (Sul, , Section 4.3.2). The proof for general is analogous. In our set-up, it consists of Lemmas 2, 1 and 2. Equation (7) says that is the cone over a projection of the Segre variety . This projection has the center , when viewed in . Algebraically, this corresponds to passing to the relevant Möbius coordinates, by Lemma 2. Lemma 2 furnishes the remaining step. Since the Segre variety is toric in Möbius coordinates, so is its projection onto the subset of relevant coordinates. ∎
Corollary 5 ().
The marginal independence model has the dimension expected from (7), i.e.
In particular, in the binary case, this dimension equals plus the number of nonfaces of .
The hypothesis for all ensures that the projection with center is birational on the Segre variety . Indeed, all model parameters in (6) can be recovered rationally from the relevant Möbius coordinates: the quadratic monomials occur for all and , and the special parameter occurs for the empty face in . ∎
In toric algebra (MS, , Chapter 8), one represents a toric variety by the integer matrix whose columns are the exponent vectors in a monomial parametrization. In our setting, we denote this matrix by . It has rows, indexed by the model parameters and . Its columns are indexed by the relevant Möbius coordinates. To be precise, the columns of are the - exponent vectors of the relevant monomials seen on the right in (6). The convex hull of these vectors is the model polytope . This is a full-dimensional subpolytope of the product of simplices that is associated with .
Example 6 (Binary -cycle).
Example 7 (Binary matroid models).
Consider any matroid on , and let be its complex of independent sets. Then is the independent set polytope of the matroid.
4. Ideal Generators
We are interested in the toric ideal of the marginal independence model in Möbius coordinates . For binary models arising from matroids (Example 7), we conjecture that is generated by quadrics. This is closely related to a famous conjecture due to Neil White (MS, , Conjecture 13.16) which asserts quadratic generation for the toric ideal of the matroid base polytope. We here consider the polytope of all independent sets. White’s conjecture has been proved for many special cases, e.g. for graphic matroids by Blasiak (Blasiak, ), for rank three matroids by Kashiwabara (Kashi, ), and up to saturation in (LM, ).
For our binary matroid models, the toric ideal is normal (the proof is analogous to the base polytope case; see (MS, , Theorem 13.8)) and hence it is Cohen-Macaulay. In general, the Cohen-Macaulay property for simplicial complexes was asserted in (Kir, , Conjecture 28), along with the expected dimension for . We proved the second part of Kirkup’s conjecture in Corollary 5, but the first part remains open. Presently, we know no ideal that fails to be Cohen-Macaulay. In particular, we do not yet know how to transfer the counterexamples in (BTO, ) to our setting. This issue is subtle because non-normal models exist (see (MOS, ) and Remark 2).
Our next result states that the toric ideals can have minimal generators of arbitrarily high degree. This is based on a construction that extends the bowtie in Example 5.
Theorem 1 ().
Let be the -dimensional simplicial complex associated with the graph
Then the toric ideal for the binary model has a minimal generator of degree .
Consider the following binomial in the Möbius coordinates associated with the edges:
The binomial has degree . Each of its two monomials uses the indices twice, and it uses the indices once. Hence vanishes under the monomial map in (6), which sends to , and this means that lies in the toric ideal . We claim that occurs in every minimal generating set of . In the language of algebraic statistics (AHT, ), is indispensable for the Markov basis of , i.e. the corresponding fiber of the nonnegative integer linear map given by has only two elements.
This can be shown by means of a hypergraph technique. Namely, since is , it is an incidence matrix of the parameter hypergraph of the model. Defined in (PS, , Section 3.1), vertices of are , and edges are of size , , and , representing monomial images of , , and , respectively. A pair of edge multisets in is said to be balanced if for each vertex . Denote by a binomial supported on ; by (GrPe, , Proposition 3.1), every binomial in is of this form. (GrPe, , Proposition 4.1) states that if there exists no splitting set of the balanced edge set supporting the binomial, then the binomial is indispensable. A splitting set of a is a set of edges in the hypergraph such that can be decomposed into two other balanced sets overlapping on , a condition that in translates to being generated by smaller binomials supported on subsets and . Precisely, ; ; and the two new sets respect coloring: , and similarly for blue.
For our , a splitting set must correspond to proper faces of , and due to the color-balancing requirement, the only choices that could split are the middle vertices of the graph, that is, the size-2 edges for in the hypergraph. If any hypergraph edge , then represents the collection to the left of vertex . Then in , requiring the addition of the singleton to . In Möbius coordinates, this means is required in the support to make the binomial supported on homogeneous and thus living in . But this addition violates the splitting set definition, and as it is the only way to color-balance, a splitting set does not exist. ∎
Remark 2 ().
For , both terms of the indispensable binomial contain the square of a coordinate . From this one can infer that is not normal. We refer to (MOS, ) for a systematic construction of simplicial complexes whose binary model is not normal.
Here is another example with a high-degree generator which we found quite interesting.
Example 3 (Sextic for ).
Fix the binary model for the -dimensional complex with facets , , , , , , . The ideal is minimally generated by quadrics and one sextic, namely . All quadrics are squarefree.
We next offer a brief comparison to the graphical set-up of Drton and Richardson in (DR, ).
Example 4 ().
We fix the bidirected graph in the article (DR, ) to be the -cycle. This represents the model defined by . Its ideal is also toric in Möbius coordinates, by (DR, , Theorem 1). A saturation step reveals that it is generated by quadrics:
That bidirected -cycle model has dimension and degree . In the simplex , it is naturally sandwiched between two of our models. It contains for , which has dimension , degree , and quadrics. And, it is contained in the model for , which has dimension , degree , and quadrics.
The original motivation for this project came from algebraic models for random graphs. Marginal independence for binary random variables offers a natural class of such models. Let and consider undirected simple graphs on vertices, with edges labeled by . These graphs are in bijection with subsets of
. Probability distributions on graphs are points in the simplex. Example 7 introduces a new model for marginal independence of edges in a graph. This model is associated with the binary graphic matroid of the complete graph , and is the simplicial complex of all forests.
Example 5 (Random graph model).
We consider graphs on nodes . The possible edges are labeled . The associated simplicial complex is -dimensional and it has simplices. These simplices are the forests with vertices . The facets of are the spanning trees in the complete graph . Explicitly, they are the triples in other than the cycles . Each of the graphs with vertex set has a certain probability , according to our model. Being in means that the edges in any spanning tree are chosen independently. However, choices of edges are no longer independent when a cycle is formed.
The random graph model has dimension and lives in . In the Möbius coordinates, it is given by a toric ideal of codimension and degree . Here is the independent set polytope of the graphic matroid of , which has dimension and volume . The ideal is generated by quadratic binomials, including
Each of these is a quadratic constraint in the probability coordinates . These are the probabilities of the graphs.
5. Parameter Estimation
Estimating model parameters from data is a fundamental task in statistics. In our setting, the data is a tensor of format whose entries are nonnegative integers, indicating the number of times each given joint state was observed. The empirical distribution is a rational point in . We seek a distribution in the model that best explains the data, in an appropriate statistical sense. We wish to estimate the model parameters that map to .
We examine three different paradigms for parameter estimation. These are Maximum Likelihood (ML) degree, and affine (aED) and projective (pED) Euclidean distance degree:
All three optimization problems are meaningful for data analysis, and also for algebraic geometry. Recall that is an affine cone in , encoding a projective variety in . Problem (pED) asks for the point in that affine cone which is closest to . In the algebraic approach we compute all critical points in . In problem (aED) we add the further constraint that the tensor entries must sum to
. Thus (aED) refers to the Euclidean distance problem for an affine variety in the hyperplanenamely the Zariski closure of the statistical model obtained by intersecting the cone with the simplex . The subtle distinction between the projective case and the affine case is discussed in (EDdeg, , Section 6).
Problem (ML) is likelihood inference, which is ubiquitous in statistics. For a geometric introduction see (LikeGeo, ). The log-likelihood function is invariant, up to an additive constant, under scaling the tensor , and there is no need to distinguish between affine and projective.
The intrinsic algebraic complexity of a polynomial optimization problem is the number of complex critical points, assuming the data tensor is generic. For the problems above, that number is called the ML degree, the aED degree and the pED degree, as in (EDdeg, ; LikeGeo, ).
Example 1 ().
We fix three binary random variables. The Segre variety is the set of tensors of rank . This variety has ML degree and aED degree . Its pED degree is , by (EDdeg, , Example 8.2). The three solutions for the data are displayed in Table 1. The last row shows the rational numbers . The solutions for aED and ML sum to , but for pED it does not. For pED there are three other real critical points.
The three solutions are close to each other in . However, for an algebraist they are very different. To appreciate this, consider the respective minimal polynomials for the solution coordinate in Figure 1. All three polynomials are irreducible in , but their size varies considerably.
We performed this computation for a wide range of marginal independence models . In what follows we discuss our methodology. Our findings are summarized in the next section, along with pointers to a comprehensive database.
We begin with the Euclidean distance problems (aED) and (pED). Our varieties are toric and hence rational, given both parametrically, by Lemma 1, and via an implicit representation, by Theorem 4. The former leads to an unconstrained optimization problem whose decision variables are the model parameters . Its critical equations are simply those in (EDdeg, , equation (2.4)). The latter leads to a constrained optimization problem whose decision variables are the tensor entries ; the critical equations are given in (EDdeg, , equation (2.1)).
We solve the unconstrained problem using the julia package HomotopyContinuation.jl due to Breiding and Timme (HomotopyContinuation, ). To illustrate this package, we show the code for :
using HomotopyContinuation @var q1 q2 q3 q12 q13 q23 q123 z u000, u001, u010, u011, u100, u101, u110, u111 = [ 1//2, 1//4, 1//8, 1//16, 1//32, 1//64, 1//128, 1//128 ] diffs = [u000 - z*(q123), u001 - z*(q12-q123), u010 - z*(q13-q123), u011 - z*(q1-q12-q13+q123), u100 - z*(q23-q123), u101 - z*(q2-q12-q23+q123), u110 - z*(q3-q13-q23+q123), u111 - z*(1-q1-q2+q12-q3+q13+q23-q123)] dist = sum([d^2 for d in diffs])
This sets up the objective function dist in Möbius parameters. We next specify the model:
model = [q12=>q1*q2, q13=>q1*q3, q23=>q2*q3, q123=>q1*q2*q3] dist = subs(dist, model...) # projective ED vars = variables(dist) eqns = differentiate(dist, vars) R = solve(eqns)
This solves the pED problem in Example 1. If we now run C = certify(eqns,R), then this proves correctness, in the sense of (BRT, ). By deleting entries of model, we can specify other complexes . For instance, for the 3-cycle, delete q123=>q1*q2*q3. For aED we use the line
model = [q12=>q1*q2, q13=>q1*q3, q23=>q2*q3, q123=>q1*q2*q3, z=>1] # affine ED
Results of this computation for all binary models up to are presented in Section 6. Symbolic computation was used to independently verify a range of small cases.
Our experiments suggest the following conjecture for all marginal independence models:
Conjecture 2 ().
Given any nonnegative tensor whose entries sum to , the aED problem has precisely one real critical point, namely the point that is closest to .
For maximum likelihood estimation, most statisticians use local hill-climbing methods, such as Iterative Conditional Fitting (DR, , Section 5). By contrast, we here use the global numerical tools of nonlinear algebra (HomotopyContinuation, ; BRT, ). The objective function for ML is entered as follows:
p = [z*(q123), z*(q12-q123), z*(q13-q123), z*(q1-q12-q13+q123), z*(q23-q123), z*(q2-q12-q23+q123), z*(q3-q13-q23+q123), z*(1-q1-q2+q12-q3+q13+q23-q123)] loglike = sum([u[i] * log(p[i]) for i = 1:length(u)])
Following (ST, ), we solve the rational function equations given by the gradient of loglike. We avoid the numerator polynomials, which are much too large. The introduction of (ST, ) discusses numerical ML by stating that “a key idea is to refrain from clearing denominators”.
Example 1 suggests that the ED degrees exceed the ML degree. However, this is incorrect. The Segre variety is an exception. Almost all models have a larger ML degree.
Example 3 ().
Fix the bowtie in Example 5 and consider a data tensor . We estimate the model parameters, in order to find the best fit . Solving the critical equations for Euclidean distance is much faster than for maximum likelihood. We see this from the degrees, which reveal the number of paths to be tracked in HomotopyContinuation.jl. The aED degree equals and the pED degree equals . We do not yet know the ML degree, but monodromy loops show that it exceeds one million. An easier model is the claw graph , whose ML degree equals . The aED degree and the pED degree are only and , respectively.
Remark 4 ().