    # Brownian motion tree models are toric

Felsenstein's classical model for Gaussian distributions on a phylogenetic tree is shown to be a toric variety in the space of concentration matrices. We present an exact semialgebraic characterization of this model, and we demonstrate how the toric structure leads to exact methods for maximum likelihood estimation. Our results also give new insights into the geometry of ultrametric matrices.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1. Introduction

Brownian motion tree models are classical statistical models for phylogenetic trees. They were introduced by Felsenstein [Fel73]

to examine continuous measurements of phenotypes in evolutionary biology. The vertices of the tree represent real-valued random variables, whose joint distribution obeys a Gaussian law.

Let be a tree with leaves, labelled , and with no vertices of degree two. Let be the rooted tree obtained from by directing all edges away from . The set of non-root vertices of is in natural bijection with the set of edges of . A vertex is a descendant of if there is a directed path from to . The set of all leaves of that are descendants of is denoted by . We fix a total order on such that if . Given , we write for their most recent common ancestor. Figure 1 shows our running example. Figure 1. A tree T with n=4 leaves, |V|=7 edges, and its matrix representation.

In the space of symmetric matrices we consider the subspace

 LT={Σ∈Sn:σij=σkl if lca(i,j)=lca(k,l)}.

Using parameters , the matrices in satisfy for . This furnishes a representation of the tree by a matrix, as shown in Figure 1.

We are interested in Gaussian distributions on with covariance matrix in . Their concentration matrices form the -dimensional algebraic variety

 L−1T={K=Σ−1:Σ∈LT}⊂Sn.

We identify with its Zariski closure in the projective space . In this paper we show that the variety is linearly isomorphic to a toric variety in . In tropical geometry [MS15, Remark 4.3.11] and algebraic combinatorics [BFF18, Theorem 4.6], one associates a toric ideal with the unrooted tree as follows. The ideal has the quadratic generators where and are cherries in the induced -leaf subtree on any quadruple .

To reveal the toric structure, we introduce a change of coordinates in  as follows:

 (1) pij=−κijfor1≤i

With this, the concentration matrix is the reduced Laplacian of the complete graph on vertices with edge labels . See [MSUZ16, Example 4.9], where the matrix for is shown in equation (4.6). Here is the same scenario for :

###### Example 1.1.

We fix coordinates on by setting

 K=⎡⎢ ⎢ ⎢ ⎢⎣p01+p12+p13+p14−p12−p13−p14−p12p02+p12+p23+p24−p23−p24−p13−p23p03+p13+p23+p34−p34−p14−p24−p34p04+p14+p24+p34⎤⎥ ⎥ ⎥ ⎥⎦.

Fix the tree in Figure 1. The -dimensional toric variety in is defined by

 I~T=⟨p01p23−p02p13,p01p24−p02p14,p03p14−p04p13,p03p24−p04p23,p13p24−p14p23⟩.

These quadrics vanish for the inverse of any matrix with the structure in Figure 1. ∎

The title of this paper is an abridged version of the following statement:

###### Theorem 1.2.

The variety of concentration matrices in the Brownian motion tree model, in coordinates (1), coincides with the toric variety defined by the ideal .

The proof of this theorem will be given in Section 3. First, however, in Section 2, we offer an introduction to the statistical model and its phylogenetic applications. Our statistical models correspond to semialgebraic subsets of or . We are interested in two subsets of , namely the spectrahedron , obtained by intersection with the cone of positive definite matrices, and the polyhedral cone

 LT,≥= {Σ∈LT:0≤σij≤σkl wheneverlca(i,j)≤lca(k,l)}.

We shall see that is a simplicial cone, contained in the spectrahedron .

Matrices in play an important role in statistics. By Proposition 3.14 in [DMSM14], every matrix in is an ultrametric matrix in , i.e. it satisfies for all . By Theorem 3.16, every ultrametric matrix lies in for some tree

. Ultrametric matrices appear in the potential theory of finite state Markov chains, which is the context of

[DMSM14]. Our motivation came from phylogenetics [Fel73] and Gaussian maximum likelihood estimation [ZUR17].

Every matrix in represents a Gaussian distribution on . Both and belong to the class of linear Gaussian covariance models [And70, ZUR17].

The main result of this paper is Theorem 2.6. This is an extension of Theorem 1.2 which features toric inequalities in addition to the quadratic binomial equations in . It offers an exact semialgebraic description of the model in terms of the nonnegative coordinates . The proof of this result is presented in Section 5. It rests on formulas that express in terms of treks as in [STD10].

Section 4 is about fitting Brownian motion tree models to data, given by a sample covariance matrix in . We do so by maximizing the log-likelihood function

 (2) ℓ(Σ)=−logdetΣ−trace(SΣ−1).

This function is non-convex. The expression in terms of equals

 (3) ℓ(K)=logdetK−trace(SK).

This function is convex in , which motivates analyzing maximum likelihood estimation for Brownian motion tree models as an optimization problem over . As we will show in Section 4, in this parameterization maximum likelihood estimation boils down to solving a system of polynomial equations on . The paper concludes with a brief discussion on how Theorem 2.6 might be applied to likelihood inference.

## 2. Tree models and their parameters

Brownian motion is a stochastic process that characterizes the random motion of particles. It is a Wiener process satisfying , with independent increments, and such that for

has a Gaussian distribution with mean zero and variance

. Brownian motion on a rooted binary tree can also be described using the Wiener process. The process starts at vertex . At time , it splits into two, and each of the two processes starts evolving independently at value . It again proceeds according to the Wiener process until another splitting event occurs. We think about this process as evolving along , where the parameters for inner vertices represent the times of splitting events. This construction is a continuous interpretation of the Gaussian structural equation model (4) discussed next.

Given a rooted tree , we define a Gaussian distribution on as follows. First, set . Then to each vertex we associate independently a Gaussian random variable with mean zero and variance . The corresponding Markov process on is a collection of real-valued random variables for . They satisfy

 (4) Yv=Yu+ϵvfor every edge u→v∈E.

Since a linear transformation of a Gaussian vector is also Gaussian, we conclude that the random vector

is Gaussian. The set of covariance matrices of the marginal distributions on the leaf-variables is the polyhedral cone .

###### Proposition 2.1.

The random vector

is normally distributed with mean zero, and the entries

of its covariance matrix are

 (5) σij=∑v≤lca(i,j)θvfori,j=1,…,n.

The resulting Gaussians on are precisely those with covariance matrices in .

###### Proof..

Using (4) recursively, we can write each in terms of the error terms as

 Yi=∑v≤iϵv.

Equation (5) follows from this and the fact that all ’s are mutually independent. The linear inequalities that define the polyhedral cone inside the linear space are equivalent to the requirement that the ’s be nonnegative. ∎

###### Example 2.2.

Consider the tree in Figure 1. The random variables for the inner vertices of the tree are , , , , and we have for the leaves.

The are independent univariate Gaussians with mean and variance . Hence the marginal distribution of is Gaussian with the covariance matrix

 (6) Σθ=⎡⎢ ⎢ ⎢⎣θ1+θ5+θ7θ5+θ7θ7θ7θ5+θ7θ2+θ5+θ7θ7θ7θ7θ7θ3+θ6+θ7θ6+θ7θ7θ7θ6+θ7θ4+θ6+θ7⎤⎥ ⎥ ⎥⎦.

This is the matrix in (5) and in Figure 1. The constraint that the are nonnegative translates into the inequalities and and . ∎

The extreme rays of the polyhedral cone are as follows. Let be the vector with if and otherwise. The corresponding rank one matrices form a basis for . In fact, the matrix in (5) equals

 (7) Σ=∑v∈VθvGv.
###### Corollary 2.3.

The cone is a simplicial cone, spanned by the rank one matrices associated with vertices . It is contained in the spectrahedral cone .

Note that this inclusion is strict. For instance, the matrix in (6) is positive definite if we set , and . This means that the linear covariance model is strictly larger than the Brownian motion tree model.

We next interpret our model in the context of distance-based phylogenetics. Using the natural bijection between non-root vertices and edges, we label each edge of with a parameter . This is shown in the tree on the right in Figure 1. We think of as the length of the associated edge. We compute the distance between any two leaves of by summing the lengths of edges on the unique path joining them. The collection of resulting distances for is a tree metric on .

The correspondence between ultrametric matrices and tree metrics on taxa is known in phylogenetics as the Farris transform. The formulas are

 {σii=d0ifor 1≤i≤nσij=12(d0i+d0j−dij)for 1≤i

and these are equivalent to (5). The inverse of the Farris transform is given by

 {d0i=σiifor 1≤i≤ndij=σii+σjj−2σijfor 1≤i≤j≤n.
###### Proposition 2.4.

The model is identified with the cone of tree metrics on via the Farris transform . The parameters are the lengths of the edges.

###### Proof..

The diagonal entry of the covariance matrix is the sum of the lengths of the incoming edges for all vertices on the path from the root to leaf . Therefore, is the distance from to in the unrooted tree . Each off-diagonal entry is the length of the path from the root to . Hence is the length of the path from to the leaf . We conclude that is the length of the path from leaf to leaf in . Since the Farris transform is an invertible linear transformation, it identifies the two simplicial cones in . ∎

We next turn to the space of all tree metrics, which is a key object in phylogenetics. A classical result of Buneman [Bun71] states that a metric on is a tree metric (for some tree) if and only if it satisfies the four point condition:

 (8) dij+dkl≤max{dik+djl,dil+djk}for all i,j,k,l∈{0,1,…,n}.

If is a tree metric on then the following additional equation holds:

 (9) dik+djl=dil+djkif {i,j},{k,l} are % cherries in the quartet on i,j,k,l.

The constraints (8) and (9) are well-known also in tropical geometry [MS15, §4.3] where one identifies the space of tree metrics with the tropical Grassmannian that parametrizes tropical lines in . This is related to Theorem 1.2 as follows.

###### Remark 2.5.

If we set then the linear relations (9) that hold for tree metrics on are precisely the equations that define the toric ideal .

We now state our main result. It augments Theorem 1.2 by incorporating the inequalities in (8). The unrooted tree obtained from by restricting to any four leaves is called a quartet of . If equality holds in (8) then this four-leaf tree is a star quartet. If the inequality in (8) is strict then we call it a trivalent quartet.

###### Theorem 2.6.

Given any rooted tree , the set of concentration matrices in the Brownian motion tree model is the set of positive definite matrices satisfying

 (10) pij≥0for all0≤i
###### Remark 2.7.

These inequalities are satisfied by where is any tree metric on . Thus, the set of models , where ranges over all rooted trees on leaves, is a multiplicative realization of the space of phylogenetic trees. This is reminiscent of the space of phylogenetic oranges studied by Moulton and Steel [MS04].

We illustrate the contents of Theorem 2.6 for our running example.

###### Example 2.8.

Fix the tree in Figure 1 with covariance matrix in (6). Set . Writing the concentration matrix as in Example 1.1, we have

 p13s=θ2θ4θ7,p14s=θ2θ3θ7,p23s=θ1θ4θ7,p24s=θ1θ3θ7,(p03p12−p02p13)s=θ4θ5,(p04p12−p02p14)s=θ3θ5,(p01p34−p04p13)s=θ2θ6,(p02p34−p04p23)s=θ1θ6,(p12p34−p14p23)s=θ5θ6+θ5θ7+θ6θ7,p01s=(θ3θ4+θ3θ6+θ4θ6)θ2,p02s=(θ3θ4+θ3θ6+θ4θ6)θ1,p03s=(θ1θ2+θ1θ5+θ2θ5)θ4,p04s=(θ1θ2+θ1θ5+θ2θ5)θ3,p12s=θ3θ4θ5+θ3θ4θ7+θ3θ5θ6+θ3θ5θ7+θ3θ6θ7+θ4θ5θ6+θ4θ5θ7+θ4θ6θ7,p34s=θ1θ2θ6+θ1θ2θ7+θ1θ5θ6+θ1θ5θ7+θ1θ6θ7+θ2θ5θ6+θ2θ5θ7+θ2θ6θ7.

The five quadratic binomials in are zero for these . Assuming this, Theorem 2.6 says that these expressions are nonnegative if and only if .

## 3. Toric ideals from trees

In this section we prove Theorem 1.2. The proof of Theorem 2.6 is given in Section 5. The following code in Macaulay2 [M2] provides the quadratic generators for our running example. It also shows that the rooted tree need not be binary.

###### Example 3.1.

Example 1.1 can be verified in Macaulay2 [M2] by running this code:

R = QQ[t1,t2,t3,t4,t5,t6,t7,p01,p02,p03,p04,p12,p13,p14,p23,p24,p34];
S = matrix {{t1,t5,t7,t7},
{t5,t2,t7,t7},
{t7,t7,t3,t6},
{t7,t7,t6,t4}};
K = matrix {{p01+p12+p13+p14, -p12, -p13, -p14},
{-p12, p02+p12+p23+p24, -p23, -p24},
{-p13, -p23, p03+p13+p23+p34, -p34},
{-p14, -p24, -p34, p04+p14+p24+p34}};
id4 = matrix {{1,0,0,0},{0,1,0,0},{0,0,1,0},{0,0,0,1}};
I = eliminate({t1,t2,t3,t4,t5,t6,t7},minors(1,S*K-id4))
codim I, degree I, betti mingens I


As claimed, the toric ideal has codimension , degree and five quadratic generators.

We now examine non-binary trees. First we replace the two occurrences of t6 by t7 in the covariance matrix S. The resulting tree has . By running the modified Macaulay2 code, we see that the ideal is still toric. It has codimension , degree and 7 quadratic generators. Finally, we replace both t5 and t6 with t7. Now the unrooted tree has . It is the star tree with leaves . Its toric ideal is the ideal of the second hypersimplex. It has codimension and degree , with 10 quadratic generators. Modifying the code confirms these data. ∎

###### Proof of Theorem 1.2.

We use the following parametric representation for the toric variety of the ideal associated with the unrooted tree . It is given by Laurent monomials in the entries of the matrix representation of the rooted tree :

 (11) pij↦tlca(i,j)/(titj)for1≤i

The ideal is the kernel of the ring homomorphism given by (11).

The variety is a cone in given parametrically by mapping a covariance matrix to its inverse . Since the parametrization is homogeneous, we may replace the inverse by the adjoint. By slight abuse of notation we set . The entries of the matrix are homogeneous polynomials of degree in the parameters for . The same holds for the coordinates in (1). We write for these homogeneous polynomials. Our claim states that the toric ideal coincides with the kernel of the ring homomorphism .

To prove this, we examine the initial monomials and the irreducible factorization of the polynomials . Here we fix the degree reverse lexicographic order on given by if in . For , the polynomial is equal (up to sign) to the determinant of the submatrix of that is obtained by deleting row and column . The initial monomial is the product of the entries of that submatrix which appear along the main diagonal. To be precise, we find

 in(Pij(t))=t1t2⋯tn⋅tlca(i,j)/(titj).

The polynomial is the determinant of the matrix obtained from  by replacing the th row with the all-ones vector . Its initial monomial equals

 in(P0i(t))=t1t2⋯tn⋅(1/ti).

Hence, by (11), the relations among the initial monomials are precisely given by . We claim that each of the quadratic binomial relations among the above Laurent monomials lifts to exactly the same relation among the full polynomials and . We shall prove this by examining the factorizations of these polynomials.

In what follows we first assume that is a binary tree, i.e. every vertex in has precisely two children in . At the end of the proof, we shall derive Theorem 1.2 for non-binary trees from the same statement for binary trees.

For any inner vertex in the rooted binary tree , let denote the rooted tree obtained from by deleting all edges and vertices below . Thus is a rooted tree with leaves . Let denote the determinant of its covariance matrix. This is a homogeneous polynomial of degree . For any directed edge of the tree , we consider the submatrix of with row indices and column indices , for any fixed . This matrix does not depend on , and it has one more column than rows. We make it square by placing the all-ones vector into the first row. We write for the determinant of that square matrix. This is a homogeneous polynomial in of degree . By convention, for the root edge .

Consider the path between any two leaves and in the unrooted tree . Each vertex in the interior of such a path has a unique child in the rooted tree that is not on the path. Here we are using the assumption that is a binary tree. The only exception is the top vertex on the path between and in .

We find that the polynomial is equal to the product of all determinants where is any edge on the path from to . Similarly, the determinant is equal to times the product of all where the vertex is on the path from leaf to leaf . One verifies this by examining for which parameter values these expressions vanish, and by noting that the initial monomials coincide with the products of the initial monomials of the factors:

 in(Dk(t))=tk⋅∏{ti:i∈{1,…,n}∖de(k)},in(Euv(t))=∏{ti:i∈de(v)}.

The above factorizations of and into the determinants and show that each generator of vanishes on our variety. By our analysis of the leading monomials, there are no relations among the polynomials and beyond those in . In fact, our analysis shows that these polynomials form a Khovanskii basis (cf. [KM19]) for the reverse lexicographic monomial order on the .

We now know that Theorem 1.2 holds for all binary trees. It remains to derive from this the same statement for all non-binary trees. The property for rooted trees to be binary translates into the property for unrooted trees to be trivalent. Let be any non-trivalent tree and let be the set of all trivalent trees that are obtained by refining . One verifies that the following identity among toric ideals holds:

 (12) I~T=∑~U∈[~T]I~U.

Similarly, the linear space is the intersection of all the linear spaces , where runs over . Since matrix inversion is a birational isomorphism, the variety is the intersection of the toric varieties where runs over the trivalent trees in . The Nullstellensatz implies that the sum of toric ideals (12) cuts out set-theoretically. This shows that is a toric variety, with toric ideal in (12). ∎

###### Example 3.2.

Consider the binary tree in Figure 1 and Examples 1.1 and 3.1. The special determinants defined above are the following polynomials:

 E51=t2−t5,E52=t1−t5,E63=t4−t6,E64=t3−t6,
 E75=det⎡⎢⎣111t7t3t6t7t6t4⎤⎥⎦,E76=det⎡⎢⎣111t1t5t7t5t2t7⎤⎥⎦.

We are interested in the projective variety in that is parametrized by

 p01=E75E51,p02=E75E52,p03=E76E63,p04=E76E64,p12=D5,p34=D6,p13=E51D7E63,p14=E51D7E64,p23=E52D7E63,p24=E52D7E64.

One verifies that this is the variety defined by the toric ideal seen in Example 1.1. Furthermore, the same toric variety is also parametrized by the initial monomials , and . ∎

###### Remark 3.3.

Tropical geometers know that the toric ideals are precisely the monomial-free initial ideals of the Plücker ideal that defines the Grassmannian of lines. The latter arises in a manner that is similar to our passage from covariance matrices to concentration matrices, namely by inverting matrices that have a Hankel structure. This is the content of [MSUZ16, Proposition 7.2]. We do not know whether this is related to the present paper. Is it possible to derive Theorem 1.2 by a degeneration argument from the relationship between Hankel matrices and Bézout matrices ?

## 4. Maximum likelihood algebra

The log-likelihood function for Gaussian random variables is the function in (2). Here is a fixed sample covariance matrix, i.e.  where is a real matrix whose columns are the observed samples. Maximum likelihood estimation is concerned with maximizing the expression (2) over all covariance matrices in the model of interest. This optimization problem is equivalent to maximizing the expression (3) over all concentration matrices in the model.

The optimal solution to this problem is denoted by or . This is called the maximum likelihood estimate (MLE) for the data . Here the model is fixed but the data can vary. We therefore think of the MLE as a function of .

In this section we study the MLE for the Brownian motion tree model . The idea is to take advantage of the toric structure revealed in Theorem 1.2. Thus, we use the coordinate change (1) that writes the concentration matrix as the reduced Laplacian for the complete graph on vertices with edge labels . With this, the expression (3) is a function of the , subject to the toric constraints in . This gives us the flexibility to choose a convenient parametrization of the toric variety.

In algebraic statistics, one distinguishes two kinds of polynomial constraints for a statistical model, namely equations and inequalities. It is customary to first focus on the equations and examine the MLE in that setting before incorporating inequalities.

In our paper, the model is given by the semialgebraic set . This set satisfies the inequalities in Theorem 2.6. For the discussion of MLE in the current section, we ignore the inequality constraints and identify the set with its Zariski closure, which is the toric variety . The critical points of the likelihood function on that variety are defined by a system of polynomial equations, known in statistics as the likelihood equations. These can be derived by using Lagrange multipliers, or via a monomial parametrization of the toric variety .

The maximum likelihood degree of the model is, by definition, the number of complex solutions to the likelihood equations for generic data . This number is an algebraic invariant of the ideal . To compute it we take to be a general symmetric matrix of full rank and we count all complex critical points of .

###### Proposition 4.1.

The maximum likelihood degree of the Brownian motion tree model on a binary tree with leaves is equal to .

###### Proof..

This result was found by symbolic computation, namely using the Gröbner basis package in the computer algebra system maple. For the computation was carried out over a finite field. All combinatorial types of trees were considered. See Example 4.4 for an illustration of the case where the ML degree is . ∎

This result is complementary to the usual approach in computational statistics where one maximizes the likelihood function using a local numerical method, such as the Newton-Raphson algorithm. Local methods perform best in a regime where the likelihood function is concave. Such a regime was identified in [ZUR17]

, where concavity was shown to hold with high probability when the dimension

is small relative to the sample size . In that analysis it was essential to use all constraints of the model, i.e., not just the equations but also the inequalities.

The maximum likelihood degree being equal to one means that the MLE can be written as a rational function of the data. Proposition 4.1 says that this happens for our model when and . We next present the formulas for these two cases.

###### Example 4.2 (n=2).

The toric ideal equals , so our model is the full Gaussian family. This means that the MLE equals the sample covariance matrix:

 ^σ11=s11,^σ12=s12,^σ22=s22.

Since the MLE of the parameters is , , , this leads to valid parameters for the Brownian motion tree model if . ∎

###### Example 4.3 (n=3).

We label the rooted tree so that is a clade. Hence and are the cherries in the unrooted tree . Our toric ideal is principal:

 I~T=⟨p01p23−p02p13⟩=⟨κ11κ23−κ12κ13+κ12κ23−κ13κ22⟩.

This is equivalent to setting in the covariance matrix . The MLE is a rational function of the entries of the sample covariance matrix . We define

 c=(s11−2s12+s22)s33−(s13−s23)2.

The entries of the estimated covariance matrix satisfy and

 ^σ11=s11−2(s13−s23)(s11s33−s12s33−s213+s13s23)(s11s23−s12s13−s12s23+s13s22)/c2,^σ12=s12−(s13−s23)(s11s33−s213−s22s33+s223)(s11s23−s12s13−s12s23+s13s22)/c2,^σ22=s22−2(s13−s23)(s12s33−s13s23−s22s33+s223)(s11s23−s12s13−s12s23+s13s22)/c2.

The remaining two matrix entries must be equal:

 ^σ13=s13−(s13−s23)(s11s33−s213−s12s33+s13s23)/c=^σ23=s23−(s23−s13)(s22s33−s223−s12s33+s13s23)/c.

The following two linear forms are preserved when passing from data to MLE:

 ^σ11−2^σ12+^σ22=s11−2s12+s22and^σ33=s33.

Writing for the sample concentration matrix, we note that is a rank matrix which depends only on , , and . Also,

###### Example 4.4 (n=4).

We consider the tree in Figure 1. Its toric variety was discussed in Examples 1.1, 3.1 and 3.2. We shall prove that the MLE for this model cannot be expressed in radicals. For this, we fix the parametrization

 (13) p01=u1,p02=u2,p03=u3,p04=u4,p12=u1u2u6,p13=u1u3u5,p14