Staged tree models are discrete statistical models introduced by Smith and Anderson in 2008 [Smith.Anderson.2008] that record conditional independence relationships among certain events. They are realisable as rooted trees with coloured vertices and labelled edges directed away from the root, called staged trees. Vertices in a staged tree represent events, edge labels represent conditional probabilities, and the colours on the vertices represent an equivalence relation—vertices of the same colour have the same outgoing edge labels. We use to denote the label associated to an edge . The sum of the labels of all edges emanating from the same vertex in a staged tree must be equal to one. The staged tree model itself is then defined as the set of points in parametrised by multiplying edge labels along the root-to-leaf paths in the staged tree :
where is the set of edges which emanate from a vertex , and is the set of edges of the root-to-leaf path, . The local sum-to-one conditions ensure that the multiplication rule of edge labels along root-to-leaf paths in
induces a well-defined probability distribution. The visual presentation via a coloured tree has been the key to the staged tree models’ increasing popularity in both algebra [DG, AD, duarte2021discrete, Duarte.Solus.2021] and statistics [Keeble.etal.2017, Goergen.etal.2020, rpackage, Leonelli.Varando.2021, Genewein.etal.2021].
By the description Eq. 1, a staged tree model can be interpreted as the variety of the kernel of a map , formally introduced in Eq. 3, intersected with the probability simplex. We say that a staged tree model has toric structure if the variety of is a toric variety. This is equivalent to being a binomial ideal, possibly after an appropriate linear change of variables. Duarte and Görgen [DG] show that when the tree is balanced, one can safely ignore the sum-to-one conditions, and the ideal
is binomial. This class has wide intersections with Bayesian networks/directed graphical models and hierarchical models. The majority of staged trees, however, are not balanced. This is true even for the small portion of staged trees which are also Bayesian networks, making the algebraic study of these models a new and necessary task.
Non-balanced staged trees appear naturally in applications. The most basic example is one of flipping a biased coin with probability for heads, and for tails. If it shows heads, flip it again. The graph in Fig. 1 is a staged tree presentation of this experiment.
The points in the corresponding model have the parametrisation for positive that sum up to one. They are the solution set inside the probability simplex to the non-binomial equation in the polynomial ring . Contrary to the balanced case, as the example suggests, the ideals of non-balanced staged trees are not generated by binomials; their generating sets are often complicated.
The present paper takes a large leap and proves that in many cases non-balanced staged tree models also have a toric structure; one needs to find an appropriate linear change of variables to reveal this friendly structure. For instance, the linear change of coordinates , and in the example above bijectively transforms the defining variety into the set of points that satisfy the binomial equation .
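The coin flip example can be checked numerically. The following is a minimal sketch under stated assumptions: the names `s0`, `s1` for the probabilities of heads and tails, the leaf order (heads-heads, heads-tails, tails) and the particular change of coordinates `q1 = p1`, `q2 = p1 + p2`, `q3 = p1 + p2 + p3` are our own illustrative choices and need not coincide with the notation or the change of variables used in the text.

```python
from fractions import Fraction

def coin_flip_point(s0):
    """Atomic probabilities (p1, p2, p3) of the leaves HH, HT, T.

    s0 is the (assumed) probability of heads, s1 = 1 - s0 that of tails.
    """
    s1 = 1 - s0
    return (s0 * s0, s0 * s1, s1)

def change_of_coordinates(p):
    """Hypothetical linear change: q1 = p1, q2 = p1 + p2, q3 = p1 + p2 + p3."""
    p1, p2, p3 = p
    return (p1, p1 + p2, p1 + p2 + p3)

for s0 in [Fraction(k, 10) for k in range(1, 10)]:
    p1, p2, p3 = coin_flip_point(s0)
    # A homogeneous, non-binomial relation satisfied on the model.
    assert p1 * p3 == p2 * (p1 + p2)
    # After the linear change of coordinates the relation is binomial.
    q1, q2, q3 = change_of_coordinates((p1, p2, p3))
    assert q1 * q3 == q2 * q2
```

Exact rational arithmetic is used so that the two relations are tested as identities rather than up to rounding error.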
Why is toric a desirable property? For one, the literature on the algebra of toric varieties is already very rich, with close connections to polyhedral geometry and combinatorics. Most importantly, toric ideals have proven particularly useful in algebraic statistics since the very first works [diaconis1998algebraic, Pistone.etal.2001, Geiger.etal.2006] in this area. For instance, generating sets of toric ideals produce Markov bases and contribute to hypothesis testing algorithms [diaconis1998algebraic], toric varieties are intrinsically linked to smoothness criteria in exponential families [Geiger.etal.2001], and the polytope associated to a toric model is useful when studying the existence of maximum likelihood estimates [fienberg2012maximum].
In previous work, the parametrisation defining a staged tree model Eq. 1 is often assumed to be squarefree. In other words, in the tree graph no two vertices are in the same stage along the same root-to-leaf path. The theory on staged tree models with a non-squarefree parametrisation is underdeveloped because they pose interpretational ambiguities in a statistical context [Collazo.etal.2018] and computational challenges in algebraic algorithms [Goergen.etal.2018]. However, as the example of the coin flip confirms, non-squarefree staged tree models are very natural, and both balanced and non-balanced staged trees can be non-squarefree. None of the results in our paper are restricted to squarefree staged tree models, which makes this paper the first to tackle equations defining non-squarefree staged tree models.
This article is organised as follows. In Section 2 we review the background literature on staged trees and introduce combinatorial and algebraic tools for studying these models. We discuss operations called swap and resize on a staged tree, which are used in Section 3 to prove that all balanced staged tree models have a quadratic Gröbner basis. It turns out that balanced staged trees carry properties that are central in commutative algebra, namely they are Koszul, normal, and Cohen-Macaulay. In Section 4 we prepare for our investigation of non-balanced trees with toric structure. Informally, the idea is to look for an ideal inside that is binomial after a linear change of variables, which can be used to detect a binomial structure of . This theory is then put into practice in Section 5 where we successfully extend the class of staged trees with toric structure. Our new class contains trees with a certain subtree-inclusion property, balanced trees, and combinations of both. In Section 6 we explore the structure of staged tree models where all vertices share the same colour. We connect these so-called one-stage trees to Veronese embeddings and show that all binary one-stage trees are toric. The paper ends in Section 7 with a discussion on further research directions, conjectures, and examples of staged trees for which we cannot currently decide whether or not they have toric structure.
2 Staged tree models
A discrete statistical model can be regarded as a subset of the open probability simplex
of dimension . Thus, given a staged tree , the staged tree model in Equation 1 defines a discrete statistical model. This section starts by studying the combinatorics of a staged tree, and then introduces the main algebraic object associated to .
2.1 Combinatorics of staged tree models
Given a staged tree let denote its vertex set and its edge set. The staged tree comes with an equivalence relation on its vertices. Formally, we say that two inner vertices and in are in the same stage, or share the same colour , if the labels of the outgoing edges of both vertices are the same. Leaves are by convention assumed to be trivially in the same stage since their outgoing edge sets are empty. For two vertices of different stages, the sets of labels of their outgoing edges are assumed to be disjoint. In depictions of staged trees we always assume that the outgoing edges of vertices with the same colour are ordered in the same way, such that their labels are pairwise identified from top to bottom. Figures 2 and 1 depict first examples.
For simplicity, leaves of a staged tree will be numbered and the root-to-leaf paths denoted . To each leaf an atomic probability is associated, . The notation is shorthand for the probability of passing through the vertex . Formally, is the sum of the atomic probabilities for which the root-to-leaf path passes through vertex . If is the child of , the label of the edge is equal to the fraction [Goergen.etal.2020]. As a consequence, the staged tree model can be described via equations , for all vertices . This characterisation motivates the definition of the ideal of model invariants studied in Section 4.
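The passage from atomic probabilities to edge labels can be sketched in a few lines. This is only an illustration under our own assumptions: the dictionaries `children` and `atomic`, the vertex names, and the sample values (the coin flip tree with probability 1/3 for heads) are hypothetical and not taken from the text.

```python
from fractions import Fraction

# Hypothetical encoding of the coin flip tree: internal vertex -> ordered children.
children = {"r": ["v", "l3"], "v": ["l1", "l2"]}
# Atomic probabilities of the three leaves (sample values, assumed).
atomic = {"l1": Fraction(1, 9), "l2": Fraction(2, 9), "l3": Fraction(2, 3)}

def vertex_probability(v):
    """Sum of the atomic probabilities of all leaves below (or equal to) v."""
    if v in atomic:
        return atomic[v]
    return sum(vertex_probability(c) for c in children[v])

def edge_label(parent, child):
    """Label of the edge parent -> child as a quotient of vertex probabilities."""
    return vertex_probability(child) / vertex_probability(parent)

# The root r and the vertex v are in the same stage, so the labels of their
# i-th outgoing edges agree.
assert edge_label("r", "v") == edge_label("v", "l1")
assert edge_label("r", "l3") == edge_label("v", "l2")
```

Clearing denominators in the two equalities at the end gives polynomial equations in the atomic probabilities, which is the form in which the model invariants are used.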
From now on, the edge labels will be considered as variables of a polynomial ring . For a vertex we denote by the induced subtree of which is rooted at and whose root-to-leaf paths are -to-leaf paths in . To this subtree we associate a polynomial which is equal to the sum of all products of labels along all -to-leaf paths in [Goergen.etal.2018]. This is a key object in our study of balanced staged trees. In particular, we call a staged tree balanced [DG] if for any two vertices in the same stage the following is true:
Figure 2 shows examples of balanced staged trees. This class is fairly large as all decomposable directed graphical models can be represented by balanced staged trees [DG].
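Balancedness can be tested mechanically on small trees. The sketch below assumes the condition in the form used in [DG]: for two vertices v and w in the same stage, with children indexed compatibly, the products t(v_i) t(w_j) and t(w_i) t(v_j) of subtree polynomials agree for all pairs of indices. The tree encoding, the function names and the two example trees are our own illustrative choices.

```python
from collections import Counter
from itertools import combinations

# Hypothetical encoding: internal vertex -> ordered list of (edge label, child).
# Leaves do not appear as keys. A stage is identified by its tuple of labels.
coin_tree = {"r": [("s0", "v"), ("s1", "l3")],
             "v": [("s0", "l1"), ("s1", "l2")]}
full_tree = {"r": [("s0", "a"), ("s1", "b")],
             "a": [("s0", "l1"), ("s1", "l2")],
             "b": [("s0", "l3"), ("s1", "l4")]}

def poly_mul(f, g):
    """Multiply polynomials stored as Counter {sorted tuple of labels: coeff}."""
    out = Counter()
    for m1, c1 in f.items():
        for m2, c2 in g.items():
            out[tuple(sorted(m1 + m2))] += c1 * c2
    return out

def subtree_poly(tree, v):
    """Sum over all v-to-leaf paths of the product of the edge labels."""
    if v not in tree:  # a leaf contributes the constant polynomial 1
        return Counter({(): 1})
    total = Counter()
    for label, child in tree[v]:
        total += poly_mul(Counter({(label,): 1}), subtree_poly(tree, child))
    return total

def is_balanced(tree):
    """Check t(v_i) t(w_j) == t(w_i) t(v_j) for all same-stage pairs v, w."""
    stages = {}
    for v, edges in tree.items():
        stages.setdefault(tuple(label for label, _ in edges), []).append(v)
    for members in stages.values():
        for v, w in combinations(members, 2):
            tv = [subtree_poly(tree, c) for _, c in tree[v]]
            tw = [subtree_poly(tree, c) for _, c in tree[w]]
            for i, j in combinations(range(len(tv)), 2):
                if poly_mul(tv[i], tw[j]) != poly_mul(tw[i], tv[j]):
                    return False
    return True
```

With this encoding `is_balanced(coin_tree)` returns `False` and `is_balanced(full_tree)` returns `True`, in line with the coin flip tree being non-balanced while a complete binary tree of depth two with a single stage is balanced.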
Before we can extend the study of balanced staged trees in Section 3, we need to understand how this property of a tree affects the model it represents. To this end, note that a staged tree model most often is not uniquely represented by a single staged tree: there may be different parametrisations for the same model, giving rise to different polynomial equations cutting out the same subset of the probability simplex. The class of all staged trees representing the same model is known as the statistical equivalence class. Members of such a class are connected to each other by two simple graphical operations: resizes and swaps [Goergen.Smith.2018]. For the purposes of this paper we present versions of these in a constructive manner.
The swap operation is illustrated in Figs. 1(c) and 1(b) where we swap a two-level subtree. Let be the labels associated to a stage of colour . Suppose a vertex is not of the colour , but every term in the subtree polynomial is divisible by one of the . Then we can replace the subtree by another tree with the same atomic probabilities but with a root of colour . This is because in this setting every root-to-leaf path in goes through a vertex of colour and we can find another subtree with root and leaves which have the colour in . Let denote the induced subtree coming after the edge labelled of the vertex . We obtain a different representation of like this. Start with a root of colour and to each edge glue a copy of . Then to each leaf glue the corresponding tree . The map which replaces the original tree by the tree obtained after performing this copying and reglueing is called a swap.
The resize operation is illustrated in Figs. 1(b) and 1(a) where we resize two two-level subtrees. Suppose in a staged tree we have a vertex such that for every vertex in the same stage, their children are also pairwise in the same stages, for all . Then for each such we remove their non-leaf children and draw edges from the vertices directly to their respective grandchildren. This introduces a new set of labels for the new edges related to the old ones as follows. Let be a vertex in the new tree, and the corresponding vertex in the original tree. Then if in we substitute each new label by the product of the two original that it replaced, we get . For the resized tree to represent the same model as the original tree we require in addition that no and are in the same stage for and these colours do not appear anywhere else in the tree. A resize not respecting these extra conditions was called ‘naïve’ in the original presentation [Goergen.Smith.2018]. But even in the ‘naïve’ cases where we do lose information and effectively create a bigger statistical model, these operations are useful tools for the analysis conducted in Section 5. In particular we have the following lemma.
The class of balanced staged trees is invariant under the swap and resize operations.
As for the swap, simply observe that this operation does not change the parametrisation of a staged tree but only locally changes the order of vertices in a subtree . Hence, is invariant under the swap, and the validity of the balancedness condition is not affected.
For the resize, let two vertices be in the same stage in the new tree, after the resize operation has been applied. We want to prove that the balancedness condition is fulfilled, so that for all children of and , indexed by . What we do know is that is zero after we substitute a product of labels according to the resize operation, since the original tree was balanced. In other words, this binomial is an element of the ideal generated by , namely
where here denotes the set of labels from both the old tree and the new. But no ’s occur in the new tree, so we must have . This proves the claim. ∎
This result will provide the basis for proving Proposition 3.4.
2.2 Algebra of staged tree models
We henceforth treat the atomic probabilities of a staged tree as variables spanning a polynomial ring denoted . We will use the short notation for the ideal of all local sum-to-one conditions in the polynomial ring . We choose as our base field in this paper in the style of real algebraic geometry and algebraic statistics, though all results hold for an arbitrary base field of characteristic zero. Associating the variables of to the atomic probabilities induces a ring map
We will now use the notation for the sum of the variables in the ring for which the corresponding root-to-leaf path passes through the vertex . Due to the sum-to-one conditions, the image of an atomic sum is equal to the product of the labels along the corresponding root-to- path.
The kernel of is the prime ideal of the staged tree . The variety of this ideal gives an implicit description to the staged tree model . In order to use classical results of algebraic geometry more comfortably, we will mostly work with the homogenised version of ,
where, in analogy to the notation used in Eq. 3, denotes the ideal of homogenised local sum-to-one conditions. Here is an artificial parameter, is the number of edges in the root-to-leaf path , and is the number of edges in the longest root-to-leaf path in the tree. The integer will often be regarded as the depth of the tree. Each of the variables are mapped to a monomial of degree under .
The homogenisation process is equivalent to completing the staged tree into a staged tree where all leaves are at the same distance from the root. This is achieved by adding, where needed, internal vertices of outdegree one and label : compare Fig. 4. Such changes do not affect the model. Hence, our notation will not distinguish between and .
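The padding of short root-to-leaf paths can be made concrete as follows. This is a sketch under our own assumptions: the dictionary `paths`, the leaf names and the name `z` for the artificial parameter are hypothetical, and the example is again the coin flip tree.

```python
from collections import Counter

# Hypothetical input: leaf -> list of edge labels along its root-to-leaf path.
paths = {"p1": ["s0", "s0"], "p2": ["s0", "s1"], "p3": ["s1"]}

def homogenised_monomials(paths, pad="z"):
    """Exponent vector of the image of each leaf variable after homogenisation.

    Every path shorter than the depth of the tree is padded with the
    artificial label `pad`, so that all monomials have the same degree.
    """
    depth = max(len(labels) for labels in paths.values())
    return {
        leaf: Counter(labels) + Counter({pad: depth - len(labels)})
        for leaf, labels in paths.items()
    }

monomials = homogenised_monomials(paths)
# All images now have degree equal to the depth of the tree.
assert all(sum(m.values()) == 2 for m in monomials.values())
```

Here the only leaf that receives the artificial label is the one at distance one from the root, which corresponds to inserting a single vertex of outdegree one on that branch.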
An element of significance to this paper is the image of which is a subring of denoted . We will also refer to
as an algebra, as it can be considered as a vector space over the base field and a ring at the same time. One can compare the algebras of two staged trees sharing the same root. The inclusion here indicates that can be obtained by removing induced subtrees for vertices in .
For any two staged trees that share the same root one has .
The image of is generated by the products of labels of root-to-leaf paths in . Each of these paths is a root-to- path in , for some vertex . Hence the same product lies in the image of , as it is the image of . ∎
The central objective of this paper is to provide conditions under which the ideal is toric, possibly after a linear change of variables. Recall thus that a prime ideal in is called toric if it is generated by binomials, or equivalently is the kernel of a monomial map from to a Laurent ring . That is a monomial map means that each variable is mapped to a monomial. The kernel of such a map is generated by differences of monomials such that . The image of is the subalgebra of generated by the monomials . We say that an algebra is a monomial algebra if it is generated by monomials in a Laurent ring or a polynomial ring.
Even though the map Eq. 3 appears monomial, its target ring is a quotient ring instead of a Laurent ring, and we cannot conclude that defines a toric variety; recall here the coin flip example from the introduction. For staged tree models, the ideal is toric in the variables if and only if the tree is balanced.
[DG, Theorem 10] The ideal is a toric ideal if and only if is a balanced staged tree.
Hence, if a non-balanced tree has a toric structure, the structure must appear only after a linear change of coordinates. This can be shown either by explicitly finding a linear change of variables which makes a binomial ideal, or by proving that is a monomial algebra as we state below.
Suppose the algebra is isomorphic to a monomial algebra. Then the ideal is toric after an appropriate linear change of coordinates.
Let be isomorphic to an algebra minimally generated by monomials in some polynomial ring via a map . Then we have a composition of maps
which is a surjective homomorphism. We claim that the linear span of equals the linear span of . Every monomial is in the image of , and if it were not the image of a linear form this would contradict the fact that are minimal generators. Then every must be mapped to a linear combination of , since otherwise we would get a non-homogeneous binomial generator of . It follows that there are linearly independent linear forms in the union of preimages . The ideal is binomial after the linear change of variables given by . ∎
Let be the tree in Fig. 3, and let denote the set of monomials of degree four in the variables , . Then is the subalgebra of generated by , , , , , and . All of the generators can be expressed as linear combinations of the monomials in , using the relation . Conversely, one can verify that the five monomials in are the images of the linear forms , , , , under . For instance
It follows that is a different generating set for the algebra . As , the image of is isomorphic to the monomial algebra . To get the change of variables which makes toric we choose another linear form , not in the span of , which maps to a monomial of . For example we may take , as this maps to . The ideal is then the kernel of the monomial map .
A first application of the result 2.5 above is that if two distinct trees share the same algebra then they must both be toric or both be non-toric. The same is true if the two algebras differ by a permutation or when the parameters of a staged tree are permutations of parameters of another staged tree.
If for staged trees and the algebras and are equal or they change only by a permutation of variables, the staged tree models and are simultaneously toric or non-toric.
3 Balanced staged tree models
As stated in Theorem 2.4 the ideal is toric when the tree is balanced. The monomial map defining this toric ideal is precisely the map (4) with the quotient ring replaced by the polynomial ring . In this section we continue the study of balanced staged trees and their prime ideals.
3.1 The combinatorial structure of balanced trees
Our main result on the combinatorial structure of balanced trees is Theorem 3.1, which states that a balanced staged tree model can always be represented by a tree with a certain colour structure. To obtain this colour structure we apply the homogenisation process already in the tree, as described in Remark 2.2. We also allow the out-degree one vertices to appear anywhere in the tree, not just as extensions at the end of a branch. See Fig. 4 for an example.
To any balanced staged tree we can apply the swap operation so that for every pair of vertices and in the same stage, one of the following holds.
For each index the two children and are in the same stage.
The children of are all in the same stage. The same holds for .
In order to prove Theorem 3.1 we need some preparations. We define the multiplicity of a colour at a vertex , denoted , to be the greatest number for which every -to-leaf path goes through at least vertices of the colour . If are the labels associated to the colour we can also say that is the greatest number for which every term in is divisible by a monomial in of degree . If we can use the swap operation to give the induced subtree a root of colour . Notice also that applying the swap operation to , or to an induced subtree inside , does not change the multiplicities.
Let and be vertices of the same stage in a balanced staged tree, and assume for some colour and some indices . Then .
The equality implies . If then we must have for the above equality to hold. ∎
Proof of Theorem 3.1.
In this proof vertices such that
will play an important role. If has the property (7) then we can give each the colour , using the swap operation on . If does not have the property (7), then either all its children are leaves, or we can use the swap operation on the ’s to give all children of the same colour. It follows from Lemma 3.2 that if then satisfies (7) if and only if does with the same colours.
We prove the theorem by providing an algorithm which will give the desired colour structure, assuming the tree is balanced. The algorithm goes through the internal vertices of out-degree greater than one whose children are not leaves. We visit the vertices in weakly increasing order, according to the distance from the root, and do the following.
If has a colour we have not seen before, we check if satisfies (7). If it does, we give each the colour , by using the swap operation. If not, we give all ’s the same colour.
If has a colour that we have seen before, we look at the children of the previous vertex of that colour. If the children of do not all have the same colour, it means that has the property (7). Then so does , and we can give each the same colour as . If all the children of have the same colour, it means that did not satisfy (7). Then neither does , and we give all children of the same colour.
Notice that these steps do not change the colour of any vertex we have already visited, or their children. Therefore, condition 1 or 2 in Theorem 3.1 will hold for all vertices and in the same stage of out-degree greater than one. For the vertices of out-degree one condition 2 always holds, so we are done. ∎
In squarefree staged trees, the root-to-leaf path can be read off uniquely from the monomial . In other words, the ideal will not contain any relations of degree one. Next, we will study the structure of balanced trees which are not squarefree. We use the notation for the grandchildren of , meaning that is the -th child of .
In a balanced tree, suppose we have a vertex which is in the same stage as all of its children. Then .
Since we have which can also be written as
As we also have . Applying this to the right hand side of (8) we get
and it follows that . ∎
Let and be vertices of the same stage in a balanced staged tree. Suppose there is a path that goes through both and , for some , where comes first. Then there is a path going through such that .
It is enough to consider the case when is the root, as we can restrict to the subtree . As a first step we apply the algorithm in the proof of Theorem 3.1. This does not change the colour of the root, but it might of course change the appearance of the rest of the tree. However, if we prove the statement for all in the new tree it will hold for all in the old tree as both trees represent the same model in the same parametrisation.
The proof is by induction over the depth of the tree. The smallest depth where we can have two vertices in the same stage in the same path is two. One can easily check that for such a tree to be balanced we need all internal vertices to be in the same stage. In this case we have and .
For the induction step we consider three cases.
The children of do not all have the same colour. In this case we have for all vertices that are in the same stage as . Then we use the resize operation on all the vertices in the same stage as . The new tree is also balanced by Lemma 2.1. As the depth has decreased, it follows by induction that the statement is true for this tree. We can easily redo the resize to find the path in the original tree.
The children of all have the same colour, but not the same colour as itself. This case is also illustrated in Fig. 5. Here we use the swap operation on and its children. This results in a new tree with a root followed by a number of induced subtrees whose roots are in the same colour as in the original tree. By induction there is a path in the same subtree as , with the desired properties. We can find this path in the original tree by swapping the root and its children one more time.
The children of have the same colour as . If we consider to be one of the children of , the result follows from Lemma 3.3. Otherwise we can swap and its children and continue as in case 2. ∎
3.2 The toric ideal associated to a balanced tree
We now turn to the problem of finding a generating set for the toric ideal , when is balanced. We start with a quick review of the concept of Gröbner bases for ideals in polynomial rings. For more details we refer the reader to [EH]. Every Gröbner basis relies on a monomial order . Here we will use the Degree Reverse Lexicographic monomial order (DegRevLex). Assume the variables are numbered from top to bottom in the tree, with at the top. We order the variables by . More generally, the monomials are ordered by in DegRevLex if or and there is an for which and for all . Every polynomial is a linear combination of monomials, and the greatest monomial according to the given order provides the leading term of . Now, let be an ideal in a polynomial ring, and let be a set of polynomials in . The set is a Gröbner basis for if the leading term of any polynomial in is divisible by the leading term of a polynomial in . A fundamental fact is that a Gröbner basis for an ideal is a generating set for .
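For readers who want to experiment with leading terms, the comparison of two monomials in DegRevLex can be sketched as below. We assume the standard convention, with the variables ordered so that the first one is the greatest, and we represent a monomial by its exponent vector; the function names are hypothetical.

```python
def degrevlex_greater(a, b):
    """True if x^a > x^b in DegRevLex, with variables ordered x_1 > x_2 > ... > x_n.

    a and b are exponent vectors of equal length.
    """
    if sum(a) != sum(b):
        return sum(a) > sum(b)  # higher total degree wins
    # Equal degree: inspect the last position where the exponents differ;
    # the monomial with the smaller exponent there is the greater one.
    for ai, bi in zip(reversed(a), reversed(b)):
        if ai != bi:
            return ai < bi
    return False  # the monomials are equal

def leading_monomial(monomials):
    """Greatest exponent vector in a non-empty list, with respect to DegRevLex."""
    lead = monomials[0]
    for m in monomials[1:]:
        if degrevlex_greater(m, lead):
            lead = m
    return lead

# In three variables: x2^2 > x1*x3, although x1 > x2.
assert degrevlex_greater((0, 2, 0), (1, 0, 1))
```

The final assertion illustrates the characteristic feature of this order: among monomials of the same degree, avoiding the smallest variable matters more than containing the greatest one.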
It was conjectured by Duarte and Görgen that the toric ideal of a balanced staged tree has a Gröbner basis of binomials of degree two [DG, Conjecture 17]. The conjecture was proved by Ananiadi and Duarte [AD] in the case of stratified staged trees with all leaves at the same distance from the root. We shall now prove the conjecture in full generality, with the modification that degree one generators are needed in the non-squarefree case. An example of a balanced staged tree which is not squarefree, and hence not stratified, can be found in Figure 5.
Let be a balanced staged tree. Suppose and are vertices in the same stage in and let be the labels of this stage. Let be the product of the labels along the path from the root to , and similarly for . If we multiply the balancedness condition of Section 2.1 by we get
Notice that the terms in each of the factors correspond to a root-to-leaf path. As every term has coefficient 1, there is no cancellation. Hence the above equality gives rise to a number of binomials in , where the path goes through , the path through , the path through , and through . Let be the set of all such binomials, together with all binomials of degree one in . The set for the tree in Fig. 4(a) is given in Example 3.7. We shall see that is a Gröbner basis for .
For a balanced staged tree the set of binomials of degree one and two defined above is a Gröbner basis for the ideal with respect to DegRevLex.
We know that is generated by homogeneous binomials. It follows from Buchberger’s algorithm [EH] that every binomial ideal has a Gröbner basis of binomials. Hence we are done if we can prove that the leading term of a binomial in is divisible by the leading term of a binomial of in . Let where is the leading term. We assume that and . We may also assume that is not divisible by a monomial. This implies that .
Let be the vertex where the two paths and split. Say then follows an edge labelled , and an edge labelled . As we have . Let be the product of the labels along the path from the root to . Then
and since the monomial is divisible by , one of the factors in the left hand side must be divisible by as well. Notice that none of goes through , as and takes the -th edge at . Then we have the following cases.
Some goes through and an edge labelled , and the -edge appears after . Then by Proposition 3.4 there is a that takes the -th edge at and . As we have , and is the leading term of .
Some , with , goes through and an edge labelled , and the -edge appears before .
The -edge is an outgoing edge of some vertex in the same stage as . As and lie on the path leading to , the path must also go through . At the path takes the -th edge for some . Then we have such that takes the -th edge at , and takes the -th edge at . Then and , which makes the leading term.
Some , with , goes through an edge labelled and does not go through . So there is a vertex in the same stage as , and they are on different branches. Since the vertex must sit above in the tree. We have such that takes the -th edge at and takes the -th edge at . Then and , which makes the leading term.
In all three cases we have found an element in with a leading term which divides the leading term of . ∎
Sometimes it can be useful to identify variables that are mapped to the same monomial under , and in this way get rid of the degree one relations. For this purpose, let be the polynomial ring on the subset of the variables obtained by removing if there is a such that . Let be the map restricted to . The two maps have identical images.
For a balanced staged tree, the degree two binomials of are a Gröbner basis for with respect to DegRevLex.
Let , where is the leading term. As we also have the leading term of is divisible by the leading term of a binomial of degree one or two, by Theorem 3.5. However, the leading terms of the degree one binomials in are precisely the variables we removed in the construction of . Hence must be of degree two. The leading term of is a monomial in (one or two of) the variables . If the non-leading term of is divisible by some for which there is a such that , we substitute by in . This produces a binomial and does not change the leading term. Hence is divisible by the leading term of the degree two binomial . ∎
3.3 The monomial algebra associated to a balanced tree
For a balanced tree let denote the subalgebra of generated by the monomials corresponding to the root-to-leaf paths. As this set of monomials is exactly the parametrisation for the toric ideal we have . In this section we will discuss some properties of that are central in commutative algebra. In particular we shall see that is Koszul, normal, and Cohen-Macaulay. We will briefly recall the definitions of these properties. For a more extensive review we refer the reader to [EH].
A -algebra is Koszul if the field has a linear free resolution over . We recall two fundamental results about Koszul algebras. Suppose . If is Koszul then is generated in degree two. Moreover, if has a Gröbner basis consisting of polynomials of degree two then is Koszul [Anick]. These give us an immediate corollary of Theorem 3.6.
For a balanced staged tree , the associated monomial algebra is Koszul.
An algebra generated by monomials, such as , can be considered a semigroup ring. The semigroup is the set of monomials in the ring, with multiplication as the semigroup operation. A semigroup is called normal if it is finitely generated and has the following property. If there are such that for some positive integer then there is a such that . The ring is a Noetherian domain, meaning that it is a normal ring if it is integrally closed in its field of fractions. Moreover, recall that a ring is Cohen-Macaulay if , where in general . We summarise two important results on normal semigroup rings.
[Hochster, Proposition 1, Theorem 1] Let be a semigroup of monomials, and let denote the semigroup ring over a field . Then is normal if and only if is normal. Moreover, if is normal, then it is Cohen-Macaulay.
Let be a balanced staged tree, and let be the semigroup generated by the monomials , , considered as monomials in . We shall prove that is normal, and hence is normal.
So, suppose we have a monomial and two monomials and such that for some positive integer . Assume we have indexed so that the path agrees with in as many steps as possible. Suppose they separate in a vertex , where takes the edge labelled and the edge labelled . Then there must be a matching in . Here we can follow the same steps as in the proof of Theorem 3.5 and get that there is and such that