# The computation of first order moments on junction trees

We review some existing methods for the computation of first order moments on junction trees using the Shafer-Shenoy algorithm. First, we consider the problem of first order moment computation as the all vertices problem in junction trees. In this approach, the problem is solved using memory space of the order of the junction tree edge-set cardinality. After that, we consider two algorithms, the Lauritzen-Nilsson algorithm and the Mauá et al. algorithm, which compute the first order moments as the normalization problem in a junction tree, using memory space of the order of the junction tree leaf-set cardinality.



## 1 Shafer-Shenoy algorithm

### 1.1 Potentials and operations

Let $X_U = (X_u)_{u \in U}$ be a finite collection of discrete random variables. Let $\Omega_u$ denote the set of possible values that $X_u$ can take. For $A \subseteq U$ we write $\Omega_{x_A}$ for the Cartesian product $\times_{u \in A}\, \Omega_u$ (we implicitly assume some natural ordering of the sets). The elements $x_A \in \Omega_{x_A}$ are called configurations. We adopt the convention that $\Omega_{x_\varnothing}$ consists of a single element, the empty configuration.

Let $\Phi$ be a set of functions defined on the sets $\Omega_{x_A}$, where $A \subseteq U$, i.e. $\pi_A : \Omega_{x_A} \to \mathbb{R}$. In the following text, the functions from $\Phi$ are called potentials. Let $\otimes$ be a binary operation on $\Phi$ called combination, and let $\downarrow$ denote the external operation called marginalization, which to every $\pi_A$ associates $\pi_A^{\downarrow B}$, where $B \subseteq A$.

We assume that the following Shafer-Shenoy axioms hold for combination and marginalization.

### 1.2 Shafer-Shenoy axioms

Axiom 1 (Commutativity and Associativity) Let $\pi_A$, $\pi_B$ and $\pi_C$ be potentials. Then

$$\pi_A \otimes \pi_B = \pi_B \otimes \pi_A \quad\text{and}\quad \pi_A \otimes (\pi_B \otimes \pi_C) = (\pi_A \otimes \pi_B) \otimes \pi_C. \tag{1}$$

Axiom 1 allows us to use the notation $\pi_A \otimes \pi_B \otimes \pi_C$.

Axiom 2 (Consonance) Let $\pi_A$ be a potential on $A$, and let $C \subseteq B \subseteq A$. Then

$$\left(\pi_A^{\downarrow B}\right)^{\downarrow C} = \pi_A^{\downarrow C}. \tag{2}$$

Axiom 3 (Distributivity) Let $\pi_A$ and $\pi_B$ be potentials on $A$ and $B$, respectively. Then

$$(\pi_A \otimes \pi_B)^{\downarrow A} = \pi_A \otimes \pi_B^{\downarrow A}. \tag{3}$$

### 1.3 Junction Tree

The joint potential $\pi_U$ is said to factorize with respect to a collection $\mathcal{V}$ of subsets of $U$ if there exist potentials $\pi_V$ for $V \in \mathcal{V}$, so that we can write $\pi_U$ as

$$\pi_U = \bigotimes_{V \in \mathcal{V}} \pi_V. \tag{4}$$

In this paper we consider joint potentials which can be represented with a junction tree, which is defined as follows.

###### Definition 1

Let $\mathcal{V}$ be a collection of subsets of $U$ (the set of variable indices) and $\mathcal{T}$ a tree with $\mathcal{V}$ as its node set (which corresponds to a set of local domains). Then $\mathcal{T}$ is said to be a junction tree if the intersection $A \cap B$ of any pair of sets $A, B \in \mathcal{V}$ is contained in every node on the unique path in $\mathcal{T}$ between $A$ and $B$. Equivalently, for any $u \in U$, the set of subsets in $\mathcal{V}$ containing $u$ induces a connected subtree of $\mathcal{T}$.

The set of all neighbors of $A$ is denoted $\mathrm{ne}(A)$. We omit the parentheses from the notation when it is not prone to misunderstanding. Hence, $\mathrm{ne}(A) \setminus B$ stands for the set of all neighbors of $A$ without $B$, $\mathrm{ne}(A) \setminus B, C$ for the set of all neighbors of $A$ without $B$ and $C$, and so on. $A \sim B$ denotes that $A$ and $B$ are neighbors.

A junction tree is usually drawn with the sets $V \in \mathcal{V}$ as node labels. In the following text a node will be identified with its label. A general procedure for junction tree construction can be found in [1] and [2]. An example of the junction tree which corresponds to the chain factorization

$$\pi_U = \bigotimes_{i=1}^{n} \pi_{V_i}, \qquad V_i \sim V_{i+1} \ \text{ for } i = 1, \dots, n-1, \tag{5}$$

is given in Fig. 1.

### 1.4 Problems

The junction tree enables the solution of three important problems:

1. The single vertex problem at node $A$ is defined as the computation of the potential $\psi_A$, defined by

$$\psi_A = \pi_U^{\downarrow A} = \Big(\bigotimes_{V \in \mathcal{V}} \pi_V(x_V)\Big)^{\downarrow A}; \tag{6}$$

2. The all vertices problem is defined as the computation of the potentials $\psi_A$ for all $A \in \mathcal{V}$;

3. The normalization problem is the marginalization of the joint potential (4) to the empty set $\varnothing$. Using the consonance of the marginalization (2), it can straightforwardly be solved from the solution of the single vertex problem at an arbitrary node $A$:

$$\pi_U^{\downarrow \varnothing} = \big(\pi_U^{\downarrow A}\big)^{\downarrow \varnothing} = \psi_A^{\downarrow \varnothing}. \tag{7}$$

### 1.5 Local computation algorithm

These problems can efficiently be solved with the Shafer-Shenoy local computation algorithm (LCA). The algorithm can be described as passing the messages over the edges and processing the messages in the nodes of the junction tree.

Messages are passed between the nodes of $\mathcal{T}$ via mailboxes. All mailboxes are initialized as empty. When a message has been placed in a mailbox, the box is full. A node $A$ in the junction tree is allowed to send a message to its neighbor $B$ if it has not done so before and if all of $A$'s incoming mailboxes are full, except possibly the one receiving messages from $B$. So, initially only the leaves (nodes which have only one neighbor) of the junction tree are allowed to send messages. But as the message passing proceeds, other nodes will have their turn, and eventually all mailboxes will be full, i.e., exactly two messages will have been passed along each branch of the junction tree.

The message from $A$ to $B$ is a potential $\pi_{A \to B}$. The passage of a message from node $A$ to node $B$ is performed by absorption. Absorption from $A$ to $B$ involves eliminating the variables in $A \setminus B$ from the combination of the potential associated with $A$ and the messages from its neighbors except $B$. The structure of the message is given by

$$\pi_{A \to B} = \Big(\pi_A \otimes \Big(\bigotimes_{C \in \mathrm{ne}(A) \setminus B} \pi_{C \to A}\Big)\Big)^{\downarrow B}, \tag{8}$$

where $\pi_{C \to A}$ is the message passed from $C$ to $A$. Since a leaf has only one neighbor, the combination on the right-hand side is empty and the message can initially be computed as

$$\pi_{A \to B} = \pi_A^{\downarrow B}. \tag{9}$$

Suppose we start with a joint potential $\pi_U$ on a junction tree $\mathcal{T}$ and pass messages toward a root clique $R$ as described above. When $R$ has received a message from each of its neighbors, the combination of all messages with its own potential is equal to the $R$-marginal of the joint potential $\pi_U$:

$$\pi_U^{\downarrow R} = \Big(\bigotimes_{V \in \mathcal{V}} \pi_V\Big)^{\downarrow R} = \pi_R \otimes \Big(\bigotimes_{V \in \mathrm{ne}(R)} \pi_{V \to R}\Big), \tag{10}$$

where $\mathcal{V}$ is the node set of $\mathcal{T}$.

Hence, if we want to solve the single vertex problem at node $A$, we need to compute all messages incoming to $A$, while for the all vertices problem we need the messages between all pairs of neighboring nodes in the tree.

For the single vertex problem, the algorithm starts at the leaves, which send messages to their neighbors. A node sends a message to a neighbor once it has received messages from each of its other neighbors. The node $A$ never sends a message. Thus, each message is sent only once, until $A$ has received the messages from all of its neighbors, at which point the required marginal is computed and the algorithm terminates; the total number of computed messages equals the number of edges of the tree. Once we have solved the single vertex problem at the node $A$, the normalization problem can be solved with (7).

The first part of the algorithm for the all vertices problem is similar to the single vertex case. The messages are sent from the leaves toward the interior of the tree until some node $R$ has received the messages from all of its neighbors. After that, the messages are sent from $R$ back toward the leaves. The algorithm stops when all leaves have received messages. The total number of computed messages equals two times the number of edges in the tree (for any two neighboring nodes $A$ and $B$ we send the messages $\pi_{A \to B}$ and $\pi_{B \to A}$).
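The message passing scheme above can be sketched for ordinary sum-marginalization, with tables as potentials, combination as pointwise product and marginalization as summing out variables. This is a minimal illustration, not the paper's implementation; the junction tree, domains and helper names below are all hypothetical.

```python
import itertools
from functools import reduce

# A potential is (vars, table): `vars` is a tuple of variable names and
# `table` maps each configuration (a tuple aligned with `vars`) to a float.
# Hypothetical setup: four binary variables a, b, c, d.
DOM = {v: (0, 1) for v in "abcd"}

def make(vars_, fn):
    """Tabulate fn over all configurations of vars_."""
    return (vars_, {cfg: fn(dict(zip(vars_, cfg)))
                    for cfg in itertools.product(*(DOM[v] for v in vars_))})

def combine(pa, pb):
    """Combination: pointwise product on the union domain."""
    va, vb = pa[0], pb[0]
    vu = va + tuple(v for v in vb if v not in va)
    tab = {}
    for cfg in itertools.product(*(DOM[v] for v in vu)):
        env = dict(zip(vu, cfg))
        tab[cfg] = (pa[1][tuple(env[v] for v in va)]
                    * pb[1][tuple(env[v] for v in vb)])
    return (vu, tab)

def marginalize(pot, keep):
    """Marginalization: sum out every variable not in `keep`."""
    vs, tab = pot
    kept = tuple(v for v in vs if v in keep)
    out = {}
    for cfg, val in tab.items():
        key = tuple(c for v, c in zip(vs, cfg) if v in keep)
        out[key] = out.get(key, 0.0) + val
    return (kept, out)

def message(tree, pots, A, B, cache):
    """Message A -> B per eq. (8); for a leaf this reduces to eq. (9)."""
    if (A, B) not in cache:
        incoming = [message(tree, pots, C, A, cache) for C in tree[A] if C != B]
        cache[(A, B)] = marginalize(reduce(combine, incoming, pots[A]),
                                    set(pots[B][0]))
    return cache[(A, B)]

# Chain junction tree  ab -- bc -- cd  (cf. the chain factorization (5)).
tree = {"ab": ["bc"], "bc": ["ab", "cd"], "cd": ["bc"]}
pots = {"ab": make(("a", "b"), lambda e: 1.0 + e["a"] + 2 * e["b"]),
        "bc": make(("b", "c"), lambda e: 2.0 - e["b"] * e["c"]),
        "cd": make(("c", "d"), lambda e: 1.0 + e["c"] + e["d"])}

# Single vertex problem at root "ab": combine its potential with all
# incoming messages, per eq. (10).
cache = {}
psi_ab = reduce(combine,
                [message(tree, pots, C, "ab", cache) for C in tree["ab"]],
                pots["ab"])
```

The resulting `psi_ab` agrees with the brute-force marginalization of the full product onto $\{a, b\}$, at the cost of one message per edge instead of a sum over all configurations.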

## 2 First order moments

### 2.1 Operations on the set of functions

For real-valued functions $\phi_A$ and $\phi_B$, the sum $\phi_A + \phi_B$ and the product $\phi_A \cdot \phi_B$ are respectively defined by:

$$(\phi_A + \phi_B)(x_{A \cup B}) = \phi_A(x_A) + \phi_B(x_B) \tag{11}$$

$$(\phi_A \cdot \phi_B)(x_{A \cup B}) = \phi_A(x_A) \cdot \phi_B(x_B) \tag{12}$$

for all $x_A \in \Omega_{x_A}$ and $x_B \in \Omega_{x_B}$. The product is usually denoted shortly with $\phi_A \phi_B$.

We define the sum-marginal operator $\sum_{x_{C \setminus A}}$ for $A \subseteq C$, which to every real-valued function $\phi_C$ associates the function $\sum_{x_{C \setminus A}} \phi_C$, defined with

$$\Big(\sum_{x_{C \setminus A}} \phi_C\Big)(x_A) = \sum_{x_{C \setminus A} \in \Omega_{x_{C \setminus A}}} \phi_C(x_C), \tag{13}$$

and the marginalization is defined with

$$\phi_C^{\downarrow A} = \sum_{x_{C \setminus A}} \phi_C. \tag{14}$$

### 2.2 Definition of first order moments

The joint probability function $p_U$ of the random variable $X_U$ is nonnegative and is said to factorize multiplicatively on $\mathcal{V}$ if there exist nonnegative real functions $p_C$ for $C \in \mathcal{V}$, so that we can write $p_U$ as

$$p_U = \prod_{C \in \mathcal{V}} p_C. \tag{15}$$

Similarly, the function $h_U$ is said to factorize additively on $\mathcal{V}$ if there exist real functions $h_C$ for $C \in \mathcal{V}$, so that we can write $h_U$ as

$$h_U = \sum_{C \in \mathcal{V}} h_C. \tag{16}$$

The first order moment potential $m_C$ is defined with

$$m_C = \sum_{x_{U \setminus C}} p_U \cdot h_U. \tag{17}$$

In the case $C = \varnothing$, the first order moment potential is simply denoted $m$,

$$m = \sum_{x_U} p_U \cdot h_U, \tag{18}$$

and called the first order moment.

###### Example 1

The first order moment potential may be useful for expressing the conditional expectation

$$E[h_U(X_U) \mid x_C] = \sum_{x_{U \setminus C}} p(x_{U \setminus C} \mid x_C)\, h(x_{U \setminus C}, x_C) \tag{19}$$

for $C \subseteq U$. After using

$$p(x_{U \setminus C} \mid x_C) = \frac{p(x_{U \setminus C}, x_C)}{p(x_C)} = \frac{p(x_{U \setminus C}, x_C)}{\sum_{x_{U \setminus C}} p(x_{U \setminus C}, x_C)}, \tag{20}$$

we have

$$E[h_U(X_U) \mid x_C] = \frac{\sum_{x_{U \setminus C}} p(x_{U \setminus C}, x_C)\, h(x_{U \setminus C}, x_C)}{\sum_{x_{U \setminus C}} p(x_{U \setminus C}, x_C)} = \frac{m_C(x_C)}{p_U^{\downarrow C}(x_C)}. \tag{21}$$

Consequently, the unconditioned expectation equals the first order moment:

$$E[h_U(X_U)] = \sum_{x_U} p(x_U)\, h(x_U) = m. \tag{22}$$
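The identity (21) is easy to check numerically on a toy joint distribution. The two binary variables, the probability values and the function $h$ below are hypothetical, chosen only for illustration:

```python
import itertools

DOM = (0, 1)
# Toy joint distribution p(x1, x2) and function h(x1, x2) = x1 + 2*x2.
p = dict(zip(itertools.product(DOM, DOM), (0.1, 0.2, 0.3, 0.4)))
h = {cfg: cfg[0] + 2 * cfg[1] for cfg in p}

# First order moment potential m_C (eq. 17) and the C-marginal of p,
# for C = {x1}.
mC = {x1: sum(p[(x1, x2)] * h[(x1, x2)] for x2 in DOM) for x1 in DOM}
pC = {x1: sum(p[(x1, x2)] for x2 in DOM) for x1 in DOM}

# Eq. (21): conditional expectation as the ratio m_C / p-marginal.
cond = {x1: mC[x1] / pC[x1] for x1 in DOM}

# Direct computation of E[h | x1] for comparison.
direct = {x1: sum(p[(x1, x2)] / pC[x1] * h[(x1, x2)] for x2 in DOM)
          for x1 in DOM}

# Eq. (22): the moment m is the unconditioned expectation E[h].
m = sum(p[cfg] * h[cfg] for cfg in p)
```

Here `cond` and `direct` coincide for both values of $x_1$, and summing $m_C$ over $x_1$ recovers $m$.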

### 2.3 The problem of first order moment computation as the all vertices problem

The computation of (18) by enumerating all configurations would require a number of operations exponential in the cardinality of $U$. However, the computational complexity can be reduced using the local computation algorithm, which exploits the structure of the functions given with the factorizations (15) and (16). In this case, the marginal values $p_U^{\downarrow C}$ are computed for all $C \in \mathcal{V}$ using local computation over the set of real-valued functions. After that, the moment is computed according to the equality

$$m = \sum_{C \in \mathcal{V}} \sum_{x_C} h_C\, p_U^{\downarrow C}, \tag{23}$$

which follows from

$$m = \sum_{x_U} p_U\, h_U = \sum_{x_U} p_U \sum_{C \in \mathcal{V}} h_C = \sum_{C \in \mathcal{V}} \sum_{x_U} p_U\, h_C = \sum_{C \in \mathcal{V}} \sum_{x_C} h_C \sum_{x_{U \setminus C}} p_U = \sum_{C \in \mathcal{V}} \sum_{x_C} h_C\, p_U^{\downarrow C}. \tag{24}$$

This method requires storing the marginal values $p_U^{\downarrow C}$ for all $C \in \mathcal{V}$, which unnecessarily increases the memory complexity. Instead, we can use the local computation algorithms by Lauritzen and Nilsson [3] and Mauá et al. [4], which find the moment as the solution of the normalization problem. In the following section, we consider these two algorithms.

## 3 First order moment computation using ordered pair potential algorithms

### 3.1 Ordered pair potentials

In our local computation algorithms we represent the quantitative elements through entities called potentials. Each such potential has two parts, as detailed below.

###### Definition 2

(Potential) A potential on $C$ is a pair $\pi_C = (p_C, h_C)$ of real-valued functions on $\Omega_{x_C}$, where $p_C$ is nonnegative.

Thus, a potential consists of two parts: the $p$-part and the $h$-part. We call a potential $\pi_C$ vacuous if $\pi_C = (1, 0)$. We identify two potentials $\pi_C^{(1)} = (p_C^{(1)}, h_C^{(1)})$ and $\pi_C^{(2)} = (p_C^{(2)}, h_C^{(2)})$ on $C$, and write $\pi_C^{(1)} \equiv \pi_C^{(2)}$, if $p_C^{(1)} = p_C^{(2)}$ and $h_C^{(1)}(x_C) = h_C^{(2)}(x_C)$ whenever

$$p_C^{(1)}(x_C) = p_C^{(2)}(x_C) > 0, \tag{25}$$

i.e., two potentials are considered equal if they have identical probability parts and their utility parts agree almost surely with respect to the probability parts.

To represent and evaluate the decision problem in terms of potentials, we define basic operations of combination and marginalization. There are two possible ways to define the operations.

1. Lauritzen-Nilsson algorithm [3]

2. Mauá et al. algorithm [4]

### 3.2 Lauritzen-Nilsson algorithm

###### Definition 3

(Combination) The combination of two potentials $\pi_A = (p_A, h_A)$ and $\pi_B = (p_B, h_B)$ denotes the potential on $A \cup B$ given by

$$\pi_A \otimes \pi_B = (p_A \cdot p_B,\; h_A + h_B). \tag{26}$$
###### Definition 4

(Marginalization) The marginalization of $\pi_C = (p_C, h_C)$ onto $A \subseteq C$ is defined by

$$\pi_C^{\downarrow A} = \Big(\sum_{x_{C \setminus A}} p_C,\; \frac{\sum_{x_{C \setminus A}} p_C\, h_C}{\sum_{x_{C \setminus A}} p_C}\Big). \tag{27}$$

Here we have used the convention that $0/0 = 0$, which will be used throughout.
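A small executable sketch of Definitions 3 and 4 may help. The representation is an assumption made for illustration: a potential on variables `vs` is a triple `(vs, p, h)` with the $p$-part and $h$-part stored as tables over binary configurations.

```python
import itertools

DOM = (0, 1)  # every variable is binary in this toy example

def combine(a, b):
    """Lauritzen-Nilsson combination (eq. 26): (pA*pB, hA + hB)."""
    va, pa, ha = a
    vb, pb, hb = b
    vu = va + tuple(v for v in vb if v not in va)
    p, h = {}, {}
    for cfg in itertools.product(DOM, repeat=len(vu)):
        env = dict(zip(vu, cfg))
        ka = tuple(env[v] for v in va)
        kb = tuple(env[v] for v in vb)
        p[cfg] = pa[ka] * pb[kb]
        h[cfg] = ha[ka] + hb[kb]
    return (vu, p, h)

def marginalize(pot, keep):
    """Lauritzen-Nilsson marginalization (eq. 27), with 0/0 = 0."""
    vs, p, h = pot
    kept = tuple(v for v in vs if v in keep)
    psum, phsum = {}, {}
    for cfg in p:
        key = tuple(c for v, c in zip(vs, cfg) if v in keep)
        psum[key] = psum.get(key, 0.0) + p[cfg]
        phsum[key] = phsum.get(key, 0.0) + p[cfg] * h[cfg]
    hmar = {k: (phsum[k] / psum[k] if psum[k] != 0 else 0.0) for k in psum}
    return (kept, psum, hmar)

# Toy factorization over binary a, b, c with a normalized p-part:
# p_U = p_ab * p_bc is uniform (0.25 * 0.5 = 0.125 per configuration),
# h_U = h_ab + h_bc with h_ab = a + b and h_bc = b*c.
pi_ab = (("a", "b"),
         {k: 0.25 for k in itertools.product(DOM, DOM)},
         {k: k[0] + k[1] for k in itertools.product(DOM, DOM)})
pi_bc = (("b", "c"),
         {k: 0.5 for k in itertools.product(DOM, DOM)},
         {k: k[0] * k[1] for k in itertools.product(DOM, DOM)})

# Marginalizing the combination to the empty set yields (1, m).
norm = marginalize(combine(pi_ab, pi_bc), set())
```

Here `norm` is `((), {(): 1.0}, {(): 1.25})`: the $p$-part sums to one and the $h$-part equals $m = E[a + b + bc] = 1.25$ under the uniform joint.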

As shown in Lauritzen and Nilsson [3], these operations of combination and marginalization satisfy the Shenoy-Shafer axioms [5], and the structured factorizations can be marginalized using the Shafer-Shenoy algorithm.

If the operations are defined in this way, the potentials are set to

$$\pi_C = (p_C,\; h_C), \tag{28}$$

and the factorizations (15) and (16) hold, then

$$\pi_U = \bigotimes_{C \in \mathcal{V}} \pi_C = \bigotimes_{C \in \mathcal{V}} (p_C, h_C) = \Big(\prod_{C \in \mathcal{V}} p_C,\; \sum_{C \in \mathcal{V}} h_C\Big) = (p_U, h_U). \tag{29}$$

Accordingly, we have

$$\pi_U^{\downarrow \varnothing} = \Big(\sum_{x_U} p_U,\; \frac{\sum_{x_U} p_U\, h_U}{\sum_{x_U} p_U}\Big) = (1, m), \tag{30}$$

where we have used the probability condition $\sum_{x_U} p_U = 1$. Hence, the first order moment can be computed using the Shafer-Shenoy local computation algorithm, where the combination and marginalization are defined with (26)-(27). The messages have the form

$$\pi_{A \to B} = \big(\pi_{A \to B}^{(p)},\; \pi_{A \to B}^{(h)}\big), \tag{31}$$

where the $p$-part and the $h$-part are given with

$$\pi_{A \to B}^{(p)} = \sum_{x_{A \setminus B}} p_A \prod_{C \in \mathrm{ne}(A) \setminus B} \pi_{C \to A}^{(p)}, \tag{32}$$

$$\pi_{A \to B}^{(h)} = \frac{\displaystyle\sum_{x_{A \setminus B}} p_A \prod_{C \in \mathrm{ne}(A) \setminus B} \pi_{C \to A}^{(p)} \cdot \Big(h_A + \sum_{C \in \mathrm{ne}(A) \setminus B} \pi_{C \to A}^{(h)}\Big)}{\displaystyle\sum_{x_{A \setminus B}} p_A \prod_{C \in \mathrm{ne}(A) \setminus B} \pi_{C \to A}^{(p)}}, \tag{33}$$

which follows from equations (8), (26), (27), (28) and (31).

###### Example 2

Let $\pi_U$ have the chain factorization

$$\pi_U = \bigotimes_{i=1}^{n} \pi_{V_i}, \qquad V_i \sim V_{i+1} \ \text{ for } i = 1, \dots, n-1, \tag{34}$$

and let $\pi_{i \to (i+1)}$ stand as shorthand for the message $\pi_{V_i \to V_{i+1}}$. According to the chain factorization, the $p$-part and the $h$-part of the message reduce to:

$$\pi_{i \to (i+1)}^{(p)} = \sum_{x_{V_i \setminus V_{i+1}}} p_{V_i}\, \pi_{(i-1) \to i}^{(p)}, \tag{35}$$

$$\pi_{i \to (i+1)}^{(h)} = \frac{\displaystyle\sum_{x_{V_i \setminus V_{i+1}}} p_{V_i}\, \pi_{(i-1) \to i}^{(p)} \cdot \big(h_{V_i} + \pi_{(i-1) \to i}^{(h)}\big)}{\displaystyle\sum_{x_{V_i \setminus V_{i+1}}} p_{V_i}\, \pi_{(i-1) \to i}^{(p)}}. \tag{36}$$

### 3.3 Mauá et al. algorithm

###### Definition 5

(Combination) Let $\pi_A = (p_A, h_A)$ and $\pi_B = (p_B, h_B)$ be two potentials on $A$ and $B$, respectively. The combination of $\pi_A$ and $\pi_B$ is the potential on $A \cup B$ given by

$$\pi_A \otimes \pi_B = (p_A p_B,\; h_A p_B + p_A h_B). \tag{37}$$
###### Definition 6

(Marginalization) Let $\pi_C = (p_C, h_C)$ be a potential on $C$, and let $A \subseteq C$. The marginalization of $\pi_C$ onto $A$ is the potential on $A$ given by

$$\pi_C^{\downarrow A} = \Big(\sum_{x_{C \setminus A}} p_C,\; \sum_{x_{C \setminus A}} h_C\Big). \tag{38}$$
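Definitions 5 and 6 admit the same kind of executable sketch as the Lauritzen-Nilsson operations (toy tables, illustrative names). With the $h$-part of each local potential initialized to $p_C \cdot h_C$, as in eq. (40) below, normalization again recovers $(1, m)$:

```python
import itertools

DOM = (0, 1)  # binary variables in this toy example

def combine(a, b):
    """Maua et al. combination (eq. 37): (pA*pB, hA*pB + pA*hB)."""
    va, pa, ha = a
    vb, pb, hb = b
    vu = va + tuple(v for v in vb if v not in va)
    p, h = {}, {}
    for cfg in itertools.product(DOM, repeat=len(vu)):
        env = dict(zip(vu, cfg))
        ka = tuple(env[v] for v in va)
        kb = tuple(env[v] for v in vb)
        p[cfg] = pa[ka] * pb[kb]
        h[cfg] = ha[ka] * pb[kb] + pa[ka] * hb[kb]
    return (vu, p, h)

def marginalize(pot, keep):
    """Maua et al. marginalization (eq. 38): sum both parts independently."""
    vs, p, h = pot
    kept = tuple(v for v in vs if v in keep)
    psum, hsum = {}, {}
    for cfg in p:
        key = tuple(c for v, c in zip(vs, cfg) if v in keep)
        psum[key] = psum.get(key, 0.0) + p[cfg]
        hsum[key] = hsum.get(key, 0.0) + h[cfg]
    return (kept, psum, hsum)

# Same toy factorization as in the Lauritzen-Nilsson sketch: uniform
# p_U = p_ab * p_bc over binary a, b, c and h_U = (a + b) + b*c,
# but now each h-part stores p_C * h_C.
pi_ab = (("a", "b"),
         {k: 0.25 for k in itertools.product(DOM, DOM)},
         {k: 0.25 * (k[0] + k[1]) for k in itertools.product(DOM, DOM)})
pi_bc = (("b", "c"),
         {k: 0.5 for k in itertools.product(DOM, DOM)},
         {k: 0.5 * (k[0] * k[1]) for k in itertools.product(DOM, DOM)})

norm = marginalize(combine(pi_ab, pi_bc), set())  # -> (1, m)
```

The combination yields the $h$-part $(p_{ab} h_{ab}) p_{bc} + p_{ab} (p_{bc} h_{bc}) = p_U h_U$, so summing it to the empty set gives $m = 1.25$, the same value the Lauritzen-Nilsson operations produce on this example.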

The following lemma can be proven by induction [6].

###### Lemma 1

Let $\pi_{A_i} = (p_{A_i}, h_{A_i})$ be ordered pair potentials for $i = 1, \dots, n$. Then,

$$\bigotimes_{i=1}^{n} \pi_{A_i} = \Big(\prod_{i=1}^{n} p_{A_i},\; \sum_{i=1}^{n} h_{A_i} \prod_{j \neq i} p_{A_j}\Big). \tag{39}$$

If the operations are defined in this way, the potentials are set to

$$\pi_C = (p_C,\; p_C h_C), \tag{40}$$

and the factorizations (15) and (16) hold, then

$$\pi_U = \bigotimes_{C \in \mathcal{V}} \pi_C = \Big(\prod_{A \in \mathcal{V}} p_A,\; \prod_{A \in \mathcal{V}} p_A \sum_{B \in \mathcal{V}} h_B\Big) = (p_U,\; p_U h_U). \tag{41}$$

Accordingly, we have

$$\pi_U^{\downarrow \varnothing} = \Big(\sum_{x_U} p_U,\; \sum_{x_U} p_U\, h_U\Big) = (1, m), \tag{42}$$

where we have used the probability condition $\sum_{x_U} p_U = 1$. Again, the first order moment can be computed using the Shafer-Shenoy local computation algorithm, where the combination and marginalization are defined with (37) and (38). As in the Lauritzen-Nilsson algorithm, the messages have the form

$$\pi_{A \to B} = \big(\pi_{A \to B}^{(p)},\; \pi_{A \to B}^{(h)}\big), \tag{43}$$

but now, according to (39), the $p$-part and the $h$-part of the messages are given with

$$\pi_{A \to B}^{(p)} = \sum_{x_{A \setminus B}} p_A \prod_{C \in \mathrm{ne}(A) \setminus B} \pi_{C \to A}^{(p)}, \tag{44}$$

$$\pi_{A \to B}^{(h)} = \sum_{x_{A \setminus B}} p_A \cdot \Big(\prod_{C \in \mathrm{ne}(A) \setminus B} \pi_{C \to A}^{(p)} \cdot h_A + \sum_{C \in \mathrm{ne}(A) \setminus B} \pi_{C \to A}^{(h)} \prod_{D \in \mathrm{ne}(A) \setminus B, C} \pi_{D \to A}^{(p)}\Big). \tag{45}$$

Note that the $p$-parts of the messages in the Lauritzen-Nilsson algorithm and the Mauá et al. algorithm are the same. For trees with large average degree, the $h$-parts of the messages are more complex in the Mauá et al. algorithm, due to the repeated multiplications in the products in equality (45). However, the Mauá et al. algorithm is simpler for chains, as the following example shows.

###### Example 3

Let $\pi_U$ have the chain factorization

$$\pi_U = \bigotimes_{i=1}^{n} \pi_{V_i}, \qquad V_i \sim V_{i+1} \ \text{ for } i = 1, \dots, n-1, \tag{46}$$

and let $\pi_{i \to (i+1)}$ stand as shorthand for the message $\pi_{V_i \to V_{i+1}}$. According to the chain factorization, the $p$-part and the $h$-part of the message reduce to:

$$\pi_{i \to (i+1)}^{(p)} = \sum_{x_{V_i \setminus V_{i+1}}} p_{V_i}\, \pi_{(i-1) \to i}^{(p)}, \tag{47}$$

$$\pi_{i \to (i+1)}^{(h)} = \sum_{x_{V_i \setminus V_{i+1}}} p_{V_i} \cdot \big(\pi_{(i-1) \to i}^{(p)}\, h_{V_i} + \pi_{(i-1) \to i}^{(h)}\big). \tag{48}$$
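The chain recursions (47)-(48) are easy to run end to end. The sketch below uses a hypothetical chain of four binary variables $x_1, \dots, x_4$ with local domains $V_i = \{x_i, x_{i+1}\}$, constant $p$-factors normalized so that $p_U$ is the uniform distribution, and arbitrary $h$-factors:

```python
import itertools

DOM = (0, 1)
CONF = list(itertools.product(DOM, DOM))

# Hypothetical chain factors on V_i = {x_i, x_{i+1}}, i = 1, 2, 3.
# p_U = p1*p2*p3 is the uniform distribution over four binary variables.
p = [{k: 0.25 for k in CONF}, {k: 0.5 for k in CONF}, {k: 0.5 for k in CONF}]
h = [{k: k[0] + k[1] for k in CONF},          # h1 = x1 + x2
     {k: k[0] * k[1] for k in CONF},          # h2 = x2 * x3
     {k: 2 * k[0] - k[1] for k in CONF}]      # h3 = 2*x3 - x4

# Vacuous incoming message for the first node: (p-part, h-part) = (1, 0).
mp = {x: 1.0 for x in DOM}
mh = {x: 0.0 for x in DOM}

# Messages 1->2 and 2->3, per eqs. (47)-(48).
for i in range(len(p) - 1):
    np_, nh_ = {y: 0.0 for y in DOM}, {y: 0.0 for y in DOM}
    for x, y in CONF:
        np_[y] += p[i][(x, y)] * mp[x]
        nh_[y] += p[i][(x, y)] * (mp[x] * h[i][(x, y)] + mh[x])
    mp, mh = np_, nh_

# Absorb at the last node and marginalize to the empty set (eq. 42).
m = sum(p[-1][(x, y)] * (mp[x] * h[-1][(x, y)] + mh[x]) for x, y in CONF)

# Brute-force reference: m = sum over all configurations of p_U * h_U.
m_ref = 0.0
for cfg in itertools.product(DOM, repeat=4):
    pu = p[0][cfg[0:2]] * p[1][cfg[1:3]] * p[2][cfg[2:4]]
    hu = h[0][cfg[0:2]] + h[1][cfg[1:3]] + h[2][cfg[2:4]]
    m_ref += pu * hu
```

Under the uniform joint, $m = E[x_1 + x_2] + E[x_2 x_3] + E[2x_3 - x_4] = 1 + 0.25 + 0.5 = 1.75$, which matches the brute-force sum while the recursion only ever stores two tables of size $|\Omega_{x_i}|$.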

## 4 Conclusion

We reviewed some existing methods for the computation of first order moments on junction trees using the Shafer-Shenoy algorithm. First, we considered the problem of first order moment computation as the all vertices problem in junction trees. In this approach, the problem is solved using memory space of the order of the junction tree edge-set cardinality. After that, we considered two algorithms, the Lauritzen-Nilsson algorithm and the Mauá et al. algorithm, which compute the first order moments as the normalization problem in a junction tree, using memory space of the order of the junction tree leaf-set cardinality. For trees, the first of them has simpler formulas in comparison to the second one, while the second one is simpler for chains.

## References

• [1] S M Aji and R J McEliece. The generalized distributive law. IEEE Transactions on Information Theory, 46(2):325–343, 2000.
• [2] Robert G. Cowell, A. Philip Dawid, Steffen L. Lauritzen, and David J. Spiegelhalter. Probabilistic Networks and Expert Systems. Springer, 1999.
• [3] Steffen L. Lauritzen and Dennis Nilsson. Representing and Solving Decision Problems with Limited Information. Manage. Sci., 47(9):1235–1251, 2001.
• [4] Denis Deratani Mauá, Cassio Polpo de Campos, and Marco Zaffalon. Solving limited memory influence diagrams. CoRR, abs/1109.1754, 2011.
• [5] Prakash P. Shenoy and Glenn Shafer. Axioms for probability and belief-function propagation. In Uncertainty in Artificial Intelligence, pages 169–198. Morgan Kaufmann, 1990.
• [6] V. M. Ilić, M. S. Stanković, and B. T. Todorović. Entropy message passing. IEEE Transactions on Information Theory, 57(1):375–380, Jan. 2011.