Sum-Product Network Decompilation

12/20/2019 ∙ by Cory J. Butz, et al. ∙ University of Cambridge, University of Regina

There exists a dichotomy between classical probabilistic graphical models, such as Bayesian networks (BNs), and modern tractable models, such as sum-product networks (SPNs). The former generally have intractable inference but allow a high level of interpretability, while the latter admit a wide range of tractable inference routines but are typically harder to interpret. Due to this dichotomy, tools to convert between BNs and SPNs are desirable. While one direction, compiling BNs into SPNs, is well discussed in Darwiche's seminal work on arithmetic circuit compilation, the converse direction, decompiling SPNs into BNs, has received surprisingly little attention. In this paper, we fill this gap by proposing SPN2BN, an algorithm that decompiles an SPN into a BN. SPN2BN has several salient features when compared to the only two other works decompiling SPNs. Most significantly, the BNs returned by SPN2BN are minimal independence-maps. Secondly, SPN2BN is more parsimonious with respect to the introduction of latent variables. Thirdly, the output BN produced by SPN2BN can be precisely characterized with respect to the compiled BN. More specifically, a certain set of directed edges will be added to the input BN, giving what we will call the moral closure. It immediately follows that there is a set of BNs related to the input BN that will also return the same moral closure. Lastly, it is established that our compilation-decompilation process is idempotent. We confirm our results with systematic experiments on a number of synthetic BNs.




1 Introduction

There exists a trade-off between classical probabilistic graphical models and recent tractable probabilistic models. Classical models, such as Bayesian networks (BNs) [pear88], provide high-level interpretability, as conditional independence assumptions are directly reflected in the underlying graphical structure. However, the downside is that exact inference in BNs is NP-hard [coop90]. In contrast, modern tractable probabilistic models, like sum-product networks (SPNs) [darwiche09, poon2011sum], allow a wide range of tractable inference routines, but are harder to interpret. In order to combine the advantages of BNs and SPNs, which are complementary regarding interpretability and inference efficiency, tools to convert back and forth between these types of models are vital.

The direction of compiling BNs into SPNs is well understood due to Darwiche's seminal work on arithmetic circuit (AC) compilation [darwiche09].¹ Since inference in ACs/SPNs can be performed in time linear in the network size, BN compilation amounts to finding an inference machine with minimal inference cost. ACs can also take advantage of context-specific independence [bout96] in the BN parameters to further reduce the size of the AC.

¹ ACs and SPNs are equivalent models. Deterministic models are typically referred to as ACs, while non-deterministic models are called SPNs. See Section 2 for further details.

The converse direction of SPN decompilation into BNs has received limited attention. This lack of attention can be understood historically: since an original purpose of ACs was to serve as an efficient inference machine for a known BN, decompilation would seem like a mere academic exercise. The introduction of SPNs, however, brought some practical changes to the AC model. First, unlike ACs, SPNs are typically learned directly from data, i.e., a reference BN is not available. Thus, providing a corresponding BN would greatly improve the interpretability of the learned SPN. Second, as already mentioned, SPNs are typically non-deterministic, which naturally introduces an interpretation of SPNs as hierarchical latent variable models [Peharz:2016wl, choi2017relaxing]. A decompilation algorithm for SPNs should account for this fact, and generate BNs with a plausible set of latent variables. Note that BNs which do not adequately account for latent variables tend to be densely connected, see e.g. [elidan2001discovering]. Thus, naively decompiling SPNs with a decompilation algorithm devised for ACs would yield densely connected and rather uninterpretable BNs.

In this paper, we address this gap by formalizing SPN decompilation. We propose SPN2BN, an algorithm that converts a trained SPN into a BN. Our algorithm arguably improves over the only two other approaches in the literature addressing the connection between BNs and SPNs, [zhao2015relationship] and [Peharz:2016wl]. First, while both of these approaches produce a BN for a given SPN, these BNs are, in general, not minimal independence maps (I-maps) [darwiche09], i.e., they introduce unnecessary dependency assumptions. Our algorithm SPN2BN, on the other hand, produces minimal I-maps. Second, both [zhao2015relationship, Peharz:2016wl] introduce an excessive number of latent variables. In fact, both approaches interpret every single sum node in an SPN as a latent variable of its own. In this paper, we devise a more economical approach and identify groups of sum nodes that jointly represent one latent variable. This grouping is based on whether sum nodes are “on the same level of circuit hierarchy” and “responsible” for the same set of observable variables (these notions will be made formal in Section 3).

These design choices for SPN2BN improve over [zhao2015relationship] and [Peharz:2016wl] both in terms of a reduced number of BN nodes (latent variables) and a reduced number of edges (minimal I-mapness). While this design leads to more succinct and perhaps more aesthetic BNs, SPN2BN is also justified in a formal way. We show that SPN2BN can be seen as the inverse of the compilation process proposed in [darwiche09]. Consider a BN that was compiled into an AC with variable elimination following a reverse topological order (VErto). Convert the AC into an SPN with an optional marginalization operation², which creates latent variables by rendering random variables unobserved. Then, decompiling the resulting SPN with SPN2BN yields a BN whose set of directed edges is a superset of that of the original BN; we call this BN the moral closure of the original BN. The SPN2BN algorithm is consistent with respect to a compilation algorithm in that it always yields the moral closure of any given BN. Consistency is arguably a desirable property for any decompilation method, since it allows us to exactly characterize the output BN with respect to the input BN. This, in turn, allows us to establish idempotence of the combined compilation-decompilation process. In other words, applying the compilation-decompilation process twice gives the same result as applying it once. In contrast, [zhao2015relationship] and [Peharz:2016wl] are not consistent with any general-purpose compilation algorithm, and tend to increase the number of variables and edges in the constructed BN. Lastly, even when the input SPN does not stem from an assumed compiler, e.g., when it is learned from data, the VErto compilation assumption within SPN2BN helps us to interpret the result of decompilation.

² SPNs are closed under marginalization, that is, any sub-marginal of any SPN can again be represented as an SPN [darwiche09].

For example, consider the prominent BN in Figure 1 (subfig:bn_hmm), commonly known as a hidden Markov model (HMM). In Figure 1 (subfig:spn_hmm), we see the result of VErto, followed by marginalization of three variables, deeming these variables latent. This SPN shall be converted back into a BN. In Figure 2 (subfig:zhao_hmm, subfig:peharz_hmm), we see the BNs produced by [zhao2015relationship] and [Peharz:2016wl], respectively. Both BNs introduce more variables than were originally present, and the introduced edges hardly reflect the succinct independence assumptions of the HMM. In Figure 2 (subfig:us_hmm), the decompilation result of our SPN2BN algorithm is depicted. It can be seen that SPN2BN recovers the original HMM structure, where three “new” latent variables have been introduced, which correspond exactly to the three original latent variables. Evidently, no decompilation is able to recover the original labels of these variables, since reference to them has been explicitly removed by the previous (optional) marginalization operation. However, SPN2BN successfully detects their signature in the compiled SPN, enabling it to recover an equivalent set of latent variables.

Our theoretical results are empirically confirmed on a systematic range of compiled BNs, namely, all possible connected BNs containing up to 7 variables using all possible VErtos, followed by marginalization of all internal variables. Our algorithm SPN2BN recovers the moral closure of the original BN in every case.

Figure 1: Compilation of the BN in (subfig:bn_hmm) using VErto [darwiche09], and marginalizing three of its variables, yields the SPN in (subfig:spn_hmm).
Figure 2: Decompilation of the SPN in Figure 1 (subfig:spn_hmm) by [zhao2015relationship] in (subfig:zhao_hmm), [Peharz:2016wl] in (subfig:peharz_hmm), and SPN2BN in (subfig:us_hmm).

2 Sum-Product Networks

Here, we review BNs, ACs, and SPNs, as well as the compilation of BNs into SPNs.

We denote random variables (RVs) by uppercase letters, possibly with subscripts, and their values by the corresponding lowercase letters. Sets of RVs are denoted by boldfaced uppercase letters and their combined values by the corresponding boldfaced lowercase letters. The children of a variable in a directed acyclic graph (DAG) are its immediate descendants in the DAG. Similarly, the parents of a variable are its immediate ancestors. The descendants of a variable are the variables reachable from it by a directed path in the DAG. A variable Z is called a v-structure in a DAG if directed edges X → Z and Y → Z appear in the DAG, where X and Y are non-adjacent variables.

Before defining a BN, we review two key concepts: d-separation and I-maps. The independency information encoded in the DAG can be read graphically by the d-separation algorithm in linear time [geigerVermaPearl89].

Definition 1.

[pear88] If X, Y, and Z are three disjoint subsets of nodes in a DAG, then Z is said to d-separate X from Y if along every path between a node in X and a node in Y there is a node W satisfying one of the following two conditions: (i) W has converging arrows and none of W or its descendants are in Z, or (ii) W does not have converging arrows and W is in Z.

The next definition formalizes when a DAG is an I-map of a joint probability distribution (JPD).

Definition 2.

[darwiche09] Let a DAG and a JPD be defined over the same set of variables. The DAG is an I-map of the JPD if and only if every conditional independence read by d-separation on the DAG holds in the JPD. An I-map is minimal if it ceases to be an I-map when any edge is deleted from it.

BNs are DAGs with nodes representing variables and edges representing variable dependencies, where the strength of these relationships is quantified by conditional probability tables (CPTs).

Definition 3.

[pear88] Given a JPD on a set of variables, a DAG over these variables is called a Bayesian network (BN) of the JPD if it is a minimal I-map of the JPD.

A BN over a set of variables has one CPT per variable, namely the conditional distribution of that variable given its parents. One salient feature is that the product of the BN CPTs yields a JPD over the set of all variables. In a BN, the independencies read by d-separation in the DAG are guaranteed to hold in the JPD. Unfortunately, while BNs have clear interpretability, exact inference in BNs is NP-hard [coop90].
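To make the factorization concrete, here is a minimal sketch that multiplies the CPTs of a hypothetical two-variable BN, Rain → Wet, with made-up numbers. The names `PARENTS`, `CPTS`, and `joint` are illustrative assumptions, not part of the paper. Summing the product over all joint assignments gives one, as a JPD should.

```python
from itertools import product

# Hypothetical BN: Rain -> Wet, both binary (illustrative numbers only).
# Each CPT maps a tuple of parent values to a distribution over the child.
PARENTS = {"Rain": (), "Wet": ("Rain",)}
CPTS = {
    "Rain": {(): {True: 0.2, False: 0.8}},
    "Wet": {(True,): {True: 0.9, False: 0.1},
            (False,): {True: 0.1, False: 0.9}},
}

def joint(assignment):
    """JPD value of a full assignment: product of one CPT entry per variable."""
    p = 1.0
    for var, pa in PARENTS.items():
        parent_values = tuple(assignment[q] for q in pa)
        p *= CPTS[var][parent_values][assignment[var]]
    return p

def all_assignments():
    """Enumerate every joint assignment of the binary BN variables."""
    for values in product([True, False], repeat=len(PARENTS)):
        yield dict(zip(PARENTS, values))

total = sum(joint(a) for a in all_assignments())
```

Here `total` equals 1 up to floating-point rounding, and `joint({"Rain": True, "Wet": True})` is 0.2 · 0.9 = 0.18.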

BNs can be compiled into Arithmetic Circuits (ACs) [darwiche09] by graphically mapping the operations performed when marginalizing all variables from the BN.

Definition 4.

[darwiche09] An arithmetic circuit (AC) over a set of variables is a rooted DAG whose leaf nodes are labeled with numeric constants, called parameters, or with variables, called indicators, and whose other nodes are labeled with multiplication and addition operations.

Notice that parameter variables are set according to the BN CPTs, while indicator variables are set according to any observed evidence.
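The roles of parameters and indicators can be sketched on the same kind of hypothetical Rain → Wet BN (illustrative numbers). The hand-written circuit below mirrors summing out Wet before Rain, i.e., a reverse topological elimination order; it is an assumption-laden illustration, not the output of the compiler in [darwiche09]. Setting every indicator to 1 marginalizes all variables, while zeroing the indicators that contradict the evidence yields the probability of that evidence.

```python
# Parameters of a hypothetical BN Rain -> Wet (illustrative numbers).
THETA_RAIN = {True: 0.2, False: 0.8}
THETA_WET = {(True, True): 0.9, (True, False): 0.1,    # key: (rain, wet)
             (False, True): 0.1, (False, False): 0.9}
VALUES = (True, False)

def indicators(evidence):
    """Indicator for (var, value) is 1 if value agrees with the evidence, else 0."""
    return {(var, v): 1.0 if evidence.get(var, v) == v else 0.0
            for var in ("Rain", "Wet") for v in VALUES}

def ac_value(lam):
    """Evaluate a circuit that sums out Wet first and then Rain."""
    total = 0.0
    for r in VALUES:  # outer addition node: sums out Rain
        # inner addition node: sums out Wet for this value of Rain
        inner = sum(lam[("Wet", w)] * THETA_WET[(r, w)] for w in VALUES)
        # multiplication node: indicator x parameter x inner sum
        total += lam[("Rain", r)] * THETA_RAIN[r] * inner
    return total
```

For instance, `ac_value(indicators({"Wet": True}))` evaluates to 0.2 · 0.9 + 0.8 · 0.1 = 0.26, up to floating-point rounding.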

SPNs are a probabilistic graphical model that can be learned from data using, for instance, the LearnSPN algorithm [gens2013learning].

Definition 5.

A sum-product network (SPN) is a DAG containing three types of nodes: leaf distributions, sums, and products. Leaves are tractable distribution functions over (subsets of) the RVs. A sum node S computes the weighted sum S = Σ_k w_k C_k, where C_1, …, C_K are the children of S and w_1, …, w_K are weights that are assumed to be non-negative and normalized [peharz2015theoretical]. A product node P computes the product P = Π_k C_k of its children C_k. The value of an SPN for a given input is the value of its root.

The scope of a sum or product node is recursively defined as the union of the scopes of its children, while the scope of a leaf distribution is the set of variables over which the distribution is defined. A valid SPN defines a JPD and allows for efficient inference [poon2011sum]. The following two structural constraints on the DAG guarantee validity. An SPN is complete if, for every sum node, its children have the same scope. An SPN is decomposable if, for every product node, the scopes of its children are pairwise disjoint. Valid SPNs are of particular interest because they represent a JPD over the variables in the problem domain. In addition, as in ACs, exact inference is linear in the size of the DAG. Unlike ACs, however, SPNs allow for a latent variable (LV) interpretation [Peharz:2016wl].
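A minimal Python sketch of Definition 5 and the two validity conditions follows. It assumes univariate leaves over discrete values; the class names `Leaf`, `Sum`, and `Product` are our own. A variable absent from the input is treated as marginalized, so its (normalized) leaves evaluate to 1. Scopes are recomputed recursively, which is adequate for toy examples but not for large networks.

```python
from dataclasses import dataclass
from itertools import combinations
from math import prod

@dataclass
class Leaf:
    var: str
    probs: dict  # value -> probability of a univariate distribution

    def scope(self):
        return frozenset([self.var])

    def value(self, x):
        # A variable missing from x is marginalized: a normalized leaf sums to 1.
        return self.probs[x[self.var]] if self.var in x else 1.0

@dataclass
class Sum:
    children: list
    weights: list  # non-negative and summing to 1

    def scope(self):
        return frozenset().union(*(c.scope() for c in self.children))

    def value(self, x):
        return sum(w * c.value(x) for w, c in zip(self.weights, self.children))

@dataclass
class Product:
    children: list

    def scope(self):
        return frozenset().union(*(c.scope() for c in self.children))

    def value(self, x):
        return prod(c.value(x) for c in self.children)

def is_valid(node):
    """Check completeness (sum nodes) and decomposability (product nodes) recursively."""
    if isinstance(node, Leaf):
        return True
    scopes = [c.scope() for c in node.children]
    if isinstance(node, Sum):
        ok = all(s == scopes[0] for s in scopes)  # completeness
    else:
        ok = all(a.isdisjoint(b) for a, b in combinations(scopes, 2))  # decomposability
    return ok and all(is_valid(c) for c in node.children)

# A mixture of two fully factorized distributions over binary X1, X2 (made-up numbers).
spn = Sum([Product([Leaf("X1", {0: 0.6, 1: 0.4}), Leaf("X2", {0: 0.3, 1: 0.7})]),
           Product([Leaf("X1", {0: 0.1, 1: 0.9}), Leaf("X2", {0: 0.8, 1: 0.2})])],
          [0.5, 0.5])
```

Evaluating `spn.value({"X1": 0, "X2": 1})` gives 0.5 · 0.6 · 0.7 + 0.5 · 0.1 · 0.2 = 0.22, and `spn.value({})` marginalizes everything to 1.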

In [poon2011sum], it was suggested that SPN sum nodes can be interpreted as LVs by simply connecting an indicator variable to each child of a sum node. However, this naive augmentation of the model renders the SPN incomplete. [Peharz:2016wl] remedy this problem by augmenting an SPN with twin sum nodes. We next review the process of augmenting an SPN as suggested in [Peharz:2016wl]. First, explicitly represent each sum node as an LV by connecting, for the k-th child of the sum node, an indicator variable to that child using a new product node. That is, directed edges from the sum node to the new product node, and from the product node to both the indicator and the k-th child, are added to the SPN. Second, add twin sum nodes to fix the completeness problem as follows. Given a sum node with children and weights, its twin sum node has the same number of weights and indicator variables as children. Consider sum node in , [Peharz:2016wl] calls a conditioning sum node if there exists a child with . Add a twin sum node , for each sum node in with at least one conditioning sum node . Then, connect every of to using the previously added product nodes . (See AugmentSPN in [Peharz:2016wl] for complete details.)
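The first augmentation step can be sketched numerically for a single sum node. In this hypothetical helper, the k-th child value is multiplied by the indicator of the sum node's LV taking its k-th state: with all indicators set to 1 the node computes its original value, and switching on a single indicator isolates one mixture component. Twin sum nodes are deliberately left out, and `lv_posterior` assumes the sum node is the root of the SPN.

```python
def augmented_sum_value(weights, child_values, lam=None):
    """Value of a sum node after step one of the augmentation: the k-th child
    is wrapped in a product with indicator lam[k] of the sum node's LV."""
    if lam is None:
        lam = [1.0] * len(weights)  # LV marginalized: every indicator set to 1
    return sum(w * l * c for w, l, c in zip(weights, lam, child_values))

def lv_posterior(weights, child_values):
    """P(LV = k | input) for a root sum node, switching on one indicator at a time."""
    total = augmented_sum_value(weights, child_values)
    n = len(weights)
    return [augmented_sum_value(weights, child_values,
                                [1.0 if j == k else 0.0 for j in range(n)]) / total
            for k in range(n)]
```

With weights `[0.5, 0.5]` and child values `[0.42, 0.02]`, the node evaluates to 0.22, and the first state of the LV receives posterior 0.21 / 0.22.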

3 SPN Decompilation

In this section, we formalize SPN decompilation into a BN.

The interpretation of the SPN latent space, at first glance, is akin to reading tea leaves. For instance, every SPN sum node can itself be viewed as an LV, as done in [zhao2015relationship, peharz2015theoretical]. In stark contrast, all SPN sum nodes can be interpreted as one single LV [Peharz2015-thesis]. This provides a wide spectrum of interpretations based only on the sum nodes appearing in an SPN. In addition, external LVs can be introduced to an SPN, such as the switching parents in [peharz2015theoretical]. Thus, for a given SPN learned from data, there are seemingly countless possible interpretations of its latent space.

However, if we assume that a given SPN was compiled using a well-known procedure, we can make more useful interpretations of the latent space. A compilation assumption is introduced to guide the interpretation of the SPN latent space. Although there are different methods for compiling an SPN, we suggest a compilation assumption that matches the common understanding of an SPN as a hierarchical mixture model [peharz2015theoretical]. More specifically, the compilation assumption used in this paper is that a given SPN was compiled by Algorithm 1 from a BN, where all BN variables were eliminated using the Variable Elimination (VE) inference algorithm [zhan94] following any reverse topological order (VErto). The intuition behind this assumption is that the recursive marginalization of variables during VE is responsible for forming the hierarchical layers of sum nodes in an SPN. This sum-layer hierarchy is consistent with the natural interpretation of an SPN as a hierarchical Gaussian mixture model, as suggested in
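Because the compilation assumption allows any reverse topological order, and the experiments in Section 5 enumerate all of them, a small generator of such orders may help. This is a plain backtracking sketch over a DAG given as a map from each variable to its set of parents; the function names are hypothetical.

```python
def topological_orders(parents):
    """Yield every topological order of a DAG given as {node: set_of_parents}."""
    nodes = list(parents)

    def extend(order, placed):
        if len(order) == len(nodes):
            yield list(order)
            return
        for v in nodes:
            # v may be placed once all of its parents have been placed
            if v not in placed and parents[v] <= placed:
                order.append(v)
                placed.add(v)
                yield from extend(order, placed)
                order.pop()
                placed.remove(v)

    yield from extend([], set())

def reverse_topological_orders(parents):
    """Candidate VE elimination orders: every child is eliminated before its parents."""
    for order in topological_orders(parents):
        yield order[::-1]
```

For the collider A → C ← B this yields the two elimination orders (C, B, A) and (C, A, B); a chain has exactly one.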


1:procedure BN2SPN()
2:      VE with reverse topological order (VErto)
3:     Let be a reverse topological ordering of
4:      = compile-to-AC-with-VE(,)
5:      Convert AC to SPN
6:      = redistribute-parameters()
7:      = compile-marginalized-spn()
9:     while  do
10:         Let be a copy of
11:          = add-terminal-nodes()
12:          = remove-products-of-products()
13:          Lump products over the same children
14:          = lump-products()
15:         if  then
17:     return
Algorithm 1 SPN Compilation Assumption

In Algorithm 1, we start by converting a given BN into an AC in line 4 using a VErto, as described in [darwiche09]. Then, in line 6, we remove the leaf parameters from the AC by equivalently redistributing their values as sum weights, yielding an SPN. In line 7, we assume that all internal latent variables in the SPN are marginalized and, thus, all of their indicator variables are set to 1. Here, any arbitrary subset of the internal latent variables can be considered instead. Next, we repeatedly simplify the SPN by applying three operations until no further change can be made. In line 11, sum nodes with only indicator nodes as children are converted into terminal nodes [zhao2015relationship], each of which is a univariate distribution over the indicator variable. In line 12, product nodes with only product nodes as children are simplified into a single product node. Finally, in line 14, if two or more product nodes have the same set of children, then they are lumped into a single product node.

On the other hand, by decompilation, we mean the procedure of converting an SPN into a BN. This process involves determining the RVs and a DAG for a BN. We can suggest RVs for the BN by analyzing the compilation assumption. Similarly, a BN DAG can be obtained as an I-map from the SPN DAG. We now formalize these ideas.

Definition 6.

Given an SPN over RVs and a compilation assumption, SPN decompilation is an algorithm that both:

  (i) suggests a set of LVs, and

  (ii) produces an I-map over the RVs and the suggested LVs.

Task (i) of SPN decompilation is more involved than expected. A naive approach is to disregard the compilation assumption and treat each sum node as one LV. Negative consequences of this approach will be discussed in the next section. We suggest a more elegant approach by interpreting the effect of the compilation assumption on graphical characteristics of the SPN.

Recall that we assume the SPN was compiled using VErto. During compilation of the SPN, marginalizing variables creates groups of sum nodes in the same layer (the distance of the longest path from the root). Hence, identifying these groups is a way of suggesting RVs for the decompiled BN.

More formally, given a sum node S, the sum-depth of S is the number of sum nodes on the longest directed path from the root to S.

Example 1.

The sum-depth of sum node in the SPN of Figure 1 (subfig:spn_hmm) is 2, since there are 2 sum nodes on the longest path from the root to . Similarly, the sum-depth of is 1 and of is 0.

A sum-layer is the set of all sum nodes having the same sum-depth.

Example 2.

One sum-layer in the SPN of Figure 1 (subfig:spn_hmm) consists of and , since both and have a sum-depth of 2. Furthermore, and form another sum-layer, as does by itself.

A sum-region is the set of all sum-nodes within the same sum-layer and having the same scope.

Example 3.

Sum-layer and in the SPN of Figure 1 (subfig:spn_hmm) has only one sum-region, since and have the same scope. For the same reason, sum-layer and also has only one sum-region.

A sum-region is created by marginalizing variables under our compilation assumption. Therefore, to answer task (i) of SPN decompilation, we suggest that the set of LVs consist of one LV per sum-region.
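Phase (i) of decompilation only needs sum-depths and scopes. The sketch below groups sum nodes into sum-regions and assigns one LV per region. It assumes a hypothetical dictionary encoding of the SPN (node name mapped to its kind and children) plus a map from each leaf to its observed RV. It counts sum nodes strictly above a node, so a root sum node has sum-depth 0; the LV names Z0, Z1, and so on are illustrative.

```python
from collections import defaultdict

def suggest_lvs(g, root, leaf_var):
    """Phase (i) sketch: one LV per sum-region.

    g: name -> (kind, children), kind in {"sum", "prod", "leaf"};
    leaf_var: leaf name -> the observed RV it is defined over.
    """
    # DFS post-order; reversing it lists every node after all of its parents.
    post, seen = [], set()

    def visit(n):
        if n not in seen:
            seen.add(n)
            for c in g[n][1]:
                visit(c)
            post.append(n)

    visit(root)
    topo = post[::-1]

    # Sum-depth: largest number of sum nodes above a node on any root-to-node path.
    depth = {root: 0}
    for n in topo:
        step = 1 if g[n][0] == "sum" else 0
        for c in g[n][1]:
            depth[c] = max(depth.get(c, 0), depth[n] + step)

    # Scope: a leaf holds one RV; inner nodes take the union over their children.
    scope = {}
    for n in post:  # children are visited before their parents
        kind, ch = g[n]
        scope[n] = (frozenset([leaf_var[n]]) if kind == "leaf"
                    else frozenset().union(*(scope[c] for c in ch)))

    # Sum-region: sum nodes sharing both the sum-layer (depth) and the scope.
    regions = defaultdict(list)
    for n in topo:
        if g[n][0] == "sum":
            regions[(depth[n], scope[n])].append(n)

    # One hypothetical LV name per region.
    return {n: f"Z{i}" for i, members in enumerate(regions.values()) for n in members}
```

In the test SPN below, the two sum nodes over X1 share a layer and a scope, so they are mapped to a single LV, while the root gets its own.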

Example 4.

In the SPN of Figure 1 (subfig:spn_hmm), we suggest three LVs, one per sum-region.

We now turn our attention to task (ii) of SPN decompilation, that is, constructing an I-map over and . Augment the SPN as done in [Peharz:2016wl]. However, before continuing, we need to correct the notion of a conditioning sum node for the following reason. Consider sum node in the SPN of Figure 1 (subfig:spn_hmm). [Peharz:2016wl] would not define sum node as a conditioning sum node for , even though would appear as a conditioning variable for in the CPT , as depicted in the constructed I-map in Figure 2 (subfig:peharz_hmm).

Definition 7.

An ancestor sum node of a node in an augmented SPN is called conditioning for that node if two of its children reach different sets of twins for that node, where the node itself is counted among its twins.

Example 5.

Consider sum node in the SPN of Figure 1 (subfig:spn_hmm). Sum node is conditioning w.r.t. since the left-most child of reaches , but the right-most child would reach a twin of in the augmented SPN. Node is not conditioning for , since all children of reach the same set of twins for in the augmented SPN.
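Definition 7 can be checked by plain reachability. In the sketch below, `children` is a hypothetical adjacency map of the augmented SPN and `twins` maps a node to its twin sum nodes. An ancestor is reported as conditioning when its children reach different subsets of the node together with its twins, which is also the form the condition takes in the proof of Theorem 1.

```python
def reachable(children, start):
    """All nodes reachable from start (start included) in the augmented SPN."""
    seen, stack = set(), [start]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(children.get(v, ()))
    return seen

def is_conditioning(children, ancestor, node, twins):
    """Our reading of Definition 7: the ancestor sum node is conditioning for
    node if its children do not all reach the same subset of {node} | twins."""
    targets = {node} | set(twins.get(node, ()))
    reached = {frozenset(reachable(children, c) & targets) for c in children[ancestor]}
    return len(reached) > 1
```

In Algorithm 2 (lines 19 to 21), an edge from the LV of the ancestor to the LV of the node would be added exactly when this check succeeds.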

In Example 5, observe that is not a conditioning sum node for and hence does not appear as parent of in our constructed I-map in Figure 2 (subfig:us_hmm).

The SPN decompilation techniques described thus far are formalized as Algorithm 2.

1:procedure SPN2BN()
2:      Phase (i) suggests a set of LVs
3:     Let be a list of sum-layers in
4:     for each node in  do
5:          A mapping from SPN node to suggested LV
7:      Scopes
8:     for each in  do
9:         for each node in  do
10:              if  is not in  then
11:                  Create new LV
13:              else
14:                  Let be the same LV created for               
16:      Phase (ii) produces an I-map over and
17:     for each node in  do
18:         if  is a sum or leaf node then
19:              for each sum node in  do
20:                  if  is conditioning w.r.t.  then
21:                       Add edge to                                               
22:     return
Algorithm 2 SPN Decompilation
Example 6.

Given the SPN in Figure 1 (subfig:spn_hmm) as input to SPN2BN(), the decompiled BN is given in Figure 2 (subfig:us_hmm).

4 Theoretical Foundation

In this section, we first establish important properties of both SPN decompilation phases. Later, we show a favorable characteristic of our compilation assumption and Algorithm 2.

4.1 On SPN Decompilation

Our decompilation algorithm is parsimonious with the introduction of LVs: one LV is assigned to each sum-region, that is, to a group of sum nodes, rather than one LV per sum node.

Regarding the I-map construction, we first show the correctness of the I-map, and then establish that the constructed I-map is minimal.

The I-map correctness follows from the CPT construction suggested in [Peharz:2016wl]. Theorem 1 in [Peharz:2016wl] shows conditional independencies among LVs, which allows us to derive an I-map encoding such independencies. More specifically, it describes the CPT values of an LV conditioned on all of its LV ancestors. The I-map we construct in Algorithm 2 implies the same CPT values, except that each LV is conditioned only on its conditioning nodes, as defined by Definition 7. Since the conditioning nodes of an LV are a subset of its ancestors, the independencies represented in the I-map of [Peharz:2016wl] are a subset of those in the I-map built by Algorithm 2. The proof of these new conditional independencies and, thus, the correctness of our I-map, is formalized in Lemma 1.

Lemma 1.

Consider an augmented SPN with a sum node S. Let A be all of S's ancestors and C ⊆ A all of S's conditioning sum nodes. Then, P(S | A) = P(S | C).

Proof. Let N = A \ C be the non-conditioning sum nodes of S. That is, A = C ∪ N. By Definition 7, for every node in N, all of its children reach the same set of twins for S. That is, conditioning on any node in N does not change the value of S. By definition, S is independent of N given C, meaning P(S | A) = P(S | C ∪ N) = P(S | C). ∎

We next show that our constructed I-maps are minimal.

Theorem 1.

Given an augmented SPN over observed RVs and suggested LVs, the I-map built by Algorithm 2 is minimal.


Proof. In a minimal I-map, if an edge is removed, then d-separation [pear88] will read an independence that does not hold in the JPD. Consider the I-map B constructed by Algorithm 2 from an augmented SPN. By contradiction, assume that B is not minimal, that is, an edge S → N can be removed. In the resulting I-map, d-separation would read an independence between S and N. We show next that this conditional independence does not hold in the JPD encoded by the augmented SPN.

By lines 20 and 21 of Algorithm 2, S is a conditioning sum node of N. By Definition 7, there exist two children c1 and c2 of S such that the sets of nodes in T reachable from c1 and from c2 differ, where T is the set union of N and its twins. Then, by construction, in the JPD encoded by the augmented SPN over the RVs and LVs, the event of selecting child c1, that is, S = c1, has a different outcome for N than selecting child c2, that is, S = c2. That is,

P(N | S = c1) ≠ P(N | S = c2).

Thus, there is no conditional independence between S and N, contradicting the assumption that B is not minimal. Therefore, B must be minimal. ∎

One seeks minimal I-maps because non-minimal I-maps are not necessarily useful in practice [darwiche09, pear88, koll09].

4.2 Compilation and Decompilation

In this section, we first show that the compilation-decompilation algorithm, called BN2SPN2BN, constructs a unique BN for a given set of original BNs. A consequence of this is that BN2SPN2BN is idempotent.

Algorithm 3, called BN2SPN2BN, formalizes the process of compiling and decompiling a BN by applying BN2SPN followed by SPN2BN.

1: procedure BN2SPN2BN(B)
2:     S ← BN2SPN(B)    ▷ Algorithm 1
3:     B′ ← SPN2BN(S)    ▷ Algorithm 2
4:     return B′
Algorithm 3 BN compilation and decompilation

The next example shows that the BN output by BN2SPN2BN can differ from the original BN.

Example 7.

Consider the BN in Figure 3 (subfig:bn_recovery) as input to BN2SPN2BN. The BN output by BN2SPN2BN through compilation and decompilation is given in Figure 3 (subfig:bn_prim_recovery).

Figure 3: The recovery of the BN in (subfig:bn_recovery) by Algorithm 3 adds a moral edge and a higher moral edge.

Notice that the directed edges of the original BN are a subset of those in the output BN.

In the remainder of this section, we assume a fixed topological ordering of the given BN.

Definition 8.

A directed moralization edge is a directed edge X → Y added between two non-adjacent vertices X and Y in a given BN whenever there exists a variable Z such that X → Z and Y → Z, where X precedes Y in the fixed topological ordering.

Example 8.

Recall the BN in Figure 3 (subfig:bn_recovery). Let be the fixed topological ordering . Then, is a directed moralization edge in Figure 3 (subfig:bn_prim_recovery).
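Reading Definition 8 as the usual marriage of non-adjacent co-parents, with the edge oriented from the earlier to the later variable in the fixed topological ordering, gives the following sketch. The orientation rule is our reading of the definition, and the BN encoding (a map from each variable to its set of parents) is an assumption.

```python
from itertools import combinations

def directed_moralization_edges(parents, order):
    """For every pair of non-adjacent parents of a common child, add an edge
    oriented from the earlier to the later variable in the topological order."""
    pos = {v: i for i, v in enumerate(order)}

    def adjacent(u, v):
        return u in parents[v] or v in parents[u]

    edges = set()
    for ps in parents.values():
        for u, v in combinations(ps, 2):
            if not adjacent(u, v):
                first, second = sorted((u, v), key=pos.__getitem__)
                edges.add((first, second))
    return edges
```

In the last test, only the moral edge B → C appears; edges arising further up the ordering fall under Definition 9 and are not produced by this function.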

Definition 9.

A directed ancestral moralization edge is a directed edge added between two non-adjacent vertices and in a given BN whenever there exists a variable such that and , and .

Example 9.

Recall again the BN in Figure 3 (subfig:bn_recovery) and the fixed topological ordering . Since, and , is a directed ancestral moralization edge in Figure 3 (subfig:bn_prim_recovery).

By definition, directed moralization edges are also directed ancestral moralization edges. We can now introduce the key notion of moral closure.

Definition 10.

Given a BN and a fixed topological order of it, the moral closure of the BN is the unique BN formed by augmenting it with all directed ancestral moralization edges.

Example 10.

Consider the BN in Figure 3 (subfig:bn_recovery) and the fixed topological ordering . The moral closure of is depicted in Figure 3 (subfig:bn_prim_recovery).

We are now ready to present the first main result of our compilation-decompilation process.

Theorem 2.

Given a BN and a fixed topological order of it, the output of the compilation-decompilation algorithm BN2SPN2BN is the moral closure of the BN.


Proof (crux). BN2SPN2BN is formed by BN2SPN followed by SPN2BN. We need to show where the directed ancestral moralization edges are introduced.

In BN2SPN, compiling a BN to an AC in line 4 uses the procedure described in [darwiche09], which pragmatically executes the summing out of variables in the given reverse topological order. Now, as also discussed in [darwiche09], VE induces a graph in which the parents of a common child are married. These will be the directed moralization edges. Later, during elimination, VE induces a graph by adding fill-in edges among the neighbors of eliminated variables, following the elimination order. These will be the directed ancestral moralization edges.

In SPN2BN, the decompilation procedure follows the hierarchical order of sum nodes in the SPN graph when building the BN. Thus, variables that are eliminated first in BN2SPN will appear in lower layers of the SPN. The connection of variables in the BN follows this hierarchy, which, by construction, builds all directed ancestral moralization edges from BN2SPN. ∎

Theorem 2 has two important consequences. As Theorem 2 establishes that the output of BN2SPN2BN is the moral closure of the input BN, it immediately follows that the output BN is exactly the input BN whenever no directed ancestral moralization edges are added to it. One situation where this occurs is when the input BN does not have any v-structures, as in the case of HMMs. Here, the moral closure coincides with the input BN, so the output of BN2SPN2BN is the same as the input (up to a relabelling of variables). For example, recall the HMM in Figure 1 (subfig:bn_hmm): BN2SPN2BN yields back the same BN, as illustrated in Figure 2 (subfig:us_hmm). A second important case is when the input BN is itself a moral closure. This leads to our next result, showing that our compilation-decompilation process is idempotent.

Theorem 3.

BN2SPN2BN is idempotent.


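Proof. Let B be a BN, σ a fixed topological ordering of B, and M(B) the moral closure of B with respect to σ. By Theorem 2,

BN2SPN2BN(B) = M(B).

By definition, the moral closure of M(B) is M(B) itself (σ remains a topological ordering of M(B), since every added edge is directed along σ). Thus,

BN2SPN2BN(BN2SPN2BN(B)) = BN2SPN2BN(M(B)) = M(M(B)) = M(B) = BN2SPN2BN(B).

Therefore, BN2SPN2BN is idempotent. ∎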

The idempotence of BN2SPN2BN is useful in practice, since it guarantees that the decompiled BN never grows beyond the size of the moral closure. In contrast, if we change the decompilation method to [zhao2015relationship] or [Peharz:2016wl], then applying BN2SPN2BN over and over will always yield ever larger BNs.

It is perhaps worth mentioning here that the directed concepts used in this section have undirected counterparts in the literature. The notion of a directed moralization edge corresponds to an undirected edge added during the process of moralizing a BN [jens88]. Furthermore, directed ancestral moralization edges correspond to fill-in edges added between non-adjacent neighbours in an undirected graph when triangulating a BN [darwiche09, pear88, koll09]. In fact, it can be seen that a reverse topological ordering of the moral closure of a BN yields a perfect numbering. That is, eliminating variables in this order does not add any fill-in edges, which means that (the skeleton of) the moral closure is triangulated.
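The correspondence with fill-in edges suggests one way to compute a moral closure: moralize, eliminate variables in reverse topological order while pairwise connecting the not-yet-eliminated neighbours of each eliminated variable, and orient every edge along the order. The sketch below follows that recipe. It is our illustration of the correspondence described above, not the authors' implementation, and the parent-map encoding is an assumption. The tests include an idempotence check in the spirit of Theorem 3 and an HMM, which has no v-structures and is therefore returned unchanged.

```python
from itertools import combinations

def moral_closure(parents, order):
    """parents: {variable: set of parents}; order: a topological order of the BN."""
    pos = {v: i for i, v in enumerate(order)}
    adj = {v: set() for v in order}

    def connect(u, w):
        adj[u].add(w)
        adj[w].add(u)

    # Moralization: keep every edge and marry the parents of each child.
    for child, ps in parents.items():
        for p in ps:
            connect(p, child)
        for u, w in combinations(ps, 2):
            connect(u, w)

    # Eliminate in reverse topological order; the neighbours still present
    # (earlier in the order) become pairwise connected by fill-in edges.
    for v in reversed(order):
        remaining = [u for u in adj[v] if pos[u] < pos[v]]
        for u, w in combinations(remaining, 2):
            connect(u, w)

    # Orient each edge from the earlier to the later variable.
    return {v: {u for u in adj[v] if pos[u] < pos[v]} for v in order}
```

For the BN with edges A → C, B → D, and C → D under the order A, B, C, D, the sketch adds the moral edge B → C and, one level higher, A → B.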

5 Synthetic Experiments

In this section, we first reaffirm our result in Theorem 2 with systematic experiments on a number of synthetic BNs. Next, we show how our decompilation method SPN2BN can enhance interpretability of an SPN learned from data.

5.1 Empirical Analysis on Moral Closure

We empirically tested the result in Theorem 2 by verifying whether the output BN of BN2SPN2BN was the moral closure of the input BN .

For a fixed number of variables, we exhaustively construct all possible connected BNs. That is, we discard disconnected BNs as well as cases not forming a DAG.

Table 1 describes our experiments. The number of variables ranged from 2 through 7 inclusive. The number of BNs is given in the second row. The total number of possible elimination orderings is given in the third row. Finally, the last row reports the total number of trials that were conducted. In each and every case, our compilation-decompilation process returned the moral closure of the input BN.

# variables         2    3     4      5      6      7
# BNs               1    3    21    315   9.8K   615K
# elim. orders      1    2     6     24    120    720
# trials            1    6   126   7.6K   1.1M   448M
Table 1: Synthetic benchmark BNs. Numbers are rounded.

5.2 SPN Decompilation and Interpretability

Here, we demonstrate how the decompilation algorithm can be used to better understand an SPN learned from data. To this end, we decompile a well-known SPN structure, called a region graph, first proposed in [poon2011sum].

The SPN structure in [poon2011sum] exploits local structure in image data. The idea is to select all rectangular regions, the smallest of which correspond to single pixels. Then, for each rectangular region, all possible ways to decompose it into two rectangular subregions are considered. This recursive procedure yields a valid SPN structure, known as a region graph, formed by layers of sum and product nodes [dennis2012learning, peharz2018probabilistic]. By construction, a region graph is uniquely determined by the height and width of the image.
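The region enumeration just described can be sketched in Python as follows. This is a minimal sketch under stated assumptions: regions are encoded as hypothetical half-open bounds (r0, r1, c0, c1), and the sum and product nodes attached to each region and decomposition are omitted, so only the region/decomposition skeleton is built. Because the loops are deterministic, the output depends only on the image height and width, matching the uniqueness remark above.

```python
def region_graph(height, width):
    """Enumerate every rectangular region of a height-by-width image and every
    way to split each region into two rectangular subregions.

    A region (r0, r1, c0, c1) covers rows [r0, r1) and columns [c0, c1).
    Returns (regions, decompositions), where decompositions maps a region to
    its list of (subregion, subregion) pairs.
    """
    regions = []
    decompositions = {}
    for r0 in range(height):
        for r1 in range(r0 + 1, height + 1):
            for c0 in range(width):
                for c1 in range(c0 + 1, width + 1):
                    region = (r0, r1, c0, c1)
                    regions.append(region)
                    splits = []
                    # Horizontal cuts: top and bottom subregions.
                    for r in range(r0 + 1, r1):
                        splits.append(((r0, r, c0, c1), (r, r1, c0, c1)))
                    # Vertical cuts: left and right subregions.
                    for c in range(c0 + 1, c1):
                        splits.append(((r0, r1, c0, c), (r0, r1, c, c1)))
                    decompositions[region] = splits  # empty for single pixels
    return regions, decompositions
```

For a 3-by-3 image this yields 36 rectangular regions (9 of them single pixels) and 48 two-way decompositions in total, 4 of which split the full image.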

We ran experiments decompiling several fixed-size region graphs. Consider an image of size 3-by-3. The region graph for such an image is depicted in Figure 4(a). Using SPN2BN, we decompile this region graph into the BN in Figure 4(b).

Figure 4: (a) An SPN structure (region graph) for a 3-by-3 image. (b) The BN decompiled from it using Algorithm 2.

We can draw independence conclusions for the BN of Figure 4(b) using d-separation. For instance, observing variables in the second layer (from bottom to top) of the decompiled BN renders some variables of the third layer dependent on each other. These dependencies can aid the interpretation of SPN applications such as autoencoders.

In [Vergari2018qu], visualization of the latent space shows that the deeper the SPN layer, the more complex the features being learned. For instance, [Vergari2018qu] considers the MNIST dataset of handwritten digits: visualizing the first layers of an SPN learned on MNIST shows primitive drawings such as circular and linear forms, while later layers show pieces of digits. A decompiled BN for this SPN thus encodes independencies between the circular/linear forms and the pieces of digits.
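To make the d-separation reasoning concrete, the Python sketch below tests X ⊥ Y | Z using the moralized ancestral graph criterion, which is equivalent to d-separation: restrict the DAG to the ancestors of X ∪ Y ∪ Z, moralize it, delete Z, and check whether X and Y are disconnected. The dict-of-parents encoding and the toy variables u1, u2 (upper layer) and v (a shared lower-layer child) are hypothetical illustrations, not the decompiled region-graph BN itself.

```python
from itertools import combinations


def d_separated(parents, xs, ys, zs):
    """Return True if xs and ys are d-separated given zs in the DAG
    described by parents (dict child -> list of parents)."""
    # 1. Collect the ancestral set of xs, ys and zs.
    stack = list(set(xs) | set(ys) | set(zs))
    anc = set()
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents.get(v, []))

    # 2. Moralize the ancestral subgraph: link each child to its parents
    #    and marry every pair of co-parents.
    adj = {v: set() for v in anc}
    for v in anc:
        ps = list(parents.get(v, []))  # parents of an ancestor are ancestors
        for p in ps:
            adj[v].add(p)
            adj[p].add(v)
        for a, b in combinations(ps, 2):
            adj[a].add(b)
            adj[b].add(a)

    # 3. Delete zs and search for a path from xs to ys.
    blocked = set(zs)
    frontier = [x for x in xs if x not in blocked]
    seen = set(frontier)
    while frontier:
        v = frontier.pop()
        if v in ys:
            return False
        for u in adj[v] - blocked - seen:
            seen.add(u)
            frontier.append(u)
    return True
```

In the toy v-structure u1 → v ← u2, the upper-layer variables are marginally independent but become dependent once their shared child v is observed, which is the kind of induced dependency discussed above.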

6 Conclusion

There is a trade-off between the high-level interpretability of classical probabilistic graphical models, such as Bayesian networks (BNs), and the inference performance of recent tractable probabilistic models, such as sum-product networks (SPNs). One way to improve the inference performance of BNs is to compile them into SPNs, a well-understood technique due to Darwiche's seminal work on arithmetic circuit (AC) compilation. The converse direction, decompiling SPNs into BNs, has received rather limited attention.

In this paper, we formalize SPN decompilation by proposing SPN2BN, an algorithm that converts an SPN into a BN. SPN2BN improves on the only two other approaches in the literature addressing the connection between BNs and SPNs, namely [zhao2015relationship] and [Peharz:2016wl]. First, SPN2BN produces a minimal BN, meaning that no unnecessary dependency assumptions are made. Second, both [zhao2015relationship] and [Peharz:2016wl] introduce an excessive number of latent variables, whereas SPN2BN is more parsimonious.

Our decompilation method, SPN2BN, assumes that the given SPN was compiled from an unknown BN by the BN inference algorithm variable elimination (VE), using a reverse topological ordering. We call this compilation assumption BN2SPN.
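To illustrate what compiling with VE in a reverse topological ordering produces, consider a toy BN A → B over binary variables with hypothetical parameters. Following the network-polynomial view from Darwiche's AC compilation, each variable value gets an evidence indicator λ; eliminating B first and then A rewrites the polynomial as nested sums of products, which is the shape of a compiled SPN. This Python sketch is an illustration under these assumptions, not the paper's BN2SPN implementation.

```python
from itertools import product

# Hypothetical parameters for a toy BN A -> B over binary variables.
THETA_A = {0: 0.3, 1: 0.7}
THETA_B_GIVEN_A = {(0, 0): 0.9, (1, 0): 0.1,  # keys are (b, a)
                   (0, 1): 0.2, (1, 1): 0.8}


def network_polynomial(lam_a, lam_b):
    """Unfactored form: sum over all joint states of indicators times parameters."""
    return sum(lam_a[a] * lam_b[b] * THETA_A[a] * THETA_B_GIVEN_A[(b, a)]
               for a, b in product((0, 1), repeat=2))


def ve_compiled(lam_a, lam_b):
    """Same polynomial factored by VE, eliminating B (last in topological order) first."""
    # Eliminating B: one sum node per value of A, over products lam_b * theta_{b|a}.
    msg_from_b = {a: sum(lam_b[b] * THETA_B_GIVEN_A[(b, a)] for b in (0, 1))
                  for a in (0, 1)}
    # Eliminating A: the root sum node over products lam_a * theta_a * message.
    return sum(lam_a[a] * THETA_A[a] * msg_from_b[a] for a in (0, 1))
```

With evidence B = 1 (λ_b = {0: 0, 1: 1}) and A unobserved (λ_a = {0: 1, 1: 1}), both forms evaluate to P(B = 1) = 0.3·0.1 + 0.7·0.8 = 0.59; the factored form performs fewer multiplications, which is the efficiency that compilation captures.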

One key result of our SPN decompilation, formalized in Theorem 2, is that it constructs the moral closure of the original BN. This means that in certain cases, such as HMMs, where a BN is its own moral closure, our SPN decompilation returns the original BN. Theorem 2 also implies that there is a set of BNs related to the original BN that all yield the same output: any BN formed from the original BN by adding some directed ancestral moralization edges has the same moral closure, and hence the same decompiled BN. Finally, Theorem 3 establishes that our compilation-decompilation process is idempotent, because the moral closure of a moral closure is that moral closure itself. Theorem 3 has practical significance because it bounds the size of the decompiled BN by the size of the moral closure of the input BN. In contrast, if we replace the decompilation method with that of [zhao2015relationship] or [Peharz:2016wl], then applying BN2SPN2BN repeatedly always yields larger and larger BNs.


This research is partially supported by NSERC Discovery Grant 238880 and has also received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 797223 — HYBSPN.