# Smoothing Structured Decomposable Circuits

We study the task of smoothing a circuit, i.e., ensuring that all children of a plus-gate mention the same variables. Circuits serve as the building blocks of state-of-the-art inference algorithms on discrete probabilistic graphical models and probabilistic programs. They are also important for discrete density estimation algorithms. Many of these tasks require the input circuit to be smooth. However, smoothing has not been studied in its own right yet, and only a trivial quadratic algorithm is known. This paper studies efficient smoothing for structured decomposable circuits. We propose a near-linear time algorithm for this task and explore lower bounds for smoothing general circuits, using existing results on range-sum queries. Further, for the important special case of All-Marginals, we show a more efficient linear-time algorithm. We validate experimentally the performance of our methods.

## 1 Introduction

Circuits are directed acyclic graphs that are used throughout logical and probabilistic inference. Their structure captures the computation of reasoning algorithms. In the context of machine learning, state-of-the-art algorithms for exact and approximate inference in discrete probabilistic graphical models (Chavira and Darwiche, 2008; Kisa et al., 2014; Friedman and Van den Broeck, 2018) and probabilistic programs (Fierens et al., 2015; Bellodi and Riguzzi, 2013) are built on circuit compilation. In addition, learning tractable circuits is the current method of choice for discrete density estimation (Gens and Domingos, 2013; Rooshenas and Lowd, 2014; Vergari et al., 2015; Liang et al., 2017). Circuits are also used to enforce logical constraints on deep neural networks (Xu et al., 2018).

Most probabilistic inference algorithms on circuits require the input circuit to be smooth (also referred to as complete) (Sang et al., 2005; Poon and Domingos, 2011). The notion of smoothness was first introduced by Darwiche (2001) to ensure efficient model counting and cardinality minimization, and has since been identified as essential to probabilistic inference algorithms. Yet, to the best of our knowledge, no efficient algorithm to smooth a circuit has been proposed beyond the original quadratic algorithm by Darwiche (2001).

The quadratic complexity can be a major bottleneck, since circuits in practice often have hundreds of thousands of edges when learned, and millions of edges when compiled from graphical models. As such, in the latest Dagstuhl Seminar on “Recent Trends in Knowledge Compilation”, this task of smoothing a circuit was identified as a major research challenge (Darwiche et al., 2017). Therefore, a more efficient smoothing algorithm will increase the scalability of circuit-based inference algorithms.

Intuitively, smoothing a circuit amounts to filling in the missing variables under its $\lor$-gates. In Figure 1(a), the $\lor$-gate does not mention the same variables on its left and right sides, so we fill in the missing variables by adding tautological gates of the form $x \lor \neg x$, resulting in the smooth circuit in Figure 1(b). Filling in these missing variables is necessary for probabilistic inference tasks such as computing marginals, computing the probability of evidence, sampling, and approximating Maximum A Posteriori (Sang et al., 2005; Chavira and Darwiche, 2008; Friesen and Domingos, 2016; Friedman and Van den Broeck, 2018; Mei et al., 2018). The task of smoothing was also explored by Peharz et al. (2017), who study how to preserve smoothness when augmenting Sum-Product Networks to compute the Most Probable Explanation.

In this paper we propose a more efficient smoothing algorithm. We focus on the commonly used class of structured decomposable circuits, which include structured decomposable Negation Normal Form, Sentential Decision Diagrams, and more (Pipatsrisawat and Darwiche, 2008; Darwiche, 2011). Intuitively, such circuits must always consider their variables in a certain way, which is formalized as a tree structure on the variables called a vtree.

Our first contribution (Section 4) is a near-linear time algorithm for smoothing such circuits, a clear improvement over the naive quadratic algorithm. Specifically, our algorithm runs in time proportional to the circuit size multiplied by the inverse Ackermann function of the circuit size and number of variables (Theorem 3).¹

¹ The inverse Ackermann function $\alpha$ is defined in Tarjan (1972). As the Ackermann function grows faster than any primitive recursive function, the function $\alpha$ grows slower than the inverse of any primitive recursive function, e.g., slower than $\log^* n$.

Our second contribution (Section 5) is to show a lower bound of the same complexity, on smoothing general circuits for the restricted class of smoothing algorithms that we call smoothing-gate algorithms (Theorem 5). Intuitively, smoothing-gate algorithms are those that retain the structure of the original circuit and can only make them smooth by adding new gates to cover the missing variables. This natural class corresponds to the example in Figure 1 and our near-linear time smoothing algorithm also falls in this class. We match its complexity and show a lower bound on the performance of any smoothing-gate algorithm, relying on known results in the field of range-sum queries.

Our third contribution (Section 6) is a novel linear time algorithm for the probabilistic inference task of All-Marginals that bypasses the need for smoothing, assuming that the weight function supports all four elementary operations $+, -, \times, \div$ (Theorem 6). These results are summarized in Table 1.

Our fourth contribution (Section 7) is to study how to make a circuit smooth while preserving structuredness. We show that we cannot achieve a sub-quadratic smoothing algorithm if we impose the same vtree structure on the output circuit (Prop. 7), unless the vtree has low height (Prop. 8).

Our final contribution (Section 8) is an experimental study of smoothing and probabilistic inference tasks. We evaluate the performance of our smoothing algorithm and of our linear time All-Marginals algorithm.

The rest of the paper is structured as follows. In Section 2 we review the necessary definitions, and in Section 3 we motivate the task of smoothing in more detail. We then present each of our five contributions in order in Sections 4, 5, 6, 7, and 8. We conclude in Section 9.

## 2 Background

Let us now define the model of circuits that we study (refer again to Figure 1 for an example):

###### Definition 1.

A logical circuit is a rooted directed acyclic graph where leaves are variables, and internal gates perform disjunction ($\lor$-gates) or conjunction ($\land$-gates). An arithmetic circuit is one where leaves are numeric constants or variables, and internal gates perform addition ($+$-gates) or multiplication ($\times$-gates). The children of an internal gate are the gates that feed into it. We impose that the circuit has been preprocessed in linear time to ensure that each $\land$-gate has 0 or 2 inputs.

We focus on circuits that are decomposable, and more precisely on circuits that are structured decomposable.

###### Definition 2.

For any gate $g$, we call $\mathrm{vars}(g)$ the set of variables that appear at or below gate $g$. A circuit is decomposable if these sets of variables are disjoint between the two children of every $\land$-gate. Formally, for every $\land$-gate $g$ with children $c_1$ and $c_2$, we have $\mathrm{vars}(c_1) \cap \mathrm{vars}(c_2) = \emptyset$.

We then define structuredness, by introducing the notion of vtree on a set of variables:

###### Definition 3.

A vtree on a set of variables $\mathbf{X}$ is a full binary tree whose leaves are in one-to-one correspondence with the variables in $\mathbf{X}$. We denote the set of variables under a vtree node $v$ as $\mathrm{vars}(v)$.

###### Definition 4.

A circuit $C$ respects a vtree $T$ if there is a mapping $\rho$ from its gates to the nodes of $T$ such that:

• For every variable $x$, the node $\rho(x)$ is the leaf of $T$ corresponding to $x$.

• For every $\lor$-gate $g$ and child $c$ of $g$, the node $\rho(c)$ is $\rho(g)$ or a descendant of $\rho(g)$ in $T$.

• For every $\land$-gate $g$ with children $c_1, c_2$, letting $v_1$ and $v_2$ be the left and right children of $\rho(g)$, the node $\rho(c_1)$ is $v_1$ or a descendant of $v_1$, and $\rho(c_2)$ is $v_2$ or a descendant of $v_2$.

A circuit is structured decomposable if it respects some vtree $T$. Such a circuit is necessarily decomposable.

Structured decomposability was introduced in the context of logical circuits, and it is also enforced in Sentential Decision Diagrams, a widely used tractable representation of Boolean functions (Darwiche, 2011). This property allows for a polytime conjoin operation on logical circuits (Pipatsrisawat and Darwiche, 2008). For circuits that represent distributions, structured decomposability allows multiplication of these distributions (Shen et al., 2016) and efficient computation of the KL-divergence between two distributions (Liang and Van den Broeck, 2017).

Next, we review another property of circuits that will be relevant for probabilistic inference tasks (Choi and Darwiche, 2017).

###### Definition 5.

A circuit is deterministic if the children of each $\lor$-gate are pairwise logically disjoint.

In the rest of this paper, we let $n$ denote the number of variables in a circuit and $m$ denote the size of a circuit, measured by its number of edges.

## 3 Smoothing

We focus on the probabilistic inference tasks of weighted model counting and computing All-Marginals (Sang et al., 2005; Chavira and Darwiche, 2008). We will refer to weighted model counting as its more general form of Algebraic Model Counting (AMC) (Kimmig et al., 2016). To describe these tasks, we define knowledge bases and models.

###### Definition 6.

Given a set of variables $\mathbf{X}$, a set $f$ of instantiations of $\mathbf{X}$ is called a knowledge base, and each element of $f$ is called a model.

The task of AMC on a knowledge base $f$ and a weight function $w$ is to compute the quantity $s$ from Equation 1. The task of All-Marginals is to compute the partial derivative of $s$ with respect to the weight of each literal (Equation 2). The weights are usually defined over a semiring, an important distinction we highlight later.

On probabilistic models, $s$ is often the partition function or the probability of evidence, and the partial derivatives of these quantities correspond to all (conditional) marginals in the distribution. Computing All-Marginals efficiently significantly speeds up probabilistic inference, and is used as a subroutine in the collapsed compilation algorithm in our later experiments.

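$$s = \bigoplus_{\mathbf{x} \in f} \; \bigotimes_{x \in \mathbf{x}} w(x) \qquad \text{(AMC)} \qquad (1)$$

$$\left\{ \frac{\partial s}{\partial w(x)},\ \frac{\partial s}{\partial w(\neg x)} \;\middle|\; X \in \mathbf{X} \right\} \qquad \text{(All-Marginals)} \qquad (2)$$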

These tasks are difficult in general, unless we have a tractable representation of the knowledge base $f$. The following fact highlights the importance of smoothing. If $f$ is represented as a logical circuit that is deterministic and decomposable but not smooth, then there is in general no known technique to compute the AMC and All-Marginals tasks in linear time. If $f$ is represented as a logical circuit that is deterministic, decomposable and smooth, then the AMC and All-Marginals tasks can be completed in time $O(m)$. For example, the AMC task is done by converting the deterministic, decomposable and smooth logical circuit into an arithmetic circuit, attaching the weights of the variables as numeric constants in the circuit, and then evaluating the circuit.
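To make the role of smoothness concrete, here is a minimal Python sketch, assuming a hypothetical tuple encoding of circuits and real-valued weights (so $\oplus$ is $+$ and $\otimes$ is $\times$). It evaluates a deterministic, decomposable circuit for the knowledge base $(x \land y) \lor \neg x$ before and after smoothing, and compares both against a direct evaluation of Equation 1. The weights are deliberately unnormalized, so that $w(y) + w(\neg y) \neq 1$ and the missing variable visibly matters.

```python
from itertools import product


def evaluate_amc(gate, w):
    """Evaluate a circuit bottom-up over the reals: AND -> product, OR -> sum."""
    if gate[0] == "lit":
        return w[(gate[1], gate[2])]
    values = [evaluate_amc(child, w) for child in gate[1:]]
    if gate[0] == "and":
        result = 1.0
        for value in values:
            result *= value
        return result
    return sum(values)


def brute_force_amc(variables, is_model, w):
    """Equation 1 directly: sum over models of the product of their literal weights."""
    total = 0.0
    for bits in product([True, False], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if is_model(assignment):
            term = 1.0
            for var, value in assignment.items():
                term *= w[(var, value)]
            total += term
    return total


# Hypothetical encoding: ("lit", var, positive) is a literal; ("and", ...) / ("or", ...) are gates.
# Unnormalized weights, so that w(y) + w(not y) = 12 rather than 1.
w = {("x", True): 2.0, ("x", False): 3.0, ("y", True): 5.0, ("y", False): 7.0}
X, NX = ("lit", "x", True), ("lit", "x", False)
Y, NY = ("lit", "y", True), ("lit", "y", False)
unsmooth = ("or", ("and", X, Y), NX)                        # right child misses y
smooth = ("or", ("and", X, Y), ("and", NX, ("or", Y, NY)))  # y filled in with y OR NOT y
```

With these weights, the smooth circuit and Equation 1 both give 46, while the unsmoothed circuit gives 13: under $\neg x$ it silently drops the factor $w(y) + w(\neg y) = 12$.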

Given the necessity of smoothness for efficiently computing these inference tasks, we are interested in studying the complexity of smoothing a circuit. To do so, we formally define the task of smoothing.

###### Definition 7.

Two logical circuits on variables $\mathbf{X}$ are equivalent if they evaluate to the same output on any input $\mathbf{x}$.

###### Definition 8.

A circuit is smooth if for every pair of children $c_1$ and $c_2$ of a $\lor$-gate, $\mathrm{vars}(c_1) = \mathrm{vars}(c_2)$. The task of smoothing a logical circuit $C$ is to output a smooth logical circuit $C'$ that is equivalent to $C$.
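The definitions of $\mathrm{vars}(g)$, decomposability and smoothness translate directly into checks. The following is a minimal sketch under a hypothetical tuple encoding (literals are `("lit", var, positive)`, internal gates are `("and", ...)` or `("or", ...)`); it is not tied to any particular circuit library, and the names `gate_vars`, `is_decomposable` and `is_smooth` are introduced here for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def gate_vars(gate):
    """vars(g): the set of variables appearing at or below gate g (memoized)."""
    if gate[0] == "lit":
        return frozenset([gate[1]])
    result = frozenset()
    for child in gate[1:]:
        result |= gate_vars(child)
    return result


def all_gates(root):
    """Collect every distinct gate reachable from the root (the circuit is a DAG)."""
    seen, stack = set(), [root]
    while stack:
        g = stack.pop()
        if g in seen:
            continue
        seen.add(g)
        if g[0] != "lit":
            stack.extend(g[1:])
    return seen


def is_decomposable(root):
    """Children of every AND-gate mention pairwise disjoint sets of variables."""
    for g in all_gates(root):
        if g[0] == "and":
            seen = frozenset()
            for child in g[1:]:
                if seen & gate_vars(child):
                    return False
                seen |= gate_vars(child)
    return True


def is_smooth(root):
    """Children of every OR-gate mention exactly the same variables."""
    return all(
        len({gate_vars(child) for child in g[1:]}) <= 1
        for g in all_gates(root)
        if g[0] == "or"
    )


X, NX = ("lit", "x", True), ("lit", "x", False)
Y, NY = ("lit", "y", True), ("lit", "y", False)
example = ("or", ("and", X, Y), NX)  # children mention {x, y} and {x}
```

For instance, `example` is decomposable but not smooth, and conjoining its $\neg x$ branch with $y \lor \neg y$ makes it smooth.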

Note that we are only defining the smoothing task over logical circuits. This is because the probabilistic inference tasks are performed by smoothing a logical circuit and then converting it into an arithmetic circuit, so it is easier for the reader to only consider smoothing on logical circuits. For the rest of the paper, we will refer to logical circuits simply as circuits.

When the weight function allows division, there exists a renormalization technique that can compute the AMC in linear time without smoothing the initial circuit (Kimmig et al., 2016). However, this restriction is limiting, since even if the weight function is defined over a field, division by zero may raise an issue. For example, in practice division by zero may be unavoidable (Van den Broeck et al., 2014) or the weight function may be defined over a semiring (Friesen and Domingos, 2016), in which case there is no known technique to bypass smoothing. As such, we explore efficient smoothing algorithms in Sections 4 and 5.

On the other hand, one may still be interested in settings where the weight function supports all four elementary operations $+, -, \times, \div$. To this end, we also propose in Section 6 a novel technique that computes All-Marginals in linear time in this relaxed setting.

## 4 Smoothing Algorithm

We present our algorithm for smoothing structured decomposable circuits, based on the semigroup range-sum literature. First, we define a class of common strategies for smoothing a circuit, which encompasses both the previously known algorithm and our new algorithm.

The existing quadratic algorithm for smoothing a circuit visits each $\lor$-gate and inserts the missing variables one by one (Darwiche, 2001). This algorithm retains the original gates of the circuit, and adds further gates to fill in the missing variables. We define smoothing-gate algorithms as the family of smoothing algorithms that retain the original gates of the circuit.

###### Definition 9.

Edge contraction is the process of removing each $\lor$-gate or $\land$-gate with a single child, and feeding that child as input to the parents of the removed gate.

###### Definition 10.

Two circuits $C$ and $C'$ with gate sets $G$ and $G'$ are isomorphic if there exists a bijection $\phi: G \to G'$ between their gates such that the following conditions hold.

1. For any gate $g \in G$, $\phi(g)$ is the same type of gate as $g$.

2. For any gate $g \in G$ and child $c$ of $g$, the gate $\phi(c)$ is a child of $\phi(g)$ in $C'$.

3. For any gate $g' \in G'$ and child $c'$ of $g'$, the gate $\phi^{-1}(c')$ is a child of $\phi^{-1}(g')$ in $C$.

4. The root of $C$ maps to the root of $C'$.

An algorithm is a smoothing-gate algorithm if, for any edge-contracted input circuit $C$, the output circuit $C'$ has a subcircuit that, after edge contraction, is isomorphic to $C$.

###### Definition 11.

A circuit is called a smoothing gate if it is equivalent to the circuit $\bigwedge_{y \in Y} (y \lor \neg y)$ for some set of variables $Y \subseteq \mathbf{X}$.

Smoothing-gate algorithms are very intuitive, since the entire task boils down to efficiently computing a smoothing gate for a given set of missing variables $Y$. The structure of this gate is not specified; the only requirement is that it is equivalent to $\bigwedge_{y \in Y} (y \lor \neg y)$. The quadratic algorithm constructs it by naively conjoining the variables of $Y$ one at a time, leading to a linear amount of work per gate. In the case of structured decomposable circuits, we can do much better.

###### Lemma 1.

Consider a structured decomposable circuit, and let $\sigma$ be the sequence of its variables written following the in-order traversal of its vtree $T$. For any two vtree nodes $v$ and $v'$ such that $v'$ is $v$ or a descendant of $v$, the set $\mathrm{vars}(v) \setminus \mathrm{vars}(v')$ can be written as the union of at most two contiguous intervals in $\sigma$.

###### Proof.

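Since $T$ is a binary tree, the in-order traversal of $T$ visits the variables of $\mathrm{vars}(v)$ consecutively, and the variables of $\mathrm{vars}(v')$ consecutively. Hence, $\mathrm{vars}(v)$ and $\mathrm{vars}(v')$ can each be written as a contiguous interval, the second contained in the first. Removing a contiguous sub-interval from an interval leaves at most a left part and a right part, so $\mathrm{vars}(v) \setminus \mathrm{vars}(v')$ can be written as the union of at most two contiguous intervals. ∎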
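The argument of Lemma 1 can be checked on a small example. The sketch below assumes a hypothetical encoding of a vtree as nested pairs whose leaves are variable names; for a full binary tree whose only labeled nodes are leaves, the in-order traversal simply lists the leaves from left to right. Positions are 0-indexed, intervals are inclusive, and the precondition that `descendant` really lies under `ancestor` is assumed rather than checked.

```python
def leaf_order(vtree):
    """Variables in in-order (left-to-right) order; leaves are str, internal nodes are pairs."""
    if isinstance(vtree, str):
        return [vtree]
    left, right = vtree
    return leaf_order(left) + leaf_order(right)


def node_interval(node, position):
    """vars(node) as a contiguous inclusive interval (lo, hi) of in-order positions."""
    leaves = leaf_order(node)
    lo, hi = position[leaves[0]], position[leaves[-1]]
    assert hi - lo + 1 == len(leaves)  # contiguity, as argued in the proof
    return lo, hi


def missing_intervals(ancestor, descendant, position):
    """vars(ancestor) minus vars(descendant), as at most two contiguous intervals."""
    a_lo, a_hi = node_interval(ancestor, position)
    d_lo, d_hi = node_interval(descendant, position)
    pieces = []
    if a_lo < d_lo:
        pieces.append((a_lo, d_lo - 1))  # part to the left of the descendant
    if d_hi < a_hi:
        pieces.append((d_hi + 1, a_hi))  # part to the right of the descendant
    return pieces


vtree = ((("a", "b"), "c"), ("d", "e"))
pos = {var: i for i, var in enumerate(leaf_order(vtree))}
```

Here `missing_intervals(vtree, "c", pos)` yields the two pieces `[(0, 1), (3, 4)]`, that is, $\{a, b\}$ and $\{d, e\}$.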

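We smooth a circuit in one bottom-up pass, using the mapping $\rho$ from gates to vtree nodes given by Definition 4. If $g$ is a leaf $\land$-gate (one with no inputs), replace it with a smoothing gate for $\mathrm{vars}(\rho(g))$. If $g$ is an internal $\land$-gate, letting $c_1, c_2$ and $v_1, v_2$ be the children of $g$ and of $\rho(g)$ respectively, replace $c_1$ with $c_1 \land s_1$ and $c_2$ with $c_2 \land s_2$, where $s_k$ is a smoothing gate for $\mathrm{vars}(v_k) \setminus \mathrm{vars}(\rho(c_k))$. If $g$ is a $\lor$-gate, replace each child $c$ with $c \land s_c$, where $s_c$ is a smoothing gate for $\mathrm{vars}(\rho(g)) \setminus \mathrm{vars}(\rho(c))$. By Lemma 1, each of these sets is the union of at most two contiguous intervals in $\sigma$, so each smoothing gate can be built from two gates of the form $\bigwedge_{x \in I} (x \lor \neg x)$, where $I$ is a contiguous interval in $\sigma$. Thus, we can appeal to results on semigroup range-sums.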

### Semigroup Range-Sum.

The semigroup range-sum problem considers $n$ variables $x_1, \dots, x_n$, $m$ intervals $[a_1, b_1], \dots, [a_m, b_m]$, and a weight function $w$ over the variables. The task is to compute the sum of the weights of the variables in each interval, i.e., $\sum_{i=a_j}^{b_j} w(x_i)$ for all $1 \le j \le m$ (Yao, 1982; Chazelle and Rosenberg, 1989). Since $w$ is only defined over a semigroup, subtraction is not supported. That is, we cannot follow the efficient strategy of precomputing the prefix sums $p_i = \sum_{k=1}^{i} w(x_k)$ and outputting $p_{b_j} - p_{a_j - 1}$. Still, there is an efficient algorithm to compute all the required sums in time $O(m \cdot \alpha(m, n))$, where $\alpha$ is the inverse Ackermann function. We restate their result here.
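For intuition about computing range sums without subtraction, here is a minimal Python sketch of a simple offline divide-and-conquer method. It is a swapped-in technique, not the Chazelle and Rosenberg (1989) algorithm: it uses only the semigroup operation `op` (never an inverse) but performs $O(n \log n + m)$ operations rather than $O(m \cdot \alpha(m, n))$. Each interval that crosses the midpoint of the current range is answered by combining a suffix sum of the left half with a prefix sum of the right half; `op` is always applied in left-to-right order, so it also works for non-commutative semigroups such as string concatenation. The name `range_sums` and the 0-indexed inclusive intervals are assumptions of this sketch.

```python
def range_sums(values, intervals, op):
    """Offline range sums over a semigroup, using only `op` and no inverse."""
    answers = [None] * len(intervals)

    def solve(lo, hi, query_ids):
        if not query_ids:
            return
        if lo == hi:
            for q in query_ids:
                answers[q] = values[lo]
            return
        mid = (lo + hi) // 2
        left_ids, right_ids, crossing = [], [], []
        for q in query_ids:
            a, b = intervals[q]
            if b <= mid:
                left_ids.append(q)
            elif a > mid:
                right_ids.append(q)
            else:
                crossing.append(q)
        if crossing:
            # suffix[i] = values[i] op ... op values[mid]
            suffix = {mid: values[mid]}
            for i in range(mid - 1, lo - 1, -1):
                suffix[i] = op(values[i], suffix[i + 1])
            # prefix[j] = values[mid + 1] op ... op values[j]
            prefix = {mid + 1: values[mid + 1]}
            for j in range(mid + 2, hi + 1):
                prefix[j] = op(prefix[j - 1], values[j])
            for q in crossing:
                a, b = intervals[q]
                answers[q] = op(suffix[a], prefix[b])  # one extra op per query
        solve(lo, mid, left_ids)
        solve(mid + 1, hi, right_ids)

    solve(0, len(values) - 1, list(range(len(intervals))))
    return answers
```

For instance, with `op = max` (a semigroup with no subtraction) on `[3, 1, 4, 1, 5, 9, 2, 6]`, the intervals `(0, 7)`, `(2, 4)` and `(5, 6)` give `[9, 5, 9]`.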

###### Theorem 2.

Given $n$ variables defined over a semigroup and $m$ intervals, the sum of each interval can be computed in time $O(m \cdot \alpha(m, n))$ (Chazelle and Rosenberg, 1989).

Our smoothing task can be reduced to the semigroup range-sum problem as follows. By Lemma 1, smoothing a structured decomposable circuit of size $m$ reduces to constructing smoothing gates for $O(m)$ intervals. We pass these intervals as input to the range-sum algorithm, which then generates a sequence of additions that computes the sum of each interval. Each addition in the sequence adds two previously computed sums.

We trace this sequence of additions (see Figure 2). For the base case of a single variable $x_i$, let $g_i$ be the gate $x_i \lor \neg x_i$. Then for each addition $s = s' + s''$, we construct a corresponding $\land$-gate $g_s = g_{s'} \land g_{s''}$. The sum of an interval then maps to a gate that is a smoothing gate for the variables of that interval. This process of smoothing a structured decomposable circuit leads to the following theorem.
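The tracing step can be sketched as follows. Any subtraction-free range-sum algorithm could supply the trace; here the trace is a hypothetical hand-written list of additions for the intervals $\{a, b, c\}$ and $\{b, c, d\}$ over four variables $a, b, c, d$, and the tuple encoding of gates is likewise an assumption. Each variable gets one tautology $x \lor \neg x$, and each addition becomes exactly one new $\land$-gate, so the number of new gates matches the number of additions. Because the partial sum for $\{b, c\}$ is computed once, the corresponding gate is shared in the resulting DAG.

```python
def trace_additions(variables, additions):
    """Map a trace of semigroup additions to AND-gates over tautologies.

    variables: x_1..x_n in vtree in-order. additions: (name, left, right) triples
    meaning sum[name] = sum[left] + sum[right]; each operand is either an int
    (index of a base variable) or the name of an earlier sum.
    """
    # One tautology gate per variable: the smoothing gate for a single variable.
    base = [("or", ("lit", x, True), ("lit", x, False)) for x in variables]
    gates = {}

    def operand(ref):
        return base[ref] if isinstance(ref, int) else gates[ref]

    for name, left, right in additions:
        # Each addition becomes exactly one new AND-gate; the two partial sums
        # cover disjoint intervals, so the gate is decomposable.
        gates[name] = ("and", operand(left), operand(right))
    return gates


def covered_vars(gate):
    """Variables covered by a gate built from tautologies and AND-gates."""
    if gate[0] == "or":
        return {gate[1][1]}
    return covered_vars(gate[1]) | covered_vars(gate[2])


# Hypothetical trace: {a, b, c} and {b, c, d} share the partial sum for {b, c}.
trace = [("bc", 1, 2), ("abc", 0, "bc"), ("bcd", "bc", 3)]
gates = trace_additions(["a", "b", "c", "d"], trace)
```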

###### Theorem 3.

The task of smoothing a structured decomposable circuit has time complexity $O(m \cdot \alpha(m, n))$, where $n$ is the number of variables and $m$ is the size of the circuit.

## 5 Lower Bound

In this section we show a lower bound on the task of smoothing a general circuit, for the family of smoothing-gate algorithms. Chazelle and Rosenberg (1989) show a lower bound on semigroup range-sums, as we state here, but more work is needed to successfully leverage their results.

###### Theorem 4.

Given $n$ variables defined over a semigroup, there exists a set of $m$ intervals such that computing the sum of each interval takes $\Omega(m \cdot \alpha(m, n))$ additions (Chazelle and Rosenberg, 1989).

We cannot immediately assert the same lower bound for smoothing general circuits, for two reasons. First, we must pose the problem of summing each of the $m$ intervals as a smoothing problem, constructed in linear time. Second, we must show that no smoothing algorithm is more efficient than smoothing-gate algorithms. We address the first issue, but leave the second open.

###### Theorem 5.

For the class of smoothing-gate algorithms, the task of smoothing a general circuit has space complexity $\Omega(m \cdot \alpha(m, n))$, where $n$ is the number of variables and $m$ is the size of the circuit.

###### Proof.

Take any set of $m$ intervals $I_1, \dots, I_m$ over the variables $x_1, \dots, x_n$. For each interval $I_j = [a_j, b_j]$, construct the gate $g_j = \bigwedge_{i \notin I_j} x_i$. This is done by first constructing the prefix gates $x_1 \land \dots \land x_i$ and the suffix gates $x_i \land \dots \land x_n$ in linear time, and then constructing each gate $g_j$ in constant time as the conjunction of the prefix gate ending at $a_j - 1$ and the suffix gate starting at $b_j + 1$. Next, let $C$ be the circuit $g_0 \lor g_1 \lor \dots \lor g_m$, where $g_0 = x_1 \land \dots \land x_n$. We need to show that running a smoothing-gate algorithm on $C$ is as hard as computing the sum of each interval $I_j$.

Since $C$ has a top-level $\lor$-gate with children $g_0, \dots, g_m$, and $g_0$ mentions all variables, each gate $g_j$ also needs to mention all variables to satisfy smoothness. By the construction of $g_j$, it is missing exactly the variables $\{x_i \mid i \in I_j\}$. We will show that constructing the smoothing gates for all $I_j$ is as hard as solving the semigroup range-sum problem on those intervals, by mapping the $+$-operation in the semigroup range-sum problem to the $\land$-gates in our circuits.

In particular, consider a smooth circuit $C'$ that contains the smoothing gates for all $I_j$. We use the following relabelling scheme to remove all the $\lor$-gates. For every $\lor$-gate $g$ of $C'$, take one of its input wires and reroute a copy of it to each gate that $g$ feeds into. Each remaining $\land$-gate is now the conjunction of all the variables that were mentioned by its corresponding gate in the original circuit $C'$, so each smoothing gate for $I_j$ implicitly contains a $\land$-gate of the variables $\{x_i \mid i \in I_j\}$. This relabelling scheme shows that every smoothing gate for $I_j$ must implicitly be computing the $\land$-gate of these variables. By setting the inputs of the circuit to the values of the weights in the range-sum problem, and evaluating the circuit while treating $\land$ as addition, the value to which each of these gates evaluates is the requested sum. So, the circuit $C'$ describes a sequence of additions that computes the sum of each interval. We then apply Theorem 4, which implies that the bound of $\Omega(m \cdot \alpha(m, n))$ applies to the size of the smooth circuit $C'$. ∎
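The construction at the start of this proof can be sketched in a few lines. This is a minimal illustration assuming a hypothetical tuple encoding and 1-indexed inclusive intervals; `build_hard_instance` and `mentioned` are names introduced here, and the root mirrors the circuit $C$ of the proof with $g_0$ taken to be the conjunction of all variables. Every prefix and suffix gate costs one new $\land$-gate, and each $g_j$ costs at most one more, so the whole circuit has $O(n + m)$ gates.

```python
def build_hard_instance(n, intervals):
    """Build gates g_j, each missing exactly the variables of (a_j, b_j), and the root.

    prefix[i] = x_1 AND ... AND x_i and suffix[i] = x_i AND ... AND x_n each take
    one new AND-gate per step, i.e., linear time overall.
    """
    lit = [None] + [("lit", f"x{i}", True) for i in range(1, n + 1)]
    prefix = [None] * (n + 2)  # prefix[0] stays None: the empty conjunction
    suffix = [None] * (n + 2)  # suffix[n + 1] stays None likewise
    for i in range(1, n + 1):
        prefix[i] = lit[i] if prefix[i - 1] is None else ("and", prefix[i - 1], lit[i])
    for i in range(n, 0, -1):
        suffix[i] = lit[i] if suffix[i + 1] is None else ("and", lit[i], suffix[i + 1])

    gates = []
    for a, b in intervals:
        parts = [p for p in (prefix[a - 1], suffix[b + 1]) if p is not None]
        # Two parts -> one new AND-gate; one part -> reuse it;
        # zero parts -> the constant-true AND-gate ("and",).
        gates.append(parts[0] if len(parts) == 1 else ("and", *parts))
    root = ("or", prefix[n], *gates)  # prefix[n] plays the role of g_0
    return root, gates


def mentioned(gate):
    """vars(g) for this tuple encoding."""
    if gate[0] == "lit":
        return {gate[1]}
    result = set()
    for child in gate[1:]:
        result |= mentioned(child)
    return result
```

For example, with $n = 5$ the interval $(2, 4)$ yields $x_1 \land x_5$, which is missing exactly $x_2, x_3, x_4$.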

## 6 Computing All-Marginals

In this section we propose an optimization for the special case of computing All-Marginals on a deterministic and structured decomposable circuit. The goal is to compute the partial derivative of the AMC with respect to the weight of each variable (Equation 2 in Section 3). Recall that computing All-Marginals on a deterministic, decomposable and smooth circuit takes time linear in the size of the circuit. Therefore, using the techniques in Section 4, we can smooth a deterministic and structured decomposable circuit and then convert it into an arithmetic circuit to compute All-Marginals, all in time $O(m \cdot \alpha(m, n))$. For the relaxed setting where the weight function also supports division and subtraction, we propose an even more efficient method to compute All-Marginals that bypasses the smoothing process. Our method takes time $O(n + m)$, which is not only optimal but also avoids the messy construction of smoothing gates.

The algorithm is a form of backpropagation, and goes as follows (Algorithm 1). First, we compute the AMC using a linear-time bottom-up pass over the circuit, keeping track of the AMC of each internal gate. Next, we traverse the circuit top-down in order to compute the partial derivative of each gate. At a $\lor$-gate or $\land$-gate, we propagate the partial derivative down to the children as needed. However, since the circuit is not smooth, there may be missing variables in the children of $\lor$-gates, in which case the propagation is incomplete. The challenge is to efficiently complete the propagation to the missing variables.

###### Theorem 6.

The All-Marginals task on a deterministic and structured decomposable circuit $C$ and a weight function that supports $+, -, \times, \div$ has time complexity $O(n + m)$, where $n$ is the number of variables and $m$ is the size of $C$.

###### Proof.

Recall from Lemma 1 that the set of missing variables of each parent-child pair forms at most two contiguous intervals with respect to the in-order traversal of the vtree. The idea now is that propagating the partial derivative to each interval amounts to a range increment, i.e., incrementing a quantity for each variable in the interval. The naive algorithm takes quadratic time to do this for all intervals, but there is a more efficient method to perform all range increments in linear time.

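Consider an integer $n$, a set of $m$ intervals $[a_j, b_j]$ ($1 \le a_j \le b_j \le n$), and numeric constants $c_1, \dots, c_m$. For each integer $1 \le i \le n$, we wish to compute the sum $t_i = \sum_{j \,:\, a_j \le i \le b_j} c_j$. That is, if $i$ belongs to some interval $[a_j, b_j]$, then we increase $t_i$ by $c_j$. The trick is to keep track of delta variables $d_1, \dots, d_{n+1}$, initially zero. For each interval $[a_j, b_j]$, we increase $d_{a_j}$ by $c_j$ and decrease $d_{b_j + 1}$ by $c_j$. Finally, we output $t_1 = d_1$ and $t_i = t_{i-1} + d_i$ for $i > 1$. This process can be done in time $O(n + m)$. ∎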
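The range-increment trick in this proof is the classic difference-array technique. A minimal Python sketch follows, with 1-indexed inclusive intervals as in the text; `range_increments` is a name introduced here. Note that it relies on subtraction (the decrement at $b_j + 1$), which is one place where this approach depends on the relaxed setting rather than a plain semiring.

```python
def range_increments(n, intervals, amounts):
    """t_i = sum of c_j over the intervals [a_j, b_j] (1-indexed) containing i, in O(n + m)."""
    delta = [0] * (n + 2)  # delta[1..n+1]; index 0 is unused
    for (a, b), c in zip(intervals, amounts):
        delta[a] += c      # the increment starts at a ...
        delta[b + 1] -= c  # ... and is cancelled right after b
    totals, running = [], 0
    for i in range(1, n + 1):
        running += delta[i]  # t_i = t_{i-1} + d_i
        totals.append(running)
    return totals  # totals[i - 1] holds t_i
```

For example, `range_increments(5, [(1, 3), (2, 5), (4, 4)], [1, 10, 100])` returns `[1, 11, 11, 110, 10]`.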

## 7 On Retaining Structuredness

The property of structured decomposability allows for a polytime conjoin operation, multiplication of distributions, and more (see Section 2). For downstream tasks such as computing AMC or All-Marginals, structuredness is not required. Since these downstream tasks are performed after the conjoin/multiply operations, our smoothing algorithm does not sacrifice much, if at all, by losing structuredness. One could also keep a copy of the original circuit if structuredness is needed later on.

Nevertheless, our smoothing algorithm does not retain structuredness because enforcing the same vtree interferes with the efficient construction of smoothing gates (Definition 11). In fact, we can show that any smoothing algorithm that maintains the same vtree structure must take quadratic time in the worst case.

###### Proposition 7.

The task of smoothing a structured decomposable circuit that enforces the same vtree has space complexity , where is the number of variables and is the size of .

###### Proof.

We consider a right-linear vtree $T$ with variables $x_1, \dots, x_n$, in that order. For simplicity, let $n$ be a multiple of $3$, and consider the following functions, indexed by $y$:

$$J_y = \bigotimes_{i=1}^{n/3} \beta(i, y)\, x_i \qquad\qquad K_y = \bigotimes_{i=2n/3+1}^{n} \beta(i, y)\, x_i$$

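where $\beta(i, y)\, x_i$ denotes the positive literal $x_i$ if the corresponding bit of the binary representation of $y$ is set, and the negative literal $\neg x_i$ otherwise.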

Next, consider . An instantiation satisfies if all its literals are negative, or if the sign of its literals from (in order) equals those from , and are not all negative. We can build a circuit with size that respects and computes using an Ordered Binary Decision Diagram representation (Bryant, 1986). Yet, any smooth circuit that respects and computes has size , as we see next.

Let the depth of an internal gate be , where is the length of the path from the root of to . We use the notion of a certificate on a circuit, as defined by Bova et al. (2014). Since is smooth, every certificate of must have literals. Let be an instantiation satisfying and certificate , and let be an instantiation satisfying and certificate , with and . Since any instantiation satisfying for is not a model of , it follows that and must not share any internal gates from depth to depth . So, has size . ∎

In some cases, it is possible to do better: for instance, when the vtree has low height.

###### Proposition 8.

The task of smoothing a structured decomposable circuit $C$ that enforces the same vtree has time complexity $O(h \cdot m)$, where $h$ is the height of the vtree and $m$ is the size of $C$.

###### Proof.

We construct smoothing gates by following the structure of the vtree: for each vtree node $v$ with children $v_1$ and $v_2$, we build in constant time a structured smoothing gate for the variables that are descendants of $v$, by conjoining the smoothing gate for the variables that are descendants of $v_1$ and the one for the variables that are descendants of $v_2$. Now, we can use these gates to smooth the circuit: any interval of variables in the in-order traversal of the vtree can be written as the union of $O(h)$ intervals corresponding to vtree nodes, so smoothing has time complexity $O(h \cdot m)$. ∎
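The interval decomposition used in this proof is the same one used in segment trees. The following minimal sketch assumes a hypothetical nested-pair vtree encoding (leaves are variable names) and 0-indexed inclusive positions in the in-order traversal; `decompose` returns the maximal vtree nodes whose variables lie inside the query interval. A node that only partially overlaps the interval must contain one of its two endpoints, and nodes at the same depth are disjoint, so at most two nodes per depth are split; the returned nodes are children of split nodes, hence there are $O(h)$ of them. Recomputing `num_leaves` on every call keeps the sketch short at the cost of efficiency.

```python
def num_leaves(vtree):
    """Number of variables under a vtree node (leaves are variable names)."""
    if isinstance(vtree, str):
        return 1
    return num_leaves(vtree[0]) + num_leaves(vtree[1])


def decompose(vtree, a, b, lo=0):
    """Maximal vtree nodes whose variables lie in positions a..b (inclusive).

    `lo` is the in-order position of the first variable under `vtree`.
    Returns O(h) nodes for a vtree of height h.
    """
    hi = lo + num_leaves(vtree) - 1
    if b < lo or hi < a:
        return []           # no overlap with the query interval
    if a <= lo and hi <= b:
        return [vtree]      # fully covered: reuse this node's smoothing gate
    left, right = vtree     # partial overlap implies an internal node
    return (decompose(left, a, b, lo)
            + decompose(right, a, b, lo + num_leaves(left)))


balanced = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
```

On this balanced vtree, positions 1 to 6 decompose into `["b", ("c", "d"), ("e", "f"), "g"]`, so the smoothing gate for that interval is the conjunction of four precomputed node gates.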

## 8 Experiments

We experiment on our smoothing algorithm in Section 4 and our All-Marginals algorithm in Section 6. Experiments were run on a single Intel(R) Core(TM) i7-3770 CPU with 16GB of RAM.

### Smoothing Circuits.

We first study the smoothing task on structured decomposable circuits using our new smoothing algorithm (Section 4), which we compare to the naive quadratic smoothing algorithm. We construct hand-crafted circuits that require many smoothing gates, each covering a large interval. In particular, we pick large intervals and, for each interval, construct the corresponding structured gate for a balanced vtree. We then feed all of these gates into one top-level $\lor$-gate. This triggers the worst-case quadratic behavior of the naive smoothing algorithm, while our new algorithm has near-linear behavior.

The speedup of our smoothing algorithm is captured in Table 2(a). The Size column reports the size of the circuit. The Naive column reports the time taken by the quadratic smoothing algorithm, the Ours column reports the same value using our near-linear algorithm, and the Improve column reports the relative decrease in time. The values are averaged over runs.

### Collapsed Sampling.

We next benchmark our method for computing All-Marginals in Section 6 on the task of collapsed sampling, which is a technique for probabilistic inference on factor graphs. The collapsed sampling algorithm performs approximate inference on factor graphs by alternating between knowledge compilation phases and sampling phases (Friedman and Van den Broeck, 2018). In the sampling phase, the algorithm computes All-Marginals as a subroutine.

We replace the original quadratic All-Marginals subroutine with our linear time algorithm (Algorithm 1). The requirement of the $+, -, \times, \div$ operations for Algorithm 1 is satisfied, since the weight function is defined over the reals in the experiments of Friedman and Van den Broeck (2018). In Table 2(b) we report the results on the Segmentation-11 network; results for the other networks used by Friedman and Van den Broeck (2018) were similar. We see a decrease in the number of operations needed for each All-Marginals computation. The Size column reports the size threshold during the knowledge compilation phase. The Naive column reports the number of operations using the original All-Marginals subroutine, the Ours column reports the same value using Algorithm 1, and the Improve column reports the relative decrease in operations. The values are averaged over runs.

## 9 Conclusion

In this paper we consider the task of smoothing a circuit. Circuits are widely used for inference algorithms for discrete probabilistic graphical models, and for discrete density estimation. The input circuits are required to be smooth for many of these probabilistic inference tasks, such as Algebraic Model Counting and All-Marginals. We provide a near-linear time smoothing algorithm for structured decomposable circuits and prove a matching lower bound within the class of smoothing-gate algorithms, for general circuits. We introduce a technique to compute All-Marginals in linear time without smoothing the circuit, when the weight function supports division and subtraction. As well, we show that smoothing a circuit while maintaining the same vtree structure cannot be sub-quadratic, unless the vtree has low height. Finally, we empirically evaluate our algorithms and show a speedup over the existing smoothing algorithm.

### Acknowledgments

This work is partially supported by NSF grants #IIS-1657613, #IIS-1633857, #CCF-1837129, DARPA XAI grant #N66001-17-2-4032, NEC Research, and gifts from Intel and Facebook Research.

## References

• Bellodi and Riguzzi [2013] Elena Bellodi and Fabrizio Riguzzi. Expectation maximization over binary decision diagrams for probabilistic logic programs. Intell. Data Anal., 17:343–363, 2013.
• Bova et al. [2014] Simone Bova, Florent Capelli, Stefan Mengel, and Friedrich Slivovsky. Expander CNFs have exponential DNNF size. CoRR, abs/1411.1995, 2014.
• Bryant [1986] Randal E. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Transactions on Computers, C-35:677–691, 1986.
• Chavira and Darwiche [2008] Mark Chavira and Adnan Darwiche. On probabilistic inference by weighted model counting. Artif. Intell., 172:772–799, 2008.
• Chazelle and Rosenberg [1989] Bernard Chazelle and Burton Rosenberg. Computing partial sums in multidimensional arrays. In Symposium on Computational Geometry, 1989.
• Choi and Darwiche [2017] Arthur Choi and Adnan Darwiche. On relaxing determinism in arithmetic circuits. In ICML, 2017.
• Darwiche et al. [2017] Adnan Darwiche, Pierre Marquis, Dan Suciu, and Stefan Szeider. Recent trends in knowledge compilation (Dagstuhl Seminar 17381). Dagstuhl Reports, 7:62–85, 2017.
• Darwiche [2001] Adnan Darwiche. On the tractable counting of theory models and its application to truth maintenance and belief revision. Journal of Applied Non-Classical Logics, 11:11–34, 2001.
• Darwiche [2011] Adnan Darwiche. SDD: A new canonical representation of propositional knowledge bases. In IJCAI, 2011.
• Fierens et al. [2015] Daan Fierens, Guy Van den Broeck, Joris Renkens, Dimitar Sht. Shterionov, Bernd Gutmann, Ingo Thon, Gerda Janssens, and Luc De Raedt. Inference and learning in probabilistic logic programs using weighted boolean formulas. TPLP, 15:358–401, 2015.
• Friedman and Van den Broeck [2018] Tal Friedman and Guy Van den Broeck. Approximate knowledge compilation by online collapsed importance sampling. In NeurIPS, December 2018.
• Friesen and Domingos [2016] Abram L. Friesen and Pedro M. Domingos. The sum-product theorem: A foundation for learning tractable models. In ICML, 2016.
• Gens and Domingos [2013] Robert Gens and Pedro M. Domingos. Learning the structure of sum-product networks. In ICML, 2013.
• Kimmig et al. [2016] Angelika Kimmig, Guy Van den Broeck, and Luc De Raedt. Algebraic model counting. International Journal of Applied Logic, November 2016.
• Kisa et al. [2014] Doga Kisa, Guy Van den Broeck, Arthur Choi, and Adnan Darwiche. Probabilistic sentential decision diagrams. In KR, 2014.
• Liang and Van den Broeck [2017] Yitao Liang and Guy Van den Broeck. Towards compact interpretable models: Shrinking of learned probabilistic sentential decision diagrams. In IJCAI 2017 Workshop on Explainable Artificial Intelligence (XAI), August 2017.
• Liang et al. [2017] Yitao Liang, Jessa Bekker, and Guy Van den Broeck. Learning the structure of probabilistic sentential decision diagrams. In UAI, 2017.
• Mei et al. [2018] Jun Mei, Yong Jiang, and Kewei Tu. Maximum a posteriori inference in sum-product networks. In AAAI, 2018.
• Peharz et al. [2017] Robert Peharz, Robert Gens, Franz Pernkopf, and Pedro M. Domingos. On the latent variable interpretation in sum-product networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39:2030–2044, 2017.
• Pipatsrisawat and Darwiche [2008] Knot Pipatsrisawat and Adnan Darwiche. New compilation languages based on structured decomposability. In AAAI, 2008.
• Poon and Domingos [2011] Hoifung Poon and Pedro M. Domingos. Sum-product networks: A new deep architecture. In 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pages 689–690, 2011.
• Rooshenas and Lowd [2014] Amirmohammad Rooshenas and Daniel Lowd. Learning sum-product networks with direct and indirect variable interactions. In ICML, 2014.
• Sang et al. [2005] Tian Sang, Paul Beame, and Henry A. Kautz. Performing Bayesian inference by weighted model counting. In AAAI, 2005.
• Shen et al. [2016] Yujia Shen, Arthur Choi, and Adnan Darwiche. Tractable operations for arithmetic circuits of probabilistic models. In NIPS, 2016.
• Tarjan [1972] Robert E. Tarjan. Efficiency of a good but not linear set union algorithm. J. ACM, 22:215–225, 1972.
• Van den Broeck et al. [2014] Guy Van den Broeck, Wannes Meert, and Adnan Darwiche. Skolemization for weighted first-order model counting. In KR, 2014.
• Vergari et al. [2015] Antonio Vergari, Nicola Di Mauro, and Floriana Esposito. Simplifying, regularizing and strengthening sum-product network structure learning. In ECML/PKDD, 2015.
• Xu et al. [2018] Jingyi Xu, Zilu Zhang, Tal Friedman, Yitao Liang, and Guy Van den Broeck. A semantic loss function for deep learning with symbolic knowledge. In ICML, 2018.
• Yao [1982] Andrew Chi-Chih Yao. Space-time tradeoff for answering range queries (extended abstract). In STOC, 1982.