Testing Probabilistic Circuits

by Yash Pote, et al.

Probabilistic circuits (PCs) are a powerful modeling framework for representing tractable probability distributions over combinatorial spaces. In machine learning and probabilistic programming, one is often interested in understanding whether the distributions learned using PCs are close to the desired distribution. Thus, given two probabilistic circuits, a fundamental problem of interest is to determine whether their distributions are close to each other. The primary contribution of this paper is a closeness test for PCs with respect to the total variation distance metric. Our algorithm utilizes two common PC queries, counting and sampling. In particular, we provide a poly-time probabilistic algorithm to check the closeness of two PCs when the PCs support tractable approximate counting and sampling. We demonstrate the practical efficiency of our algorithmic framework via a detailed experimental evaluation of a prototype implementation against a set of 475 PC benchmarks. We find that our test correctly decides the closeness of all 475 PCs within 3600 seconds.



1 Introduction

Probabilistic modeling is at the heart of modern computer science, with applications ranging from image recognition and image generation Pope and Lowe (2000); Radford et al. (2015) to weather forecasting Cano et al. (2004). Probabilistic models have a multitude of representations, such as probabilistic circuits (PCs) Choi et al. (2020), graphical models Koller and Friedman (2009), generative networks Goodfellow et al. (2014), and determinantal point processes Kulesza and Taskar (2012). Of particular interest to us are PCs, which are known to support guaranteed inference and thus have applications in safety-critical fields such as healthcare Aronsky and Haug (1998); Oniśko et al. (2000). In this work, we will focus on PCs that are fragments of the Negation Normal Form (NNF), specifically DNNFs, d-DNNFs, SDNNFs, and PIs Darwiche and Huang (2002). We refer to the survey by Choi et al. (2020) for more details regarding PCs.

Given two distributions P and Q, a fundamental problem is to determine whether they are close. Closeness between distributions is frequently quantified using the total variation (TV) distance, dTV(P, Q) = (1/2)·||P − Q||_1, where ||·||_1 is the l1 norm Lin et al. (2018); Canonne et al. (2020). Thus, stated formally, closeness testing is the problem of deciding whether dTV(P, Q) ≤ ε or dTV(P, Q) > η, for parameters 0 ≤ ε < η ≤ 1. Determining the closeness of models has applications in AI planning Darwiche and Huang (2002), bioinformatics Rahmatallah et al. (2014); Städler and Mukherjee (2015); Yin et al. (2015) and probabilistic program verification Dutta et al. (2018); Murawski and Ouaknine (2005).

Equivalence testing is a special case of closeness testing, where one tests whether dTV(P, Q) = 0. Darwiche and Huang (2002) initiated the study of equivalence testing of PCs by designing an equivalence test for d-DNNFs. An equivalence test is, however, of little use in contexts where the PCs under test encode non-identical distributions that are nonetheless close enough for practical purposes. Such situations may arise due to the use of approximate PC compilation Chubarian and Turán (2020) and sampling-based learning of PCs Peharz et al. (2020a, b). As a concrete example, consider PCs that are learned via approximate methods such as stochastic gradient descent Peharz et al. (2020b). In such a case, PCs are likely to converge to close but non-identical distributions. Given two such PCs, we would like to know whether they have converged to distributions close to each other. Thus, we raise the question: Does there exist an efficient algorithm to test the closeness of two PC distributions?

In this work, we design the first closeness test for PCs with respect to the TV distance metric. Assuming the tested PCs allow poly-time approximate weighted model counting and sampling, the test runs in polynomial time. Formally, given two PC distributions P and Q, along with three parameters: closeness (ε), farness (η), and tolerance (δ), the test returns Accept if dTV(P, Q) ≤ ε and Reject if dTV(P, Q) ≥ η, in each case with probability at least 1 − δ. The test makes a number of calls to the sampler that is polynomial in the parameters, and exactly 2 calls to the counter.

Our test builds on a general distance estimation technique of Canonne and Rubinfeld (2014) that estimates the distance between two distributions with a small number of samples. In the context of PCs, that algorithm requires access to an exact sampler and an exact counter. Since not all PCs support exact sampling and counting, we modify the technique presented in Canonne and Rubinfeld (2014) to allow for approximate samples and counts. Furthermore, we implement and evaluate our test on a dataset of publicly available PCs arising from applications in circuit testing. Our results show that closeness testing can be accurate and scalable in practice.

For some NNF fragments, no sampling algorithm is known, and for others, sampling is known to be NP-hard Roth (1996). Since our test requires access to approximate weighted counters and samplers to achieve tractability, the question of determining the closeness of such PCs remains unanswered. Thus, we investigate further and characterize the complexity of closeness testing for a broad range of PCs. Our characterization reveals that PCs from the fragments d-DNNF and SDNNF can be tested for closeness in poly-time via our test, owing to the algorithms of Darwiche (2001) and Arenas et al. (2021). We show that the approximate counting algorithm of Arenas et al. (2021) can be extended to log-linear SDNNFs using chain formulas Chakraborty et al. (2015). Then, using previously known results, we also find that there are no poly-time equivalence tests for PCs from certain other fragments, conditional on widely believed complexity-theoretic conjectures. Our characterization also reveals some open questions regarding the complexity of closeness and equivalence testing of PCs.

The rest of the paper is organized in the following way. We define the notation and discuss related work in Section 2. We then present the main contribution of the paper, the closeness test , and the associated proof of correctness in Section 3. We present our experimental findings in Section 4, and then discuss the complexity landscape of closeness testing in Section 5. We conclude the paper and discuss some open problems in Section 6. Due to space constraints, we defer some proofs to the supplementary Section A.

2 Background

Let φ be a circuit over n Boolean variables. An assignment σ to the variables of φ is a satisfying assignment if φ(σ) = 1. The set of all satisfying assignments of φ is denoted sol(φ). If sol(φ) is nonempty, then φ is said to be satisfiable, and if sol(φ) = {0,1}^n, then φ is said to be valid. We use |φ| to denote the size of circuit φ, where the size is the total number of vertices and edges in the circuit DAG.

The polynomial hierarchy (PH) contains the classes Σ_1^p (NP) and Π_1^p (co-NP), along with generalizations of the form Σ_i^p and Π_i^p, where Σ_{i+1}^p = NP^{Σ_i^p} and Π_{i+1}^p = co-NP^{Σ_i^p} Stockmeyer (1976). The classes Σ_i^p and Π_i^p are said to be at level i. If it is shown that two classes on the same or consecutive levels are equal, the hierarchy collapses to that level. Such a collapse is considered unlikely, and hence its non-occurrence is used as the basic assumption for showing hardness results, including the ones we present in the paper.

2.1 Probability distributions

A weight function w assigns a positive rational weight w(σ) to each assignment σ. We extend the definition of w to also allow circuits as input: w(φ) is the sum of w(σ) over the satisfying assignments σ of φ. For weight function w and circuit φ, w(φ) is the weighted model count (WMC) of φ w.r.t. w.

In this paper, we focus on log-linear weight functions as they capture a wide class of distributions, including those arising from graphical models, conditional random fields, and skip-gram models Murphy (2012). Log-linear models are represented as literal-weighted functions, defined as:

Definition 1.

For a set X of variables, a weight function w is called literal-weighted if there is a poly-time computable map W from the literals over X to the positive rationals such that, for any assignment σ, w(σ) is the product of the weights W(l) of the literals l satisfied by σ.

For all circuits φ and log-linear weight functions w, the weight function can be represented in size polynomial in the input.
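To make Definition 1 and the WMC concrete, here is a minimal Python sketch. The literal-weight map W, the brute-force enumeration, and the example circuit are hypothetical choices for illustration only (the paper's point is precisely that PCs avoid such exponential enumeration).

```python
from itertools import product

# Hypothetical literal-weight map W: one positive weight per literal
# (variable index, polarity). A literal-weighted function multiplies
# the weights of the literals satisfied by an assignment.
W = {(0, True): 0.7, (0, False): 0.3,
     (1, True): 0.4, (1, False): 0.6}

def weight(sigma):
    """w(sigma): product of the weights of the literals satisfied by sigma."""
    w = 1.0
    for var, val in enumerate(sigma):
        w *= W[(var, val)]
    return w

def wmc(phi, n):
    """Weighted model count w(phi): total weight of satisfying assignments
    (brute force over 2^n assignments -- for illustration only)."""
    return sum(weight(s) for s in product([False, True], repeat=n) if phi(s))

phi = lambda s: s[0] or s[1]   # example circuit: x0 OR x1
print(wmc(phi, 2))             # total mass 1 minus w(00) = 1 - 0.3*0.6
```

Here the weights of each variable's two literals sum to 1, so the WMC directly equals the probability mass of the satisfying assignments.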

Probabilistic circuits:

A probabilistic circuit is a satisfiable circuit φ along with a weight function w. φ and w together define a discrete probability distribution on {0,1}^n that is supported over the satisfying assignments of φ. We denote the p.m.f. of this distribution as P_{φ,w}(σ) = w(σ)/w(φ) if σ satisfies φ, and P_{φ,w}(σ) = 0 otherwise.

In this paper, we study circuits that are fragments of the Negation Normal Form (NNF). A circuit in NNF is a rooted, directed acyclic graph (DAG), where each leaf node is labeled with true, false, or a literal; and each internal node is labeled with ∨ or ∧ and can have arbitrarily many children. We focus on four fragments of NNF, namely, Decomposable NNF (DNNF), deterministic DNNF (d-DNNF), Structured DNNF (SDNNF), and Prime Implicates (PI). For further information regarding circuits in NNF, refer to the survey Darwiche and Marquis (2002) and the paper Pipatsrisawat and Darwiche (2008).

The TV distance of two probability distributions P and Q over {0,1}^n is defined as: dTV(P, Q) = (1/2) Σ_σ |P(σ) − Q(σ)|.

P and Q are said to be (1) equivalent if dTV(P, Q) = 0, (2) ε-close if dTV(P, Q) ≤ ε, and (3) η-far if dTV(P, Q) ≥ η.
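For small explicit distributions the definitions above can be computed directly; the following sketch (with toy p.m.f.s chosen only for illustration) makes the ε-close/η-far terminology concrete.

```python
def tv_distance(P, Q):
    """d_TV(P, Q) = (1/2) * sum over x of |P(x) - Q(x)|,
    taken over the union of the two supports."""
    support = set(P) | set(Q)
    return 0.5 * sum(abs(P.get(x, 0.0) - Q.get(x, 0.0)) for x in support)

# Two toy distributions over bit-strings (hypothetical numbers).
P = {'00': 0.5, '01': 0.5}
Q = {'00': 0.4, '01': 0.4, '11': 0.2}
d = tv_distance(P, Q)   # 0.5 * (0.1 + 0.1 + 0.2) = 0.2
print(d <= 0.25)        # P and Q are 0.25-close
```

Of course, PC distributions have exponentially large supports, which is why the paper's test works from samples and counts rather than explicit p.m.f.s.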

Our closeness testing algorithm assumes access to an approximate weighted counter and an approximate weighted sampler. We define their behavior as follows:

Definition 2.

The counter takes a circuit φ, a weight function w, a tolerance parameter ε and a confidence parameter δ as input, and returns an approximate weighted model count c of φ w.r.t. w such that Pr[ w(φ)/(1+ε) ≤ c ≤ (1+ε)·w(φ) ] ≥ 1 − δ.

Tractable approximate counting algorithms for PCs are known as Fully Polynomial Randomised Approximation Schemes (FPRAS). The running time of an FPRAS is polynomial in |φ|, 1/ε, and log(1/δ).

Definition 3.

The sampler takes a circuit φ, a weight function w, a tolerance parameter ε and a confidence parameter δ as input and returns either (1) a satisfying assignment sampled approximately w.r.t. the weight function w, with probability at least 1 − δ, or (2) a symbol ⊥ indicating failure, with probability at most δ. In other words, whenever the sampler returns a sample σ, the probability of σ being output lies within a (1+ε) multiplicative factor of P_{φ,w}(σ).

Tractable approximate sampling algorithms for PCs are known as Fully Polynomial Almost Uniform Samplers (FPAUS). The running time of an FPAUS for generating a single sample is polynomial in |φ|, 1/ε, and log(1/δ).

In the rest of the paper, [n] denotes the set {1, …, n}, 1{E} represents the indicator variable for event E, and E[X] represents the expectation of random variable X.

2.2 Related work

Closeness testing:

Viewing circuit equivalence testing through the lens of distribution testing, we see that the d-DNNF equivalence test of Darwiche and Huang (2002) can be interpreted as an equivalence test for the uniform distribution on the satisfying assignments of d-DNNFs. This relationship between circuit equivalence testing and closeness testing lets us rule out the existence of distributional equivalence tests for all those circuits for which circuit equivalence is already known to be hard under complexity-theoretic assumptions. We will explore this further in Section 5.2.

Distribution testing:

Discrete probability distributions are typically defined over an exponentially large number of points; hence a lot of recent algorithms research has focused on devising tests that require access to only a sublinear or even constant number of points in the distribution Canonne (2020). In this work, we work with distributions over {0,1}^n, and thus we aim to devise algorithms with running time at most polynomial in n. Previous work in testing distributions over Boolean functions has focused on the setting where the distributions offer pair-conditional sampling access Chakraborty and Meel (2019); Meel et al. (2020). Using pair-conditional sampling access, Meel et al. (2020) were able to test distributions for closeness with a query complexity depending on the tilt: the ratio of the probabilities of the most and least probable elements in the support.

3 A tractable algorithm for closeness testing

In this section, we present the main contribution of the paper: a closeness test for PCs. The pseudocode is given in Algorithm 1.

Given satisfiable circuits φ1, φ2 and weight functions w1, w2, along with parameters (ε, η, δ), the test decides whether the TV distance between the two PC distributions is smaller than ε or greater than η, with confidence at least 1 − δ. The test assumes access to an approximate weighted counter and an approximate weighted sampler, whose behavior is specified in Definitions 2 and 3.

The algorithm starts by computing the number of iterations m and the acceptance threshold from the input parameters. It then queries the counting routine with circuit φ1 and weight function w1 to obtain a multiplicative approximation of w1(φ1) with high confidence; a similar query is made for φ2 and w2 to obtain an approximate value for w2(φ2). These values are stored in c1 and c2, respectively. The algorithm maintains an m-sized array Y to store per-sample estimates. It then iterates m times. In each iteration, it generates one sample σ through the sampler call on line 7. There is a small probability that this call fails and returns ⊥. Note that the algorithm only samples from one of the two PCs.

The algorithm then proceeds to compute the weight of the assignment σ w.r.t. the weight functions w1 and w2. Using these weights and the approximate weighted counts c1 and c2, the algorithm computes, on line 10, a value r that approximates the ratio of the probability of σ in the second distribution to its probability in the first. Since σ was sampled from the first distribution, its probability there cannot be 0, ensuring that there is no division by 0. If the ratio r is less than 1, then the corresponding entry of Y is updated with the value 1 − r; otherwise it remains 0. After the m iterations, the algorithm sums up the values in the array Y. If the sum is found to be less than the threshold, the algorithm returns Accept, and otherwise it returns Reject.
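The loop described above can be sketched as follows. This is a simplified Monte-Carlo version assuming exact p.m.f. access and exact sampling (the paper's algorithm replaces both with approximate counts and samples); it relies on the identity dTV(P, Q) = E over x drawn from P of max(0, 1 − Q(x)/P(x)).

```python
import random

def estimate_tv(sample_p, pmf_p, pmf_q, m=20000, seed=1):
    """Monte-Carlo sketch of the closeness estimator: draw m samples from P
    only, and accumulate 1 - Q(x)/P(x) whenever that ratio is below 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(m):
        x = sample_p(rng)
        r = pmf_q(x) / pmf_p(x)   # P(x) > 0 since x was drawn from P
        if r < 1:
            total += 1 - r
    return total / m

# Toy example: P uniform on {0, 1}, Q biased; true d_TV = 0.2.
pmf_p = lambda x: 0.5
pmf_q = lambda x: 0.7 if x == 0 else 0.3
sample_p = lambda rng: rng.randrange(2)
est = estimate_tv(sample_p, pmf_p, pmf_q)
```

The actual test then compares such an estimate against a threshold placed between ε and η, which is why the sample count m need only depend on the gap between the two parameters.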

Algorithm 1

The following theorem asserts the correctness of the test.

Theorem 1.

Given two satisfiable probabilistic circuits φ1, φ2 and weight functions w1, w2, along with parameters ε, η and δ:

  1. If dTV(P_{φ1,w1}, P_{φ2,w2}) ≤ ε, then the test returns Accept with probability at least 1 − δ.

  2. If dTV(P_{φ1,w1}, P_{φ2,w2}) ≥ η, then the test returns Reject with probability at least 1 − δ.

The following theorem states the running time of the algorithm.

Theorem 2.

The time complexity of the test is polynomial in the running times of the approximate counter and sampler, in 1/(η − ε), and in log(1/δ). In particular, if the underlying PCs support approximate counting and sampling in polynomial time, then the running time of the test is also polynomial in the size of the input circuits and the parameters.

To improve readability, we use P to refer to the distribution P_{φ1,w1} and Q to refer to P_{φ2,w2}.

3.1 Proving the correctness of the test

In this subsection, we present the theoretical analysis of the test and the proof of Theorem 1(A). We defer the proofs of Theorem 1(B) and Theorem 2 to the supplementary Sections A.4.2 and A.4.3, respectively.

For the purpose of the proof, we will first define events E1 and E2, corresponding to the two counter calls (as on lines 4–5 of Algorithm 1). E1 and E2 represent the events that the two calls correctly return multiplicative approximations of the weighted model counts of φ1 and φ2, respectively. From the definition of the counter (Definition 2), each of E1 and E2 occurs with probability at least 1 minus the confidence parameter supplied to the corresponding call.

Let F_i denote the event that the sampler call (Algorithm 1, line 7) returns the symbol ⊥ in the i-th iteration of the loop. By the definition of the sampler (Definition 3), we know that Pr[F_i] is at most the confidence parameter supplied to the call.

The analysis requires that all sampler calls and both counter calls return correctly. We denote this super-event as G. Applying the union bound, we see that the probability of all calls to the sampler and counter returning without error is at least 1 minus the sum of the individual failure probabilities.
We will now state a lemma, which we will prove in the supplementary Section A.4.

Lemma 1.

We now prove the lemma critical for our proof of correctness.

Lemma 2.

Assuming the super-event that all counter and sampler calls return correctly:

  1. If , then

  2. If , then


If the super-event holds, the approximate counts returned by the counter are within the stated multiplicative factors of the true weighted model counts; using this fact, the computed ratio is a good multiplicative approximation of the true probability ratio.

We now divide the set of assignments into three disjoint parts A1, A2, and A3, according to whether the (approximate) probability ratio at the assignment is less than, equal to, or greater than 1. The definition implies that the indicator on line 11 is 1 for all assignments in A1 and 0 for all assignments in A2 and A3; accordingly, the per-sample estimate is nonzero only on A1.
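The manipulation above rests on a standard rewriting of the TV distance, recorded here in our notation (with P, Q the two PC distributions); this is the identity exploited by the distance estimator of Canonne and Rubinfeld (2014):

```latex
d_{\mathrm{TV}}(P,Q)
  = \frac{1}{2}\sum_{\sigma}\bigl|P(\sigma)-Q(\sigma)\bigr|
  = \sum_{\sigma:\,P(\sigma)>Q(\sigma)}\bigl(P(\sigma)-Q(\sigma)\bigr)
  = \mathbb{E}_{\sigma\sim P}\!\left[\max\Bigl(0,\;1-\frac{Q(\sigma)}{P(\sigma)}\Bigr)\right].
```

The last equality holds because each term with P(σ) > Q(σ) can be written as P(σ)·(1 − Q(σ)/P(σ)), while all other terms contribute 0 to the expectation.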

Now we bound the magnitude of the summands:

For , we have that , and thus:

We can split the summation into three terms based on the sets in which the assignments lie. Summands that vanish on a particular set are not included in the corresponding term.

Since we know that and , we can alter the second and third terms of the inequality in the following way:

Using our assumption of the super-event and Lemma 1, we obtain the stated bound, and we can deduce the two claimed inequalities: one for the case dTV(P, Q) ≤ ε and one for the case dTV(P, Q) ≥ η. ∎

Using the test on PCs in general.

Exact weighted model counting (WMC) is a commonly supported query on PCs. In the language of PC queries, a WMC query is known as the marginal (MAR) query. Conditional inference (CON) is another well-studied PC query. Using MAR and CON, one can sample from the distribution encoded by a given PC. It is known that if a PC has the structural properties of smoothness and decomposability, then the MAR and CON queries can be computed tractably. For the definitions of the above terms and further details, please refer to the survey Choi et al. (2020).
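As a sketch of how marginal queries yield samples, the snippet below draws one assignment variable by variable. Here `marginal` is a hypothetical stand-in for a PC's MAR oracle: it returns the probability mass consistent with a partial assignment (the toy oracle used below models two independent bits and is for illustration only).

```python
import random

def sample_via_marginals(marginal, n_vars, rng):
    """Draw one sample by fixing variables one at a time: the probability of
    setting variable v to True is MAR(prefix with v=True) / MAR(prefix)."""
    partial = {}
    for v in range(n_vars):
        p_prefix = marginal(partial)
        partial[v] = True
        p_true = marginal(partial)
        if rng.random() >= p_true / p_prefix:   # otherwise keep v = True
            partial[v] = False
    return partial

# Toy MAR oracle for two independent bits with Pr[x0]=0.7, Pr[x1]=0.2.
probs = [0.7, 0.2]
def marginal(partial):
    m = 1.0
    for v, val in partial.items():
        m *= probs[v] if val else 1 - probs[v]
    return m

sample = sample_via_marginals(marginal, 2, random.Random(0))
```

For smooth and decomposable PCs, each `marginal` call is a tractable circuit evaluation, so a full sample costs a linear number of MAR queries.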

4 Evaluation

To evaluate the performance of our test, we implemented a prototype in Python. The prototype uses WAPS (https://github.com/meelgroup/WAPS) Gupta et al. (2019) as a weighted sampler to sample from the input d-DNNF circuits. The primary objective of our experimental evaluation was to seek an answer to the following question: Is the test able to determine the closeness of a pair of probabilistic circuits by returning Accept if the circuits are ε-close and Reject if they are η-far? We test our tool in the following two settings:

  1. The pair of PCs represent small randomly generated circuits and weight functions.

  2. The pair of PCs are from the set of publicly available benchmarks arising from sampling and counting tasks.

Our experiments were conducted on a high performance compute cluster with Intel Xeon(R) E5-2690 v3@2.60GHz CPU cores. For each benchmark, we use a single core with a timeout of 7200 seconds.

4.1 Setting A - Synthetic benchmarks


Our dataset for the experiments conducted in setting A consisted of randomly generated 3-CNF formulas with random literal weights. Since the circuits are small, we validate the results by computing the actual total variation distance by brute force.

Benchmark | ε | η | Actual dTV | Result | Expected
15_3 | 0.75 | 0.94 | 0.804 | R | A/R
14_2 | 0.8 | 0.9 | 0.764 | A | A
17_4 | 0.75 | 0.9 | 0.941 | R | R
14_1 | 0.9 | 0.99 | 0.740 | A | A
18_2 | 0.75 | 0.9 | 0.918 | R | R
Table 1: Performance of our test on random PCs. We experiment with 375 random PCs with known dTV; out of the 375 benchmarks we display 5 in the table and the rest in the supplementary Section B. In the table, ‘A’ represents Accept and ‘R’ represents Reject. In the last column, ‘A/R’ indicates that both Accept and Reject are acceptable outputs (the actual distance lies between ε and η).

Our tests terminated with the correct result in less than 10 seconds on all the randomly generated PCs we experimented with. We present the empirical results in Table 1. The first column indicates the benchmark’s name, and the second and third indicate the parameters ε and η with which we executed the test. The fourth column indicates the actual distance between the two benchmark PCs. The fifth column indicates the output of the test, and the sixth indicates the expected result. The full detailed results are presented in the appendix Section B.

4.2 Setting B - Real-world benchmarks


We conducted experiments on a range of publicly available benchmarks arising from sampling and counting tasks (https://zenodo.org/record/3793090). Our dataset contained 100 d-DNNF circuits with literal weights. We assigned random weights to literals wherever weights were not readily available. For the empirical evaluation, we needed pairs of weighted d-DNNFs with known distance. To generate such a dataset, we first chose a circuit and a weight function, and then we synthesized new weight functions using the technique of one-variable perturbation, described in the appendix Section B.1.

Benchmark | Result (close pair) | Time (s) | Result (far pair) | Time (s)
or-70-10-8-UC-10 | A | 23.2 | R | 22.82
s641_15_7 | A | 33.66 | R | 33.51
or-50-5-4 | A | 414.17 | R | 408.59
ProjectService3 | A | 356.15 | R | 356.14
s713_15_7 | A | 24.86 | R | 24.41
or-100-10-2-UC-30 | A | 31.04 | R | 31.0
s1423a_3_2 | A | 153.13 | R | 152.81
s1423a_7_4 | A | 104.93 | R | 103.51
or-50-5-10 | A | 283.05 | R | 282.97
or-60-20-6-UC-20 | A | 363.32 | R | 362.8
Table 2: Runtime performance of our test. We experiment with 100 PCs with known dTV; out of the 100 benchmarks we display 10 in the table and the rest in the appendix Section B. In the table, ‘A’ represents Accept and ‘R’ represents Reject. The values of the closeness parameter ε and the farness parameter η are as stated in the text.

We set values for the closeness parameter ε, the farness parameter η, and the confidence δ of the test. The chosen parameters imply that if the input pair of probabilistic circuits are ε-close in dTV, then the test returns Accept with probability at least 1 − δ; otherwise, if the circuits are η-far in dTV, the algorithm returns Reject with probability at least 1 − δ. The number of samples required (indicated by the variable m on line 2 of Algorithm 1) depends only on ε, η, and δ; for the values we have chosen, the test requires a fixed number of samples.

Our tests terminated with the correct result in less than 3600 seconds on all the PCs we experimented with. We present the empirical results in Table 2. The first column indicates the benchmark’s name; the second and third indicate the result and runtime of the test when presented with a pair of ε-close PCs as input. Similarly, the fourth and fifth columns indicate the result and observed runtime when the input PCs are η-far. The full set of results is presented in the supplementary Section B.

5 A characterization of the complexity of testing

In this section, we characterize PCs according to the complexity of closeness and equivalence testing. We present the characterization in Table 3. The results presented in the table can be separated into (1) hardness results and (2) upper bounds. The hardness results, presented in Section 5.2, are largely derived from known complexity-theoretic results. The upper bounds, presented in Section 5.1, are derived from a combination of established results, our algorithm, and the exact equivalence test of Darwiche and Huang (2002) (presented in supplementary Section A.1 for completeness).

5.1 Upper bounds

In Table 3 we label the pairs of classes of PCs that admit a poly-time closeness or equivalence test with green symbols. Darwiche and Huang (2002) provided an equivalence test for d-DNNFs. From Theorem 1, we know that PCs that support the counting and sampling queries in poly-time must also admit a poly-time approximate equivalence test. A weighted model counting algorithm for d-DNNFs was first provided by Darwiche (2001), and a weighted sampler was provided by Gupta et al. (2019). Arenas et al. (2021) provided the first approximate counting and uniform sampling algorithm for SDNNFs. Using the following lemma, we show that, with the use of chain formulas, the uniform sampling and counting algorithms extend to log-linear distributions as well.

Lemma 3.

Given a formula φ (with a v-tree T) and a weight function w, the extended counting procedure requires time polynomial in the size of the input.

The proof is provided in the supplementary Section A.5.

5.2 Hardness

In Table 3, we claim that the pairs of classes of PCs labeled with the corresponding symbols cannot be tested in poly-time for closeness or equivalence, respectively. Our claim assumes that the polynomial hierarchy (PH) does not collapse. To prove the hardness of testing the labeled pairs, we combine previously known facts about PCs with a few new arguments. Summarizing for brevity:

  • We start off by observing that PC families form a hierarchy of inclusions Darwiche and Marquis (2002).

  • We then reduce the problems of satisfiability testing (NP-hard) and validity testing (co-NP-hard) of certain circuit classes to the problems of equivalence and closeness testing of PCs, in Propositions 1, 2 and 3. These propositions and their proofs can be found in the supplementary Section A.5.

  • We then connect the existence of poly-time algorithms for equivalence to the collapse of PH via a complexity result due to Karp and Lipton (1980).

Table 3: Summary of results. A green C (resp. E) indicates that a poly-time closeness (resp. equivalence) test exists. A red C (resp. E) indicates that a poly-time closeness (equivalence) test exists only if PH collapses. ‘?’ indicates that the existence of a poly-time test is not known. The table is best viewed in color.

The NP-hardness of deciding the equivalence of pairs of circuits from these fragments was first shown by Pipatsrisawat and Darwiche (2008). We recast their proofs in the language of distribution testing, for the sake of completeness, in the supplementary Section A.5.

6 Conclusion and future work

In this paper, we studied the problem of closeness testing of PCs. Before our work, poly-time algorithms were known only for the special case of equivalence testing of PCs; no poly-time closeness test was known for any PC. We provided the first such test, using ideas from the field of distribution testing to design a novel algorithm for testing the closeness of PCs. We then implemented a prototype and tested it on publicly available benchmarks to determine its runtime performance. Experimental results demonstrate the effectiveness of the test in practice.

We also characterized PCs with respect to the complexity of deciding equivalence and closeness. We combined known hardness results, reductions, and our proposed algorithm to classify pairs of PCs according to closeness and equivalence testing complexity. Since the characterization is incomplete, as seen in Table 3, there are questions left open regarding the existence of tests for certain PCs, which we leave for future work.

Broader Impact

Recent advances in probabilistic modeling techniques have led to increased adoption of the said techniques in safety-critical domains, thus creating a need for appropriate verification and testing methodologies. This paper seeks to take a step in this direction and focuses on testing properties of probabilistic models likely to find use in safety-critical domains. Since our guarantees are probabilistic, practical adoption of such techniques still requires careful design to handle failures.

Acknowledgments. We are grateful to the anonymous reviewers of UAI 2021 and NeurIPS 2021 for their constructive feedback that greatly improved the paper. We would also like to thank Suwei Yang and Lawqueen Kanesh for their useful comments on earlier drafts of the paper. This work was supported in part by the National Research Foundation Singapore under its NRF Fellowship Programme [NRF-NRFFAI1-2019-0004] and AI Singapore Programme [AISG-RP-2018-005], and the NUS ODPRT Grant [R-252-000-685-13]. The computational work for this article was performed on resources of the National Supercomputing Centre, Singapore (https://www.nscc.sg).


  • Arenas et al. [2021] Marcelo Arenas, Luis Alberto Croquevielle, Rajesh Jayaram, and Cristian Riveros. When is Approximate Counting for Conjunctive Queries Tractable? In STOC, 2021.
  • Aronsky and Haug [1998] Dominik Aronsky and Peter J Haug. Diagnosing community-acquired pneumonia with a bayesian network. In Proceedings of the AMIA Symposium, 1998.
  • Cano et al. [2004] Rafael Cano, Carmen Sordo, and José M Gutiérrez. Applications of bayesian networks in meteorology. In Advances in Bayesian networks. 2004.
  • Canonne and Rubinfeld [2014] Clément Canonne and Ronitt Rubinfeld. Testing probability distributions underlying aggregated data. In Automata, Languages, and Programming, 2014.
  • Canonne [2020] Clément L Canonne. A survey on distribution testing: Your data is big. but is it blue? Theory of Computing, 2020.
  • Canonne et al. [2020] Clément L. Canonne, Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart. Testing bayesian networks. IEEE Transactions on Information Theory, 2020.
  • Chakraborty and Meel [2019] Sourav Chakraborty and Kuldeep S. Meel. On testing of uniform samplers. In AAAI, 2019.
  • Chakraborty et al. [2015] Supratik Chakraborty, Dror Fried, Kuldeep S Meel, and Moshe Y Vardi. From weighted to unweighted model counting. In IJCAI, 2015.
  • Choi et al. [2020] YooJung Choi, Antonio Vergari, and Guy Van den Broeck. Probabilistic circuits: A unifying framework for tractable probabilistic models. 2020.
  • Chubarian and Turán [2020] Karine Chubarian and György Turán. Interpretability of bayesian network classifiers: Obdd approximation and polynomial threshold functions. In ISAIM, 2020.
  • Darwiche [2001] Adnan Darwiche. On the tractable counting of theory models and its application to truth maintenance and belief revision. Journal of Applied Non-Classical Logics, 2001.
  • Darwiche [2003] Adnan Darwiche. A differential approach to inference in bayesian networks. Journal of the ACM (JACM), 2003.
  • Darwiche and Huang [2002] Adnan Darwiche and Jinbo Huang. Testing equivalence probabilistically. In Technical Report, 2002.
  • Darwiche and Marquis [2002] Adnan Darwiche and Pierre Marquis. A knowledge compilation map. JAIR, 2002.
  • Dutta et al. [2018] Saikat Dutta, Owolabi Legunsen, Zixin Huang, and Sasa Misailovic. Testing probabilistic programming systems. In Proc. of Joint Meeting on ESE and FSE, 2018.
  • Goodfellow et al. [2014] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks, 2014.
  • Gupta et al. [2019] Rahul Gupta, Shubham Sharma, Subhajit Roy, and Kuldeep S Meel. Waps: Weighted and projected sampling. In TACAS, 2019.
  • Karp and Lipton [1980] Richard M. Karp and Richard J. Lipton. Some connections between nonuniform and uniform complexity classes. In STOC, 1980.
  • Koller and Friedman [2009] Daphne Koller and Nir Friedman. Probabilistic graphical models: principles and techniques. MIT press, 2009.
  • Kulesza and Taskar [2012] Alex Kulesza and Ben Taskar. Determinantal point processes for machine learning. arXiv preprint arXiv:1207.6083, 2012.
  • Lin et al. [2018] Zinan Lin, Ashish Khetan, Giulia Fanti, and Sewoong Oh. Pacgan: The power of two samples in generative adversarial networks. NeurIPS, 2018.
  • Meel et al. [2020] Kuldeep S. Meel, Yash Pote, and Sourav Chakraborty. On Testing of Samplers. In NeurIPS, 2020.
  • Murawski and Ouaknine [2005] Andrzej S. Murawski and Joël Ouaknine. On probabilistic program equivalence and refinement. In Martín Abadi and Luca de Alfaro, editors, CONCUR, 2005.
  • Murphy [2012] K.P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
  • Oniśko et al. [2000] Agnieszka Oniśko, Marek J Druzdzel, and Hanna Wasyluk. Extension of the hepar ii model to multiple-disorder diagnosis. In Intelligent Information Systems. 2000.
  • Peharz et al. [2020a] Robert Peharz, Steven Lang, Antonio Vergari, Karl Stelzner, Alejandro Molina, Martin Trapp, Guy Van den Broeck, Kristian Kersting, and Zoubin Ghahramani. Einsum networks: Fast and scalable learning of tractable probabilistic circuits. In ICML, 2020a.
  • Peharz et al. [2020b] Robert Peharz, Antonio Vergari, Karl Stelzner, Alejandro Molina, Xiaoting Shao, Martin Trapp, Kristian Kersting, and Zoubin Ghahramani. Random sum-product networks: A simple and effective approach to probabilistic deep learning. In UAI. PMLR, 2020b.
  • Pipatsrisawat and Darwiche [2008] Knot Pipatsrisawat and Adnan Darwiche. New compilation languages based on structured decomposability. In AAAI, 2008.
  • Pope and Lowe [2000] Arthur R Pope and David G Lowe. Probabilistic models of appearance for 3-d object recognition. International Journal of Computer Vision, 2000.
  • Radford et al. [2015] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
  • Rahmatallah et al. [2014] Yasir Rahmatallah, Frank Emmert-Streib, and Galina Glazko. Gene sets net correlations analysis (gsnca): a multivariate differential coexpression test for gene sets. Bioinformatics, 2014.
  • Roth [1996] Dan Roth. On the hardness of approximate reasoning. Artificial Intelligence, 82(1-2), 1996.
  • Städler and Mukherjee [2015] Nicolas Städler and Sach Mukherjee. Multivariate gene-set testing based on graphical models. Biostatistics, 2015.
  • Stockmeyer [1976] Larry J. Stockmeyer. The polynomial-time hierarchy. Theoretical Computer Science, 1976.
  • Yin et al. [2015] Weiwei Yin, Swetha Garimalla, Alberto Moreno, Mary R Galinski, and Mark P Styczynski. A tree-like bayesian structure learning algorithm for small-sample datasets from complex biological model systems. BMC systems biology, 2015.

Appendix A Proofs omitted from the paper

A.1 A test for equivalence

For the sake of completeness, we recast the d-DNNF circuit equivalence test of Darwiche and Huang [2002] into an equivalence test for log-linear probability distributions.

Algorithm 2
The algorithm:

The pseudocode of the equivalence test is shown in Algorithm 2. The test takes as input two satisfiable circuits defined over n Boolean variables, a pair of weight functions, and a tolerance parameter δ. Recall that a circuit φ and a weight function w together define the probability distribution P_{φ,w}. The test returns Accept with confidence 1 if the two probability distributions are equivalent, i.e., their TV distance is 0. If the distance is nonzero, then it returns Reject with confidence at least 1 − δ.

The algorithm starts by drawing a uniform random assignment from {0,1}^n. Using the procedure given in Proposition 2 (in Section