Satisfiability modulo theories (SMT) is a foundational problem in formal methods, and the research landscape is not only enjoying the success of existing SMT solvers, but also generating demand for new features. In particular, there is a growing need for model counting solvers; for example, questions in quantitative information flow and in static analysis of probabilistic programs are naturally cast as instances of model counting problems for appropriate logical theories JhaLICS ; SaxenaPLDI14 ; SCG13 .
We define the #SMT problem that generalizes several model counting questions relative to logical theories, such as computing the number of satisfying assignments to a Boolean formula (#SAT
) and computing the volume of a bounded polyhedron in a finite-dimensional real vector space. Specifically, to define model counting modulo ameasured theory, first suppose every variable in a logical formula comes with a domain which is also a measure space. Assume that, for every logical formula in the theory, the set of its models is measurable with respect to the product measure; the model counting (or #SMT) problem then asks, given , to compute the measure of , called the model count of .
In our work we focus on the model counting problems for the theories of bounded integer arithmetic and linear real arithmetic. These problems are complete for the complexity class , so fast exact algorithms are unlikely to exist.
We extend to the realm of SMT the well-known hashing approach from the world of #SAT, which reduces approximate versions of counting to decision problems. From a theoretical perspective, we solve a model counting problem with a resource-bounded algorithm that has access to an oracle for the decision problem. From a practical perspective, we show how to use unmodified existing SMT solvers to obtain approximate solutions to model-counting problems. This reduces an approximate version of #SMT to SMT.
Specifically, for integer arithmetic (not necessarily linear), we give a randomized algorithm that approximates the model count of a given formula to within a multiplicative factor for any given . The algorithm makes SMT queries of size at most where is the size of .
For linear real arithmetic, we give a randomized algorithm that approximates the model count with an additive error , where is the volume of a box containing all models of the formula, and the coefficient is part of the input. The number of steps of the algorithm and the number of SMT queries (modulo the combined theory of integer and linear real arithmetic) are again polynomial.
As an application, we show how to solve the value problem (cf. SCG13 ) for a model of loop-free probabilistic programs with nondeterminism.
Approximation of functions by randomized algorithms has a rich history in complexity theory Stockmeyer1985 ; VV86 ; JVV1986 ; jerrum1996markov . Jerrum, Valiant, and Vazirani JVV1986 described a hashing-based procedure to approximately compute any function, and noted that this procedure already appeared implicitly in previous papers by Sipser Sipser83 and Stockmeyer Stockmeyer1985
. The procedure works with encoded computations of a Turing machine and is thus unlikely to perform well in practice. Instead, we show a direct reduction from approximate model counting to SMT solving, which allows us to retain the structure of the original formula. An alternate approach could eagerly encode#SMT problems into #SAT, but experience with SMT solvers suggests that a “lazy” approach may be preferable for some problems.
For the theory of linear real arithmetic, we also need an ingredient to handle continuous domains. Dyer and Frieze DyerFrieze1988 suggested a discretization that introduces bounded additive error; this placed approximate volume computation for polytopes—or, in logical terms, approximate model counting for quantifier-free linear real arithmetic—in . Motivated by the application in the analysis of probabilistic programs, we extend this technique to handle formulas with existentially quantified variables, while Dyer and Frieze only work with quantifier-free formulas. To this end, we prove a geometric result that bounds the effect of projections: this gives us an approximate model counting procedure for existentially quantified linear arithmetic formulas. Note that applying quantifier elimination as a preprocessing step can make the resulting formula exponentially big; instead, our approach works directly on the original formula that contains existentially quantified variables.
We have implemented our algorithm on top of the Z3 SMT solver Z3 and applied it to formulas that encode the value problem for probabilistic programs. Our initial experience suggests that simple randomized algorithms using off-the-shelf SMT solvers can be reasonably effective.
Counting in SMT
#SMT is a well-known hard problem whose instances have been studied before, e. g., in volume computation DyerFrieze1988 , in enumeration of lattice points in integer polyhedra Barvinok , and as #SAT GomesSS09 . Indeed, very simple sub-problems, such as counting the number of satisfying assignments of a Boolean formula or computing the volume of a union of axis-parallel rectangles in (called Klee’s measure problem Klee ) are already -hard (see Section 2 below).
Existing techniques for #SMT either incorporate model counting primitives into propositional reasoning mlz09cade ; ZhouHXHCG14 ; BellePB15 or are based on enumerative combinatorics latte ; SaxenaPLDI14 ; JhaLICS . Typically, exact algorithms latte ; mlz09cade ; JhaLICS are exponential in the worst case, whereas approximate algorithms SaxenaPLDI14 ; ZhouHXHCG14 lack provable performance guarantees. In contrast to exact counting techniques, our procedure is easily implementable and uses “for free” the sophisticated heuristics built in off-the-shelf SMT solvers. Although the solutions it produces are not exact, they provably meet user-provided requirements on approximation quality. This is achieved by extending the hashing approach from SAT GomesHSS07 ; GomesSS09 ; ChakrabortyMV13cp ; ErmonGSS13 to the SMT context.
A famous result of Dyer, Frieze, and Kannan DyerFriezeKannan1991 states that the volume of a convex polyhedron can be approximated with a multiplicative error in probabilistic polynomial time (without the need for an SMT solver). In our application, analysis of probabilistic programs, we wish to compute the volume of a projection of a Boolean combination of polyhedra; in general, it is, of course, non-convex. Thus, we cannot apply the volume estimation algorithm of DyerFriezeKannan1991 , so we turn to the “generic” approximation of using an oracle instead. Our #SMT procedure for linear real arithmetic allows an additive error in the approximation; it is known that the volume of a polytope does not always have a small exact representation as a rational number Lawrence90 .
An alternative approach to approximate #SMT is to apply Monte Carlo methods for volume estimation. They can easily handle complicated measures for which there is limited symbolic reasoning available. Like the hashing technique, this approach is also exponential in the worst case jerrum1996markov : suppose the volume in question, , is very small and the required precision is a constant multiple of . In this case, Chernoff bound arguments would suggest the need for samples; the hashing approach, in contrast, will perform well. So, while in “regular” settings (when is non-vanishing) the Monte Carlo approach performs better, “singular” settings (when is close to zero) are better handled by the hashing approach. The two techniques, therefore, are complementary to each other (see the remark at the end of subsection 5.5).
Probably closest to our work is a series of papers by Chakraborty, Meel, Vardi et al. ChakrabortyMV13 ; ChakrabortyMV13cp ; ChakrabortyFMSV14 , who apply the hashing technique to uniformly sample satisfying assignments of SAT formulas ChakrabortyMV13 . They use CryptoMiniSat CryptoMiniSat as a practical implementation of an (SAT) oracle, as it has built-in support for XOR (addition modulo 2) constraints that are used for hashing. Their recent work ChakrabortyFMSV14 supports weighted sampling and weighted model counting, where different satisfying assignments are associated with possibly different probabilities (this can be expressed as a discrete case of #SMT). Concurrently, Ermon et al. ErmonGSS13 apply the hashing technique in the context of counting problems, relying on CryptoMiniSat as well. Ermon et al. also consider a weighted setting where the weights of satisfying assignment are given in a factorized form; for this setting, as a basic building block, they invoke an optimization solver ToulBar2 toulbar2 to answer MAP (maximum a posteriori assignment) queries. More recently and concurrently with (the conference version of) our work, Belle, Van den Broeck, and Passerini BelleUAI15 apply the techniques of Chakraborty et al. in the context of so-called weighted model integration. This is an instance of #SMT, where the weights of the satisfying assignments (models) are computed in a more complicated fashion. Belle et al. adapt the procedure of Chakraborty et al., also using CryptoMiniSat, but additionally rely on the Z3 SMT solver to check candidate models against the theory constraints (real arithmetic in this case) encoded by the propositional variables, and use the LattE tool latte for computing the volume of polyhedra.
We briefly review the problem settings of Ermon et al. ErmonGSS13 and Belle et al. BelleUAI15 ; BellePB15 in Section 2. In our work, the problem setting is more reminiscent of those in Chakraborty et al. and Ermon et al. Our implementation currently does not rely on CryptoMiniSat and uses an unmodified theory solver, Z3, instead (see subsection 5.6).
We extend, from SAT to SMT, the hashing approach to approximate model counting:
We formulate the notion of a measured theory (Section 2) that gives a unified framework for model-counting problems.
For the theory of bounded linear real arithmetic, we give a technical construction (Lemma 1 in subsection 3.2) that lets us extend the results of Dyer and Frieze to the case where the polytope is given as a projection of a Boolean combination of polytopes; this leads to an approximate model counting procedure for this theory (Theorem 2.2 in Section 2).
As an application, we solve the value problem for small loop-free probabilistic programs with nondeterminism (Section 5).
The conference version of this paper appeared as ChistikovDM15 .
2 The #Smt Problem
We present a framework for a uniform treatment of model counting both in discrete theories like SAT (where it is literally counting models) and in linear real arithmetic (where it is really volume computation for polyhedra). We then introduce the notion of approximation and give an algorithm for approximate model counting by reduction to SMT.
Preliminaries: Counting Problems and
A relation is a p-relation if (1) there exists a polynomial such that if then and (2) the predicate can be checked in deterministic polynomial time in the size of . Intuitively, a p-relation relates inputs to solutions . It is easy to see that a decision problem belongs to if there is a p-relation such that .
A counting problem is a function that maps to . A counting problem belongs to the class if there exists a p-relation such that , i. e., the class consists of functions that count the number of solutions to a p-relation Valiant1979 . Completeness in is with respect to Turing reductions; the same term is also (ab)used to encompass problems that reduce to a fixed number of queries to a function (see, e. g., DyerFrieze1988 ).
#SAT is an example of a -complete problem: it asks for the number of satisfying assignments to a Boolean formula in conjunctive normal form (CNF) Valiant1979 . Remarkably, characterizes the computational complexity not only of “discrete” problems, but also of problems involving real-valued variables: approximate volume computation (with additive error) for bounded rational polyhedra in is -complete DyerFrieze1988 .
Measured Theories and #Smt
We will now define the notion of model counting that generalizes #SAT and volume computation for polyhedra. Suppose is a logical theory. Let be a formula in this theory with free first-order variables . Assume that comes with a fixed interpretation which specifies domains of the variables, denoted , and assigns a meaning to predicates and function symbols in the signature of . Then a tuple is called a model of if the sentence holds, i. e., if . We denote the set of all models of a formula by ; the satisfiability problem for asks, for a formula given as input, whether .
Consider the special cases of #SAT and volume computation for polyhedra; the corresponding satisfiability problems are SAT
and linear programming. For#SAT, atomic predicates are of the form , for , the domain of each is , and formulas are propositional formulas in conjunctive normal form. For volume computation, atomic predicates are of the form , for , the domain of each is , and formulas are conjunctions of atomic predicates. Sets in these cases are the set of satisfying assignments and the polyhedron itself, respectively.
Suppose the domains given by the fixed interpretation are measure spaces: each is associated with a -algebra and a measure . This means, by definition, that and satisfy the following properties: contains and is closed under complement and countable unions, and is non-negative, assigns to , and is -additive.111The reader is referred to standard textbooks on probability and/or measure theory for further background; see, e.g., (textbook-probability, , Chapter 1).
In our special cases, these spaces are as follows. For #SAT, each is the set of all subsets of , and is simply the number of elements in . For volume computation, each is the set of all Borel subsets of , and is the Lebesgue measure.
Assume that each measure is -finite, that is, the domain is a countable union of measurable sets (i. e., of elements of , and so with finite measure associated with them). This condition, which holds for both special cases, implies that the Cartesian product is measurable with respect to a unique product measure , defined as follows. A set is measurable (that is, assigns a value to ) if and only if is an element of the smallest -algebra that contains all sets of the form , with for all . For all such sets, it holds that .
In our special cases, the product measure of a set is the number of elements in and the volume of , respectively.
We say that the theory is measured if for every formula in with free (first-order) variables the set is measurable. We define the model count of a formula as . Naturally, if the measures in a measured theory can assume non-integer values, the model count of a formula is not necessarily an integer. With every measured theory we associate a model counting problem, denoted : the input is a logical formula in , and the goal is to compute the value .
The #SAT and volume computation problems are just special cases as intended, since is equal to the number of satisfying assignments of a Boolean formula and to the volume of a polyhedron, respectively.
Note that one can alternatively restrict the theory to a fixed number of variables , i.e., to , where , and introduce a measure directly on ; that is, will not be a product measure. Such measures arise, for instance, when comes in a factorized form where factors span non-singleton subsets of . A toy example, with , might have
induced by the probability density function, where and are non-negative and absolutely continuous, and the normalization constant (sometimes called the partition function) is chosen in such a way that . Note that computing , given and , is itself a #SMT- (i.e., model counting) question: the associated theory has measure induced by , and the goal is to compute , where we assume that is a formula in the theory with
. (Much more sophisticated) problems of this form arise in machine learning and have been studied, e.g., by Ermon et al.ErmonGSS13 .
A different stance on model counting questions, under the name of weighted model integration (for real arithmetic), was recently suggested by Belle, Passerini, and Van den Broeck BellePB15 . Their problem setting starts with a tuple of real-valued (theory) variables and a logical formula over and over standalone propositional variables, . All theory atoms in the formula are also abstracted as (different) propositional variables, . All literals of propositional variables are annotated with weight functions , which (can) depend on . Take any total assignment to that satisfies the propositional abstraction of and let be the set of all satisfied literals. The weight of this assignment to is the integral taken over the area restricted in by the conjunction of atoms that are associated with literals . The weighted model integral of is then the sum of weights of all assignments (to ) that satisfy the propositional abstraction of .
We discuss several other model counting problems in the following subsection.
Approximate Model Counting
We now introduce approximate #SMT and show how approximate #SMT reduces to SMT. We need some standard definitions. For our purposes, a randomized algorithm is an algorithm that uses internal coin-tossing. We always assume, whenever we use the term, that, for each possible input to , the overall probability, over the internal coin tosses , that outputs a wrong answer is at most . (This error probability can be reduced to any smaller , by taking the median across independent runs of .)
We say that a randomized algorithm approximates a real-valued functional problem with an additive error if takes as input an and a rational number and produces an output such that
where is some specific and efficiently computable upper bound on the absolute value of , i. e., , that comes with the problem . Similarly, approximates a (possibly real-valued) functional problem with a multiplicative error if takes as input an and a rational number and produces an output such that
The computation time is usually considered relative to or , respectively (note the inverse of the admissible error). Polynomial-time algorithms that achieve approximations with a multiplicative error are also known as fully polynomial-time randomized approximation schemes (FPRAS) JVV1986 .
Algorithms can be equipped with oracles solving auxiliary problems, with the intuition that an external solver (say, for SAT) is invoked. In theoretical considerations, the definition of the running time of such an algorithm takes into account the preparation of queries to the oracle (just as any other computation), but not the answer to a query—it is returned within a single time step. Oracles may be defined as solving some specific problems (say, SAT) as well as any problems from a class (say, from ). The following result is well-known.
Proposition 1 (generic approximate counting Jvv1986 ; Stockmeyer1985 )
Let be any member of . There exists a polynomial-time randomized algorithm which, using an -oracle, approximates with a multiplicative error.
In the rest of this section, we present our results on the complexity of model counting problems, , for measured theories. For these problems, we develop randomized polynomial-time approximation algorithms equipped with oracles, in the flavour of Proposition 1. We describe the proof ideas in Section 3, and details are provided in Appendix. We formally relate model counting and the value problem for probabilistic programs in Section 5; in the implementation, we substitute an appropriate solver for the theory oracle. We illustrate our approach on an example in Section 4.
By we denote the bounded version of integer arithmetic: each free variable of a formula comes with a bounded domain , where . We use the counting measure , so the model count of a formula is the number of its models. In the formulas, we allow existential (but not universal) quantifiers at the top level. The model counting problem for is -complete.
Consider the formula
in the measured theory . This formula has one free variable and one existentially quantified variable , let’s say both with domain . It is easy to see that there exist only two values of , , for which there exists a with : these are the integers and . Hence, . ∎
The model counting problem for can be approximated with a multiplicative error by a polynomial-time randomized algorithm that has oracle access to satisfiability of formulas in .
Linear real arithmetic.
By we denote the bounded version of linear real arithmetic, with possible existential (but not universal) quantifiers at the top level. Each free variable of a formula comes with a bounded domain , where . The associated measure is the standard Lebesgue measure, and the model count of a formula is the volume of its set of models. (Since we consider linear constraints, any quantifier-free formula defines a finite union of polytopes. It is an easy geometric fact that its projection on a set of variables will again be a finite union of bounded polytopes. Thus, existential quantification involves only finite unions.)
Consider the same formula
this time in the measured theory , where and . Note that now is equivalent to , and thus : this is the length of the line segment defined by this constraint. ∎
We denote the combined theory of (bounded) integer arithmetic and linear real arithmetic by . In the model counting problem for , the a priori upper bound on the solution is ; additive approximation of the problem is -complete.
The model counting problem for can be approximated with an additive error by a polynomial-time randomized algorithm that has oracle access to satisfiability of formulas in .
3 Proof Techniques
3.1 Approximate discrete model counting
We now explain the idea behind Theorem 2.1. Let be an input formula in and let be the free variables of . Suppose is a big enough integer such that all models of have components not exceeding , i. e., .
Our approach to approximating follows the construction in Jerrum et al. JVV1986 , which builds upon the following observation. Suppose our goal is to find a value such that , and we have an oracle , for “Estimate”, answering questions of the form . Then it is sufficient to make such queries to for , , and the overall algorithm design is reduced to implementing such an oracle efficiently.
It turns out that such an implementation can be done with the help of hashing. Suppose that a hash function , taken at random from some family , maps elements of to . If the family is chosen appropriately, then each potential model is mapped by to, say, with probability ; moreover, one should expect that any set of size has roughly elements in . In other words, if , then is non-empty with high probability, and if , then is empty with high probability. Distinguishing between empty and non-empty sets is, in turn, a satisfiability question and, as such, can be entrusted to the solver. As a result, we reduced the approximation of the model count of to a series of satisfiability questions in .
Our algorithm posts these questions as SMT queries of the form
where and are tuples of integer variables, each component of is either or , the formula says that is binary encoding of , and the formula encodes the computation of the hash function on input .
Algorithm 1 is the basis of our implementation. It returns a value that satisfies the inequalities with probability at least . Algorithm 1 uses a set of parameters to discharge small values by enumeration in the SMT solver (parameters ) and to query the solver for larger instances (parameters ). The procedure given as Algorithm 2 asks the SMT solver for to produce models (for a positive integer parameter ) to formulas of the form (1) by calling the procedure SMT.
To achieve the required precision with the desired probability, the algorithm constructs a conjunction of copies of the formula (over disjoint sets of variables), where the number of copies is defined as
This results in a formula with
binary variables, wheredenotes the size of the original formula . Then, in lines 1–1, Algorithm 1 performs for each dimension of the hash function in the range a majority vote over calls to the procedure , where the values of and are computed as follows:
In a practical implementation, early termination of the majority-vote loop is possible as soon as the number of positive answers given by exceeds .
For formulas with up to models, Algorithm 1 returns the exact model count (line 1 in Algorithm 1) computed by the procedure SMT, which repeatedly calls the solver, counting the number of models up to .
The family of hash functions used by pick-hash in Algorithm 2 needs to satisfy the condition of pairwise independence: for any two distinct vectors and any two strings , the probability that a random function satisfies and is equal to . There are several constructions for pairwise independent hash functions; we employ a commonly used family, that of random XOR constraints VV86 ; BGP00 ; GomesSS09 ; ChakrabortyMV13 . Given and , the family contains (in binary encoding) all functions with , where for all and is the XOR operator (addition in ). By randomly choosing the coefficients we get a random hash function from this family. The size of each query is thus bounded by , where is again the size of the original formula , and there will be at most queries in total.
Note that the entire argument remains valid even if has existentially quantified variables: queries (1) retain them as is. The prefix of existential quantifiers could simply be dropped from (1), as searching for models of quantifier-free formulas already captures existential quantification. It is important, though, that the model enumeration done by the procedure SMT in Algorithms 1 and 2 only count distinct assignments to the free variables of and respectively.
3.2 Approximate continuous model counting
In this subsection we explain the idea behind Theorem 2.2. Let be a formula in ; using appropriate scaling, we can assume without loss of generality that all its variables share the same domain. Suppose and fix some , with the prospect of finding a value that is at most away from (we take as the value of the upper bound in the definition of additive approximation). We show below how to reduce this task of approximate continuous model counting to additive approximation of a model counting problem for a formula with a discrete set of possible models, which, in turn, will be reduced to that of multiplicative approximation.
We first show how to reduce our continuous problem to a discrete one. Divide the cube into small cubes with side each, . For every , set if at least one point of the cube satisfies ; that is, if .
Imagine that we have a formula such that for all , and let be written in a theory with a uniform measure that assigns “weight” to each point ; one can think of these weights as coefficients in numerical integration. From the technique of Dyer and Frieze (DyerFrieze1988, , Theorem 2) it follows that for a quantifier-free and an appropriate value of the inequality holds.
Indeed, Dyer and Frieze prove a statement of this form in the context of volume computation of a polyhedron, defined by a system of inequalities . However, they actually show a stronger statement: given a collection of hyperplanes in and a set , an appropriate setting of will ensure that out of cubes with side only a small number will be cut, i. e., intersected by some hyperplane. More precisely, if , then this number will satisfy the inequality . Thus, the total volume of cut cubes is at most , and so, in our terms, we have as desired.
However, in our case the formula need not be quantifier-free and may contain existential quantifiers at the top level. If where is quantifier-free, then the constraints that can “cut” the -cubes are not necessarily inequalities from . These constraints can rather arise from projections of constraints on variables and, what makes the problem more difficult, their combinations. However, we are able to prove the following statement:
The number of points for which cubes are cut satisfies if , where and is the number of atomic predicates in .
Observe that a cube is cut if and only if it is intersected by a hyperplane defined by some predicate in variables . Such a predicate does not necessarily come from the formula itself, but can arise when a polytope in variables is projected to the space associated with variables . Put differently, each cut cube has some -dimensional face with that “cuts” it; this face is an intersection of with some affine subspace in variables .
Consider this subspace . It can be, first, the projection of a hyperplane defined in variables by an atomic predicate in or, second, the projection of an intersection of several such hyperplanes. Now note that each predicate in defines exactly one hyperplane; an intersection of hyperplanes in projects to some specific affine subspace in variables . Therefore, each “cutting” affine subspace is associated with a distinct subset of atomic predicates in , where, since the domain is bounded, we count in constraints as well. This gives us at most cutting subspaces, so it remains to apply the result of Dyer and Frieze with . ∎
A consequence of the lemma is that the choice of the number ensures that the formula written in the combined theory satisfies the inequality . Here we associate the domain of each free variable with the uniform measure . Note that the value of chosen in Lemma 1 will still keep the number of steps of our algorithm polynomial in the size of the input, because the number of bits needed to store the integer index along each axis is and not itself.
As a result, it remains to approximate with additive error of at most , which can be done by invoking the procedure from Theorem 2.1 that delivers approximation with multiplicative error .
4 A Fully Worked-Out Example
. Probabilistic programs are a means of describing probability distributions; the model we use combines probabilistic assignments and nondeterministic choice, making programs more expressive, but analysis problems more difficult.
For this section we choose a relatively high level of presentation in order to convey the main ideas in a more understandable way; a formal treatment follows in Section 5, where we discuss (our model of) probabilistic programs and their analysis in detail.
We describe our approach using as an example the following classic problem from probability theory. Imagine a television game show with two characters: the player and the host. The player is facing three doors, numbered, , and ; behind one of these there is a car, and behind the other two there are goats. The player initially picks one of the doors, say door , but does not open it. The host, who knows the position of the car, then opens another door, say door with , and shows a goat behind it. The player then gets to open one of the remaining doors. There are two available strategies: stay with the original choice, door , or switch to the remaining alternative, door . The Monty Hall problem asks, which strategy is better? It is widely known that, in the standard probabilistic setting of the problem, the switching strategy is the better one: it has payoff , i. e., it chooses the door with the car with probability ; the staying strategy has payoff of only .
Modeling with a probabilistic program
We model the setting of the Monty Hall problem with the probabilistic program in Procedure LABEL:a-proc:mh, which implements the “switch” strategy. algocf[nofillcomment] In this problem, there are several kinds of uncertainty and choice, so we briefly explain how they are expressed with the features of our programming model.
First, there is uncertainty in what door hides the car and what door the player initially picks. It is standard to model the initial position of the car,
, by a random variable distributed uniformly on; we simply follow the information-theoretic guidelines here. At the same time, due to the symmetry of the setting we can safely assume that the player always picks door at first, so here choice is modeled by a deterministic assignment.
Second, there is uncertainty in what door the host opens. We model this with nondeterministic choice. Since the host knows that the car is behind door and does not open door accordingly, we restrict this choice by stipulating that . For the semantics of the program, this means that for different outcomes of the probabilistic assignment different sets of paths through the program are available (some paths are excluded, because they are incompatible with the results of observations stipulated by statements).
Note that we don’t know the nature of the host’s choice in the case that more than one option is available (when , either element of can be chosen as ). In principle, this choice may be cooperative (the host helps the player to win the car), adversarial (the host wants to prevent the player from winning), probabilistic (the host tosses a coin), or any other. In our example, the cooperative and the adversarial behavior of the host are identical, so our model is compatible with either of them.
Finally, uncertainty in the final choice of the player is modeled by fixing a specific behaviour of the player and declaring acceptance if the result is successful. Our procedure implements the “switching” strategy; that is, the player always switches from door . The analysis of the program will show how good the strategy is.
Semantics and value of the program
Informally, consider all possible outcomes of the probabilistic assignments. Restrict attention to those that may result in the program reaching (nondeterministically) at least one of or statements—such elementary outcomes form the set (for “termination”); only these scenarios are compatible with the observations. Similarly, some of these outcomes may result in the program reaching (again, nondeterministically) an statement—they form the set ; the interpretation is that for these scenarios the strategy is successful.
These sets and are events in a probability space. The value of the program (in this case interpreted as the payoff of the player’s strategy) is the probability of acceptance conditioned on termination:
where, in general, we assume and the last equality follows because . In general, this semantics corresponds to the cooperative behavior of the host, but in our case the adversarial behavior would be identical: there is no value of such that one nondeterministic choice leads to and another leads to . (We can also deal with adversarial nondeterminism, see subsection 5.3.) Indeed, for there are two paths to , and for each of there is a single path to . As a result, , as intended.
Reduction of value estimation to model counting
To estimate the value of the program, we first reduce its computation to a model counting problem (as defined in Section 2) for an appropriate logical theory. We write down the verification condition that defines a valid computation of the program, by asserting a relation between (values of) nondeterministic and probabilistic variables and . Then we construct existential formulas of the form
which assert that the program terminates with “accept” (resp. “accept” or “reject”), and whose sets of models (i. e., satisfying assignments) are exactly the sets and defined above. For the Monty Hall program, these formulas and , with , will be equivalent to and , respectively. The value of the program is the ratio , where denotes the model count of a formula, as in Section 2. Technically, we can use , the theory of integer arithmetic, with the domain for the free variable and with the counting measure , also following Section 2. So in our example, and .
Computing the value of the program
We show how our method (see subsection 3.1) estimates . We make several copies of the variable , denoted . The formula
has models, and we can estimate by estimating and taking the th root of the estimate. Enlarging to and then taking the th root increases precision: for example, if the approximation procedure gives a result up to a factor of , the th root of the estimate for gives an approximation for up to a factor of .
Now observe that for a hash function with values in , taken at random from an appropriate family, the expected model count of the formula
is . By a Chernoff bound argument, the model count is concentrated around the expectation. Our algorithm will, for increasing values of , sample random hash functions from an appropriate class, construct the formula (2), and give the formula to an SMT solver to check satisfiability. (Note that such formulas are purely existential—in variables as well as in copies of .) With high probability, the first for which the sampled formula is unsatisfiable will give a good enough estimate of and, by the reduction above, of .
Let us give some concrete values to support the intuition. We encode the number in binary, as . We make copies, and this will ensure that we will obtain the exact value of by taking th root of , where is as above (for exact rather than approximate solution, a multiplicative gap of less than suffices in our setting). In reality, and so , but we only know a priori that and . We iterate over the dimension of the hash function and perform the SMT query (2) for each . Using standard statistical techniques, we can reduce the error probability by repeating each random experiment a sufficiently large number of times, ; in our case leads to . A typical run of our implementation is demonstrated in Table 1; for each we show how many of the sampled formulas are satisfiable, and how many are not. The “Majority vote” column is used by our procedure to decide if the number of models is more than times a constant factor. From the table, our procedure will conclude that is between and with probability at least (see Appendix A for derivation of the constants and ). This gives us the interval for ; since is integer, we conclude that with probability at least .
As mentioned above, the same technique will deliver us and hence, .
5 Value Estimation for Probabilistic Programs
In this section we show how our approach to #SMT applies to the value problem for probabilistic programs.
What are probabilistic programs?
Probabilistic models such as Bayesian networks, Markov chains, probabilistic guarded-command languages, and Markov decision processes have a rich history and form the modeling basis in many different domains (see, e.g.,FV97 ; McIverMorgan ; Darwiche ; KollerFriedman ). More recently, there has been a move toward integrating probabilistic modeling with “usual” programming languages GTS94 ; InferNET . Semantics and abstract interpretation for probabilistic programs with angelic and demonic non-determinism has been studied before Kozen81 ; McIverMorgan ; Monniaux2005 ; CousotMonerau12 , and we base our semantics on these works.
Probabilistic programming models extend “usual” nondeterministic programs with the ability to sample values from a distribution and condition the behavior of the programs based on observations GHNR14 . Intuitively, probabilistic programs extend an imperative programming language like C with two constructs: a nondeterministic assignment to a variable from a range of values, and a probabilistic assignment that sets a variable to a random value sampled from a distribution. Designed as a modeling framework, probabilistic programs are typically treated as descriptions of probability distributions and not meant to be implemented and executed as usual programs.
We consider a core loop-free imperative language extended with probabilistic statements, similarly to SCG13 , and with nondeterministic choice. Under each given assignment to the probabilistic variables, a program accepts (rejects) if there is an execution path that is compatible with the observations and goes from the initial vertex to the accepting (resp., rejecting) vertex of its control flow automaton. Consider all possible outcomes of the probabilistic assignments in a program . Restrict attention to those that result in reaching (nondeterministically) at least one of the accepting or rejecting vertices—such elementary outcomes form the set (for “termination”); only these scenarios are compatible with the observations. Similarly, some of these outcomes may result in the program reaching (again, nondeterministically) the accepting vertex—they form the set . Note that the sets and are events in a probability space; define , the value of , as the conditional probability , which is equal to the ratio as . We assume that programs are well-formed in that is bounded away from .
Now consider a probabilistic program over a measured theory , i. e., where the expressions and predicates come from . Associate a separate variable with each probabilistic assignment in and denote the corresponding distribution by . Let be the set of all such variables .
There exists a polynomial-time algorithm that, given a program over , constructs logical formulas and over such that and , where each free variable is interpreted over its domain with measure . Thus, .
Proposition 2 reduces the value problem—i. e., the problem of computing —to model counting. This enables us to characterize the complexity of the value problem and solve this problem approximately using the hashing approach from Section 3. These results appear as Theorem 5.2 in subsection 5.5 below.
In the remainder of this section we define the syntax (subsection 5.1) and semantics (subsection 5.2) of our programs and the value problem. By reducing this problem to #SMT (subsection 5.5) we show an application of our approach to approximate model counting (an experimental evaluation is provided in subsection 5.6). We also discuss modeling different kinds of nondeterminism: cooperative and adversarial (subsection 5.3), and give an short overview of known probabilistic models subsumed by ours (subsection 5.4).
A program has a set of variables , partitioned into Boolean, integer, and real-valued variables. We assume expressions are type correct, i.e., there are no conversions between variables of different types. The basic statements of a program are:
deterministic assignments ,
probabilistic assignments ,
assume statements ,
where and come from an (unspecified) language of expressions and predicates, respectively.
The (deterministic) assignment and assume statements have the usual meaning: the deterministic assignment sets the value of the variable to the value of the expression on the right-hand side, and continues execution only if the predicate is satisfied in the current state (i.e., it models observations used to condition a distribution). The probabilistic assignment operation
samples the uniform distribution over the rangewith constant parameters and assigns the resulting value to the variable . For example, for a real variable , the statement draws a value uniformly at random from the segment , and for an integer variable , the statement sets to or with equal probability.
The control flow of a program is represented using directed acyclic graphs, called control flow automata (CFA), whose nodes represent program locations and whose edges are labeled with program statements. Let denote the set of basic statements; then a control flow automaton (CFA) consists of a set of variables , a labeled, directed, acyclic graph , with , and three designated vertices , , and in called the initial, accepting, and rejecting vertices.
Figure 1 depicts the CFA for the probabilistic program shown in Procedure LABEL:a-proc:mh. The and statements from the procedure correspond to the and vertices of the CFA respectively.
We assume has no incoming edges and and have no outgoing edges. We write if . We also assume programs are in static single assignment (SSA) form, that is, each variable is assigned at most once along any execution path. A program can be converted to SSA form using standard techniques Muchnick ; HNRS14 .
Since control flow automata are acyclic, our programs do not have looping constructs. Loops can be accommodated in two different ways: by assuming that the user provides loop invariants KatoenMMM10 , or by assuming an outer (statistical) procedure that selects a finite set of executions that is sufficient for the analysis up to a given confidence level SCG13 ; SPMMGC14 . In either case, the core analysis problem reduces to analyzing finite-path unwindings of programs with loops, which is exactly what our model captures.
Although our syntax only allows uniform distributions, we can model some other distributions. For example, to simulate a Bernoulli random variable that takes value with probability and with probability , we write the following code:
We can similarly encode uniform distributions with non-constant boundaries as well as (approximately encode) normal distributions (using repeated samples from uniform distributions and the central limit theorem).
The semantics of a probabilistic program is given as a superposition of nondeterministic programs, following Kozen81 ; CousotMonerau12 . Intuitively, when a probabilistic program runs, an oracle makes all random choices faced by the program along its execution up front. With these choices, the program reduces to a usual nondeterministic program.
We first provide some intuition behind our semantics. Let us partition the variables of a program into random variables (those assigned in a probabilistic assignment) and nondeterministic variables (the rest). (The partition is possible because programs are in static single assignment form.) We consider two events. The (normal) termination event (resp. the acceptance event) states that under a scenario for the random variables in , there is an assignment to the variables in such that the program execution under this choice of values reaches or (resp. reaches ). The termination is “normal” in that all s are satisfied. Our semantics computes the conditional probability, under all scenarios, of the acceptance event given that the termination event occurred.
We now formalize the semantics. A state of a program is a pair of a control node and a type-preserving assignment of values to all program variables in . Let denote the set of all states and the set of finite sequences over .
Let be the probability space associated with probabilistic assignments in a program ; elements of will be called scenarios. The probabilistic semantics of , denoted , is a function from to , mapping each scenario to a collection of maximal executions of the nondeterministic program obtained by fixing . It is defined with the help of an extension of from programs to states, which, in turn, is defined inductively as follows:
and for all ;
if and ;