Soft Constraints for Inference with Declarative Knowledge

01/16/2019
by Zenna Tavares, et al.

We develop a likelihood-free inference procedure for conditioning a probabilistic model on a predicate. A predicate is a Boolean-valued function which expresses a yes/no question about a domain. Our contribution, which we call predicate exchange, constructs a softened predicate which takes values in the unit interval [0, 1], as opposed to simply true or false. Intuitively, 1 corresponds to true, and a high value (such as 0.999) corresponds to "nearly true", as determined by a distance metric. We define a Boolean algebra for soft predicates, such that they can be negated, conjoined and disjoined arbitrarily. A softened predicate can serve as a tractable proxy to a likelihood function for approximate posterior inference. However, to target exact inference, we temper the relaxation by a temperature parameter and add an accept/reject phase, using replica exchange Markov chain Monte Carlo to exchange states between a sequence of models conditioned on predicates at varying temperatures. We describe a lightweight implementation of predicate exchange that provides a language-independent layer, which can be implemented on top of existing modeling formalisms.


1 Introduction

Conditioning in Bayesian inference incorporates observed data into a model. In a broader sense, conditioning revises a model such that a yes/no question (a predicate) is resolved to a true proposition (a fact). For instance, the question of whether a variable is equal to a particular value changes from a predicate of uncertain truth to a fact once it is observed. In principle, a predicate can be used to declare any fact about a domain, not only the observation of data. In practice, sampling from models conditioned on most predicates presents severe challenges to existing inference procedures.

Predicates can be used to update a model to adhere to known facts about a domain, without the burden of specifying how to revise the model. For example, in inverse graphics (Marschner & Greenberg, 1998; Kulkarni et al., 2015), i.e., inferring three-dimensional geometry from observed images, the proposition "rigid bodies do not intersect" is a predicate on latent configurations of geometry. Manually revising a model to constructively adhere to this fact ranges from inconvenient to infeasible. Instead, we would ideally simply condition on it being true, concentrating probability mass on physically plausible geometric configurations, ultimately yielding more accurate posterior inferences in the inverse graphics problem.

Predicates can also express observations that are more abstract than variables in a model. In diabetes research for example, probabilistic models have been used to relate physiological factors to glucose levels over time (Levine et al., 2017; Murata et al., 2004). Rather than concrete, numerical glucose measurements, a medical practitioner may observe (or be told) that a patient suffers from recurrent hypoglycemia, i.e., that their glucose levels periodically fall below a critical value. Even if the occurrence of hypoglycemia does not appear as an explicit variable in the model, it could be constructed as a predicate on glucose levels, and conditioned on to infer the posterior distribution over latent physiological factors.

Several effective sampling (Andrieu et al., 2003) and variational (Jordan et al., 1999; Ranganath et al., 2014) approaches to inference require only a black-box likelihood function, i.e., one evaluable on arbitrary input. The likelihood function quantifies the extent to which values of latent variables are consistent with observations. However, most models conditioned on most predicates have likelihood functions that are intractable to compute or unknown. For example, conditioning random variables that are deterministic transformations of other random variables (e.g., the presence of hypoglycemia in the example above, or the mean of a collection of variables) often results in likelihoods that are normalized by intractable integrals. In other cases, the likelihood function is implicit to a generative process, rather than explicitly specified, and hence unavailable even when the condition is a conventional observation.

In this paper we present predicate exchange: a likelihood-free method to sample from distributions conditioned on predicates from a broad class. It is composed of two parts:

  1. Predicate Relaxation transforms a predicate such that it returns a value in a soft Boolean algebra: the unit interval [0, 1] with continuous logical connectives ∧̃ and ∨̃.

  2. Replica Exchange simulates several Markov chains of a model at different temperatures. Temperature is a parameter of predicate relaxation which controls the amount of approximation it introduces. We adapt standard replica exchange to draw samples that are asymptotically exact from the unrelaxed model.

By returning a value in [0, 1] instead of {0, 1}, a soft predicate quantifies the extent to which values of latent variables are consistent with the predicate. This allows it to serve a role similar to a likelihood function, and opens up the use of likelihood-based inference procedures. Orthogonally, we embed soft predicates in a Boolean algebra to support the expression of domain knowledge with composite Boolean structure. Continuing the previous example, we may know that a person does not have hypoglycemia, or that they have hypoglycemia or hyperglycemia, or neither.

Predicate exchange is motivated by probabilistic programming languages, which have vastly expanded the class of probabilistic models that can be expressed, but still heavily restrict the kinds of predicates that can be conditioned on. Rather than introduce a new language or modeling formalism, we mirror (Wingate et al., 2011) and provide a lightweight implementation that performs inference by modulating the execution of a stochastic simulation based model. This means predicate exchange is easily incorporated into most frameworks.

Our approach comes with certain limitations. Equality conditions on continuous variables denote sets of measure zero. This is problematic because the probability of proposing a satisfying state in a Markov chain becomes zero. In these cases predicate exchange must sample at a minimum temperature strictly greater than zero, which is approximate. Another limitation occurs if a predicate has branches (e.g., if-then-else statements) which depend on uncertainty in the model.

In summary we address the problem of conditioning probabilistic models on predicates as a means to express declarative knowledge. In detail, we:

  1. Formalize simulation based probabilistic models in measure theoretic probability, and conditioning as the imposition of constraints expressed as predicates (Section 3).

  2. Motivate predicate relaxation (Section 4.1), and provide a complete soft Boolean algebra.

  3. Provide a lightweight implementation of predicate exchange (Section 5) through nonstandard execution of a simulation based model.

  4. Evaluate our approach on examples, including a case study in glycemic forecasting.

2 Related Work

Demand for likelihood-free inference emerged in genetics and ecology. Tavaré et al. (1997) compared summary statistics of the output of a simulation with those of observed data, and rejected mismatches. Weiss & von Haeseler (1998) expanded on this with a tolerance term, so that simulations yielding data sufficiently close to the targets were accepted. Approximate Bayesian Computation (ABC) has come to refer to a broad class of methods (Beaumont et al., 2002; Sisson et al., 2007) in this general regime. Marjoram et al. (2003) simulated Markov chains according to the prior, but introduced an accept/reject stage to yield approximate posterior samples. A small tolerance leads to a high rejection rate, whereas a large tolerance results in an unacceptable approximation error. Among several solutions are dynamically decreasing the tolerance (Toni et al., 2008), importance reweighting samples based on distance (Wegmann et al., 2009), adapting the tolerance based on distance (Del Moral et al., 2012; Lenormand et al., 2013), and annealing the tolerance as a temperature parameter (Albert et al., 2015).

Predicate exchange similarly targets simulation models and uses distance metrics, but aims for exact inference without summary statistics. A recent approach (Graham et al., 2017) with similar objectives develops a Hamiltonian Monte Carlo variant, using a quasi-Newton method during leap-frog integration to exactly solve the observation constraint. This is limited to differentiable models conditioned with equality.

Probabilistic logics such as ProbLog (De Raedt et al., 2007) and Markov logic networks (Richardson & Domingos, 2006) extend first-order logic to declare both models and conditions. More recent probabilistic programming systems (Milch et al., 2007; Wood et al., 2014; Mansinghka et al., 2014; Goodman et al., 2008; Carpenter et al., 2017) have focused on stochastic simulation, and automatically derive the likelihood function for a rich class of models.

Several continuous (Levin, 2000) and fuzzy (Klir & Yuan, 1995) logics apply model-theoretic tools to metric structures. Continuous logics replace the Boolean structure {0, 1}, the quantifiers ∀ and ∃, and the logical connectives with continuous counterparts. Predicate exchange uses a continuous logic only to make inference more tractable. Semantically, our approach remains within measure-theoretic foundations, which rely on hard predicates to condition.

3 Simulation Models

Probabilistic simulation based models specify the step-by-step causal mechanisms of a domain, and use probability distributions for any uncertain parameters. A simulation model can be stochastically executed, using a random number generator to sample from primitive random variables in the model. Inference means to simulate the model while imposing constraints on variables in the model. This is difficult, since simulation based models lack an explicit likelihood function, which is necessary for most inference procedures.

Conditioning on predicates requires a measure-theoretic foundation, in which a simulation model is a random variable:

Figure 1: Samples from a geometric prior (left), and from the same model conditioned on a no-intersection constraint (right)

Random Variables.

Probability models lie on top of probability spaces. A probability space is a measure space (Ω, ℋ, μ), where ℋ is a sigma-algebra and μ(Ω) = 1 (Çınlar, 2011). Random variables are functions X : Ω → T from the space Ω to a realization space T. As a concrete example, the space Ω can be thought of as a hypercube, with μ being uniform over that hypercube. To build a normal random variable, we need a function that maps from Ω to ℝ. If the underlying probability space is uniform, then this function is the inverse cumulative distribution function of the normal.

A model is a collection of random variables along with a probability space.
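To make this picture concrete, the following sketch (ours, not from the paper's codebase; all names are illustrative) represents a model in Python as deterministic functions of a point ω drawn uniformly from a unit hypercube, with a normal variable built via the inverse CDF:

  import random
  from statistics import NormalDist

  def normal_rv(i, mu=0.0, sigma=1.0):
      # Random variable: a deterministic function of omega that pushes
      # the uniform coordinate omega[i] through the inverse normal CDF.
      def rv(omega):
          return NormalDist(mu, sigma).inv_cdf(omega[i])
      return rv

  x = normal_rv(0)           # X : Omega -> R
  y = normal_rv(1, mu=2.0)   # Y : Omega -> R, independent of X

  omega = [random.random(), random.random()]  # one point in the hypercube
  sample = (x(omega), y(omega))               # one joint realization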

Conditioning

Conditioning a model creates a new model. As an example, consider a model with two random variables X and Y that both take real values. Conditioning on X = Y defines a new model based on limiting the measure space to the set A = {ω : X(ω) = Y(ω)}. The new model is defined on a new probability space

(A, ℋ_A, μ_A), where ℋ_A = {B ∩ A : B ∈ ℋ} and μ_A(B) = μ(B)/μ(A)        (1)

with the same random variables X and Y. Sampling from the new model produces samples only where X = Y.

More generally, conditioning on any predicate λ defines a new model exactly as above, where A = {ω : λ(ω) = 1}. Sampling from the new model generates ω where λ is true.

The general construction of new models might require conditioning on sets of measure zero. This process can be made rigorous via disintegration (Chang & Pollard, 1997). Disintegration can be thought of as the reversal of building joint distributions through product measure constructions.
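For predicates of positive measure, this construction has a direct operational reading as rejection sampling: draw ω from the prior and keep it only if the predicate holds. A minimal sketch (ours; it fails for measure-zero conditions such as X = Y, which is part of what motivates relaxation):

  import random

  def rejection_sample(predicate, dim, max_tries=100_000):
      # Draw omega uniformly from the hypercube until predicate(omega) holds.
      for _ in range(max_tries):
          omega = [random.random() for _ in range(dim)]
          if predicate(omega):
              return omega
      raise RuntimeError("predicate satisfied too rarely for naive rejection")

  # Condition on the first coordinate exceeding the second:
  omega = rejection_sample(lambda w: w[0] > w[1], dim=2)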

4 Predicate Exchange

To condition a model on a predicate we develop predicate exchange, a likelihood-free inference procedure. It is composed of two parts:

  1. Predicate Relaxation constructs a soft predicate λ̃ from a hard predicate λ. λ̃ takes values in a soft Boolean algebra: the unit interval [0, 1] with continuous logical connectives ∧̃ and ∨̃. λ̃ is 1 iff λ is 1, but otherwise takes nonzero values denoting the degree to which λ is satisfied.

  2. Replica Exchange is a Markov chain Monte Carlo procedure that exploits temperature. The strength by which λ̃ relaxes λ is modulated by a temperature parameter t, which trades off between accuracy and ease of inference. By simulating several replicas of the model at different temperatures, replica exchange is able to draw exact samples.

4.1 Predicate Relaxation

A soft predicate λ̃ approximates λ in the sense that, when viewed as a likelihood function on model parameters, λ̃ has broader support, assigning nonzero weight to parameter values which have zero weight under λ. There are three desiderata which govern this approximation. First, λ̃ should have a temperature parameter t that controls the fidelity of the approximation. In particular, λ̃ should converge to λ as t → 0, and to a flat surface as t → ∞. Second, the fidelity of the approximation should vary monotonically with temperature. Third, λ̃ should be consistent with λ on 1. That is, λ̃(ω) = 1 iff λ(ω) = 1 at all temperatures.

Definition 1.

A function λ̃_t parameterized by a temperature t ∈ (0, ∞) is a relaxation of λ if:

  (i) For all ω, λ̃_t(ω) → λ(ω) as t → 0.

  (ii) For all ω, λ̃_t(ω) → 1 as t → ∞.

  (iii) For all t, λ̃_t(ω) = 1 iff λ(ω) = 1.

  (iv) The entropy of λ̃_t (which characterizes the fidelity of the approximation) is an increasing function of t.¹

¹By compactness, λ̃_t is integrable for all t when Ω has finite dimension.

Graded Satisfiability

λ̃_t(ω) represents the degree to which a model realization ω satisfies a predicate. Let k_t be a kernel (described below), δ a distance metric, and A = {ω : λ(ω) = 1} the satisfying set. λ̃_t is then:

λ̃_t(ω) = k_t(δ(ω, A))        (2)

where δ(ω, A) = inf {δ(ω, ω′) : ω′ ∈ A}.

Distance

A relaxation kernel k_t bounds distances to the unit interval, and is parameterized by temperature t. We restrict our attention to the squared exponential kernel:

k_t(d) = exp(−d²/t)        (3)

The distance metric δ is parameterized by the type of its input. For canonical spaces such as ℝ and ℝⁿ we default to the Euclidean distance; δ(ω, A) is then the smallest Euclidean distance from ω to any element of A. For composite elements of product type, δ by default takes a mean of the componentwise distances.
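A minimal sketch of these two ingredients (ours; we assume the kernel form k_t(d) = exp(−d²/t), matching the squared exponential family above):

  import math

  def kernel(d, t):
      # Squared exponential relaxation kernel (Equation 3, assumed form).
      return math.exp(-d * d / t)

  def dist_to_interval(x, lo, hi):
      # Distance from point x to interval [lo, hi]; 0 if x lies inside.
      if x < lo:
          return lo - x
      if x > hi:
          return x - hi
      return 0.0

  def soft_eq(x, y, t):
      return kernel(abs(x - y), t)                        # soft x = y

  def soft_gt(x, y, t):
      return kernel(dist_to_interval(x, y, math.inf), t)  # soft x > y

At t = 0.01, soft_eq(1.0, 1.001, t) is close to 1 while soft_eq(1.0, 2.0, t) is effectively 0; as t grows, both approach 1.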

Composition

We construct λ̃ from λ compositionally, by substituting primitive predicates (equality and inequalities) and logical operators with soft counterparts. For instance, the predicate x = y is transformed into the soft equality x =̃ y. In general, we use p̃ to denote a relaxation of a predicate p.

Figure 2: Soft Primitive Predicates

A soft inequality such as x >̃ y is a function of the amount by which x must be increased (or decreased) until x > y is true. This is the distance between x and the interval [y, ∞), where the distance between a point x and an interval I is the smallest distance between x and any element of I, and is therefore 0 if x ∈ I:

x >̃ y = k_t(δ(x, [y, ∞)))        (4)

Soft negation introduces complications. To illustrate, Figure 3(a) shows x >̃ 0 as a function of x. In continuous logics (Kimmig et al., 2012), the negation of p is 1 − p. However, as shown in Figure 3(b), this violates criterion (iii) of predicate relaxation; there are values which satisfy the hard predicate but do not take a value of 1 under the relaxation.

Figure 3: Soft predicates as functions of x. In all figures the blue line denotes the soft predicate, while the red line denotes the predicate it approximates.

The problem of negation arises because λ̃ is consistent with λ at 1 but not at 0. In other words, λ̃ is a one-sided approximation. To overcome this challenge, soft primitives yield a pair (λ̃, λ̃′), where λ̃′ is a relaxation of ¬λ. λ̃ preserves consistency with λ on 1, just as before, while λ̃′ preserves consistency with ¬λ on 1. For example, if λ is x > 0, then as a function of x, λ̃ and λ̃′ correspond to Figure 3(a) and (c) respectively.

A complete two-sided soft logic is shown in Figure 4. Although a two-sided predicate has two components, for the sake of conditioning we are still concerned only with the true side λ̃ in the pair (λ̃, λ̃′). Soft negation simply swaps the elements of the pair to yield (λ̃′, λ̃).

Figure 4: Two-sided soft primitive predicates
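A sketch of how two-sided primitives compose (ours; we use min/max connectives, one standard continuous choice, together with the kernel form assumed earlier; the paper's exact connectives are those of Figure 4):

  import math

  def soft_gt2(x, y, t):
      # Pair (true side, false side): distances until x > y holds and
      # until not(x > y) holds, each pushed through the kernel.
      k = lambda d: math.exp(-d * d / t)
      return (k(max(y - x, 0.0)), k(max(x - y, 0.0)))

  def soft_not(p):
      return (p[1], p[0])                        # swap the pair

  def soft_and(p, q):
      return (min(p[0], q[0]), max(p[1], q[1]))  # De Morgan-dual false side

  def soft_or(p, q):
      return (max(p[0], q[0]), min(p[1], q[1]))

Note that soft_not(soft_gt2(x, y, t))[0] equals 1 whenever x ≤ y, restoring criterion (iii) for negated predicates.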

Unsatisfiability

Predicate exchange is unable to determine if a predicate is unsatisfiable (e.g., a contradiction such as x > 0 ∧ x < 0), and defers to the user to ensure that predicates are satisfiable.

4.2 Approximate Markov Chain Monte Carlo

A soft predicate can serve as an approximate likelihood, and as a result is amenable to likelihood-based inference methods such as Markov chain Monte Carlo. MCMC algorithms require a function that is proportional to the target density. In Bayesian inference this is the posterior, dictated by Bayes' theorem as the product of the likelihood and the prior. Approximate inference using soft predicates takes a similar form.

Let X be a model, λ be a predicate that conditions X, and λ̃_t be a relaxation of λ. Assuming a prior density p, the approximate posterior is the product:

f̃_t(ω) = p(ω) · λ̃_t(ω)        (5)

λ̃_t down-weights parameter values which violate λ by the degree to which they violate it. This is modulated by the temperature t used in the relaxation kernels which constitute λ̃_t. At maximum temperature, λ̃_t has no effect, and the approximate posterior is equal to the prior p. At zero temperature, f̃_t recovers the true posterior, since parameter values which violate the condition are given zero weight.

For illustration, let X be a model with a single real-valued random variable x, conditioned on a predicate over x. The approximate posterior is shown at different temperatures in Figure 5 and defined, following Equation 5, as:

f̃_t(x) = p(x) · λ̃_t(x)        (6)

Figure 5: Approximate posterior at varying temperatures. Temperature decreases from top row to bottom. Along each row: (left) the prior term p, (center) the soft likelihood term λ̃_t, and (right) the approximate posterior f̃_t.

The temperature parameter trades off between the tractability of inference and the fidelity of the approximation. Too high, and f̃_t will diverge too greatly from the true posterior. Too low, and convergence will be slow.
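A small numeric illustration of Equation 5 (ours; the model, a standard normal prior conditioned on x > 1, is a hypothetical stand-in for the figure's example):

  import math

  def prior(x):
      return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

  def soft_lik(x, t):
      d = max(0.0, 1.0 - x)        # distance until x > 1 holds
      return math.exp(-d * d / t)

  def approx_post(x, t):           # Equation 5: prior times soft predicate
      return prior(x) * soft_lik(x, t)

  for t in (10.0, 1.0, 0.01):
      # As t -> 0, mass at the violating point x = 0 vanishes while the
      # satisfying point x = 1.5 keeps its prior weight.
      print(t, approx_post(0.0, t), approx_post(1.5, t))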

4.3 Replica Exchange

Replica exchange (Swendsen & Wang, 1986) simulates replicas at different temperatures, and uses a Metropolis-Hastings update to periodically swap the temperatures of chains. If f̃_t is an approximate posterior at temperature t, two independent parallel chains simulating targets f̃_{t₁} and f̃_{t₂} follow the joint target f̃_{t₁}(ω₁) · f̃_{t₂}(ω₂). Replica exchange swaps states between the chains while preserving the joint target. Swapping states is equivalent to swapping predicates, which motivates the name predicate exchange. Concretely, replica exchange proposes a swap from (ω₁, ω₂) to (ω₂, ω₁), and accepts it with probability min(1, α), where:

α = [f̃_{t₁}(ω₂) · f̃_{t₂}(ω₁)] / [f̃_{t₁}(ω₁) · f̃_{t₂}(ω₂)]        (7)
We modify standard replica exchange in two ways: (i) for exact inference, states which violate the constraint are rejected, and (ii) unlike conventional replica exchange, which draws samples only from the zero-temperature chain, we accept states from any chain so long as λ(ω) = 1.

Replica exchange has a number of hyper-parameters: the number of parallel chains, the corresponding temperatures, and the swapping schedule. Several good practices are outlined in (Earl & Deem, 2005). In practice, we logarithmically space temperatures between a lower and an upper bound, and periodically swap states of chains that are adjacent in temperature (t₁ with t₂, t₂ with t₃, etc.).
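A sketch of one round of adjacent-temperature swaps (ours; log_f(state, t) stands for the log approximate posterior, and states/temps are parallel lists):

  import math
  import random

  def swap_adjacent(states, temps, log_f):
      # Propose swapping each adjacent pair of chains and accept with the
      # Metropolis-Hastings probability of Equation 7, in log space.
      for i in range(len(temps) - 1, 0, -1):
          log_alpha = (log_f(states[i - 1], temps[i])
                       + log_f(states[i], temps[i - 1])
                       - log_f(states[i - 1], temps[i - 1])
                       - log_f(states[i], temps[i]))
          if math.log(random.random()) < log_alpha:
              states[i - 1], states[i] = states[i], states[i - 1]
      return states

  # Logarithmically spaced temperatures between bounds, as described above:
  temps = [10.0 ** e for e in (-4, -2, 0, 2)]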

5 Implementation

In this section we describe a generic, lightweight implementation of predicate exchange. Our approach closely mirrors (Wingate et al., 2011; Milch et al., 2007) in the sense that it provides a language-independent layer that can be implemented on top of existing programming languages and modeling formalisms. Our objective is twofold: (i) to compute the prior term p, the approximate likelihood term λ̃_t, and the approximate posterior term f̃_t (Equation 5) from an arbitrary program P, and (ii) to perform replica exchange MCMC to sample from this posterior.

A program P can be an arbitrary composition of deterministic and stochastic procedures, but all stochastic elements must come from a set of known elementary random primitives, or ERPs. ERPs correspond to primitive parametric distribution families, such as the uniform or normal distribution. Each ERP type must support (i) evaluation of its conditional density, and (ii) sampling from the distribution. Concretely, a conditioned program is any nullary program that contains the statements:

  1. A sampling statement, written here as sample(name, D), which returns a random sample from the ERP D; name is a unique name, described below.

  2. A conditioning statement, written here as cond(λ), which conditions on λ. It throws an error if λ is 0, and otherwise allows simulation to resume with no effect.

Example Program 1 illustrates a simple conditioned model.

5.1 Tracked Soft Execution

The prior term is computed automatically as the product of the densities of the random choices made in the program. That is, let D_k be the k'th ERP encountered while executing P, x_k be the value it takes, and x = (x₁, …, x_K) denote the set of values of all ERPs constructed in the simulation of P. The prior term is the product:

p(x) = ∏_{k=1}^{K} p_{D_k}(x_k | x₁, …, x_{k−1})        (8)

Crucially, the parameters for each random variable may be fixed values or depend on values of other random variables in .

  
  
  
  x ← sample("x", normal(0, 1))
  y ← sample("y", normal(x, 1))
  cond(y > 0)
  Return: x
Example Program 1

Predicate exchange relies on softexecute (Algorithm 3), which formalizes the soft execution of a program P at temperature t, in the context of a dictionary D. D is a mutable mapping from a set of names to values. In the context of a particular dictionary, the simulation of a program is deterministic. This allows the simulation of P to be modulated by controlling the elements of D.

softexecute simulates P, but within a context where (i) variables ℓ_p and ℓ accumulate prior and approximate likelihood values, and (ii) the following operators are redefined:

  1. sample(name, D_k) returns D[name], and in compliance with Equation 8 updates ℓ_p with the conditional density. If name is not a key in D, the distribution is sampled from and D is updated with this value.

  2. The comparison operators =, >, ≥, <, ≤ and the logical operators ∧, ∨, ¬ are replaced with their softened counterparts.

  3. cond(λ̃) updates ℓ with λ̃, which takes values in [0, 1] due to the soft primitive operators.

softexecute returns a real value for the approximate posterior of P as a function of the dictionary D.
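A minimal Python sketch of this scheme (ours; names are illustrative, programs are written as closures over the redefined operators, and only normal ERPs are shown):

  import math
  import random
  from statistics import NormalDist

  def softexecute(program, t, trace):
      # trace plays the role of the dictionary; acc accumulates the log
      # prior (log_p) and log soft likelihood (log_l).
      acc = {"log_p": 0.0, "log_l": 0.0}

      def sample(name, mu=0.0, sigma=1.0):
          if name not in trace:                  # unseen name: draw fresh
              trace[name] = random.gauss(mu, sigma)
          x = trace[name]
          acc["log_p"] += math.log(NormalDist(mu, sigma).pdf(x))
          return x

      def cond(soft_value):                      # soft predicate in (0, 1]
          acc["log_l"] += math.log(soft_value)

      program(sample, cond, t)
      return acc["log_p"] + acc["log_l"]         # log approximate posterior

  def program(sample, cond, t):                  # soft version of: x = y
      x = sample("x")
      y = sample("y")
      cond(math.exp(-((x - y) ** 2) / t))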

Control Flow

Programs may have control flow constructs, such as if-then-else statements. These may cause softexecute to return a value that is significantly smaller than necessary. This is because if a branch condition is a function of an uncertain value, then several unexplored alternative paths could produce values that are closer to the constraint set. softexecute is ignorant of these other possibilities. For illustration, consider Example Program 2. If the branch condition fails, the predicate relaxation yields a far smaller value than if the true branch were taken.

  
  x ← sample("x", normal(0, 1))
  if x > 0 then
     y ← x
  else
     y ← −10
  end if
  cond(y = 0.1)
  Return: y
Example Program 2

Problems of this form appear in all forms of program analysis, where this is called the path explosion problem, since the number of possible paths often increases combinatorially with program size and runtime length. Automated program testing, which is concerned with finding program paths that lead to failure, has developed various strategies (Cadar et al., 2008; Sen et al., 2005). Unlike automated testing, probabilistic inference has the stricter requirement of adhering to the true posterior distribution. However, in predicate exchange, we have latitude on all nonunitary values (realizations where the soft predicate is below 1). This opens up the potential for extending program analysis methods to the probabilistic domain in future work.

  Input: program P, temperature t, dictionary D
  Initialize ℓ_p ← 1, ℓ ← 1
  Simulate P with the following subroutines redefined as:
  subroutine sample(name, D_k)
     if name is a key of D then
        x ← D[name]
     else
        x ← sample from D_k
        Update dictionary: D[name] ← x
     end if
     ℓ_p ← ℓ_p · p_{D_k}(x)
     Return from subroutine: x
  end subroutine
  
  subroutine cond(λ̃)
     ℓ ← ℓ ∧̃ λ̃
  end subroutine
  
  subroutine x ⊙̃ y for ⊙ ∈ {=, >, ≥, <, ≤, ∧, ∨, ¬}
     Return from subroutine: soft counterpart of x ⊙ y at temperature t
  end subroutine
  
  Return: ℓ_p · ℓ
Algorithm 3 Soft Execution: softexecute(P, t, D)

5.2 Replica Exchange

Predicate exchange (Algorithm 4) performs replica exchange using softexecute as an approximate posterior. It takes as input an mcmc algorithm, which simulates a Markov chain by manipulating elements of the dictionary. In our experiments, for finite-dimensional continuous models we use the No-U-Turn Sampler (Hoffman & Gelman, 2014), a variant of Hamiltonian Monte Carlo. We use reverse-mode automatic differentiation (Griewank & Walther, 2008) to compute the gradient of the negative log of the approximate posterior. For other models we use standard Metropolis-Hastings by defining proposals on elements in the dictionary. In particular, we use single-site MH (Wingate et al., 2011), which modifies a single random variable at a time.
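A sketch of single-site Metropolis-Hastings over the dictionary (ours; it reuses the softexecute sketch above, and the Gaussian drift proposal is an illustrative choice):

  import math
  import random

  def single_site_mh(program, t, trace, nsamples, sigma=0.5):
      samples = []
      log_post = softexecute(program, t, trace)   # also populates trace
      for _ in range(nsamples):
          name = random.choice(list(trace))       # pick one variable
          proposal = dict(trace)
          proposal[name] += random.gauss(0.0, sigma)  # symmetric proposal
          new_log_post = softexecute(program, t, proposal)
          if math.log(random.random()) < new_log_post - log_post:
              trace, log_post = proposal, new_log_post
          samples.append(dict(trace))
      return samples

Running one such chain per temperature and interleaving rounds of adjacent-temperature swaps yields the structure of Algorithm 4.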

  Input: program P, temperatures t₁ > t₂ > … > t_n, nsamples
  Input: mcmc, m (number of samples between swaps)
  Initialize empty collection of samples S
  Initialize empty dictionaries D₁, …, D_n
  Define f_i(D) = softexecute(P, t_i, D)
  repeat
     for i = 1 to n do
        d⁽¹⁾, …, d⁽ᵐ⁾ ← mcmc samples at temperature t_i, from f_i, initialized at D_i
        D_i ← d⁽ᵐ⁾
        for j = 1 to m do
           if λ(d⁽ʲ⁾) = 1 then
              append d⁽ʲ⁾ to S
           end if
        end for
     end for
     for i = n down to 2 do
        α ← (f_i(D_{i−1}) · f_{i−1}(D_i)) / (f_i(D_i) · f_{i−1}(D_{i−1}))
        if α > random sample in [0, 1] then
           swap D_i with D_{i−1}
        end if
     end for
  until S has nsamples elements
  Return: S
Algorithm 4 Predicate Exchange

6 Experiments

Small Models

In Figure 6 we demonstrate two examples of conditioning on nontrivial predicates. First, we show that conditioning can be used to truncate a Gaussian distribution, and we show the approximation behavior at varying temperatures. Second, we show that two independent random variables can be made equal. While simple, both are a challenge for probabilistic programming systems because they prevent automatic calculation of the likelihood.

Figure 6: Left: Density from samples of a Gaussian truncated to an interval through conditioning. Right: Conditioning on x = y, where x and y are independent normal distributions; shown at different temperatures.

Glucose Model

Type 2 diabetes is a prevalent and costly condition. Keeping blood glucose within normal limits helps prevent the long-term complications of Type 2 diabetes, such as diabetic neuropathy and diabetic retinopathy (Brownlee & Hirsch, 2006). Models that predict the trajectories of blood glucose aid in keeping glucose within normal limits (Zeevi et al., 2015). Traditional models have been built from compositions of differential equations (Albers et al., 2017; Levine et al., 2017) whose parameters are estimated separately for each patient. An alternative approach would be to use a flexible sequence model such as an RNN. The problem with this approach is that an RNN can extrapolate to glucose values incompatible with human physiology. This is especially a problem for patients with only a few blood glucose measurements. To build an RNN model that respects physiology, we condition it on predicates encoding physiological constraints.

We compare the independent RNN model to one with declarative knowledge tying it to a second patient from Physionet (Moody et al., 2001). Figure 7 plots results over more than 300 pairs of patients. We see that the conditional model simulates more realistic glucose dynamics for the patient with only a short observed time-series.

Figure 7: Left: Actual (dotted) and predicted trajectories learned using a partial trajectory. Center: Distribution of predicted trajectories learned using only the first ten data points and a tie with a secondary patient. Right, top: MSE when the tie is present. Right, bottom: MSE without the tie. Tying expectations has a dramatic influence on prediction error, and as more data is observed, the effect of tying decreases.

7 Discussion

In this work we expanded the class of predicates that probabilistic models can be conditioned on in practice.


References

  • Albers et al. (2017) Albers, D. J., Levine, M., Gluckman, B., Ginsberg, H., Hripcsak, G., and Mamykina, L. Personalized glucose forecasting for type 2 diabetes using data assimilation. PLoS computational biology, 13(4):e1005232, 2017.
  • Albert et al. (2015) Albert, C., Künsch, H. R., and Scheidegger, A. A simulated annealing approach to approximate bayes computations. Statistics and computing, 25(6):1217–1232, 2015.
  • Andrieu et al. (2003) Andrieu, C., De Freitas, N., Doucet, A., and Jordan, M. I. An introduction to MCMC for machine learning. Machine learning, 50(1-2):5–43, 2003.
  • Beaumont et al. (2002) Beaumont, M. A., Zhang, W., and Balding, D. J. Approximate bayesian computation in population genetics. Genetics, 162(4):2025–2035, 2002.
  • Brownlee & Hirsch (2006) Brownlee, M. and Hirsch, I. B. Glycemic variability: a hemoglobin a1c–independent risk factor for diabetic complications. Jama, 295(14):1707–1708, 2006.
  • Cadar et al. (2008) Cadar, C., Ganesh, V., Pawlowski, P. M., Dill, D. L., and Engler, D. R. Exe: automatically generating inputs of death. ACM Transactions on Information and System Security (TISSEC), 12(2):10, 2008.
  • Carpenter et al. (2017) Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., and Riddell, A. Stan: A probabilistic programming language. Journal of statistical software, 76(1), 2017.
  • Chang & Pollard (1997) Chang, J. T. and Pollard, D. Conditioning as disintegration. Statistica Neerlandica, 51(3):287–317, 1997.
  • Çınlar (2011) Çınlar, E. Probability and stochastics, volume 261. Springer Science & Business Media, 2011.
  • De Raedt et al. (2007) De Raedt, L., Kimmig, A., and Toivonen, H. Problog: A probabilistic prolog and its application in link discovery. International Joint Conferences on Artificial Intelligence, 2007.
  • Del Moral et al. (2012) Del Moral, P., Doucet, A., and Jasra, A. An adaptive sequential monte carlo method for approximate bayesian computation. Statistics and Computing, 22(5):1009–1020, 2012.
  • Earl & Deem (2005) Earl, D. J. and Deem, M. W. Parallel tempering: Theory, applications, and new perspectives. Physical Chemistry Chemical Physics, 7(23):3910–3916, 2005.
  • Goodman et al. (2008) Goodman, N. D., Mansinghka, V. K., Roy, D., Bonawitz, K., and Tenenbaum, J. B. Church: a language for generative models. In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence, pp. 220–229. AUAI Press, 2008.
  • Graham et al. (2017) Graham, M. M., Storkey, A. J., et al. Asymptotically exact inference in differentiable generative models. Electronic Journal of Statistics, 11(2):5105–5164, 2017.
  • Griewank & Walther (2008) Griewank, A. and Walther, A. Evaluating derivatives: principles and techniques of algorithmic differentiation, volume 105. Siam, 2008.
  • Hoffman & Gelman (2014) Hoffman, M. D. and Gelman, A. The no-u-turn sampler: adaptively setting path lengths in hamiltonian monte carlo. Journal of Machine Learning Research, 15(1):1593–1623, 2014.
  • Jordan et al. (1999) Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., and Saul, L. K. An introduction to variational methods for graphical models. Machine learning, 37(2):183–233, 1999.
  • Kimmig et al. (2012) Kimmig, A., Bach, S., Broecheler, M., Huang, B., and Getoor, L. A short introduction to probabilistic soft logic. In Proceedings of the NIPS Workshop on Probabilistic Programming: Foundations and Applications, pp. 1–4, 2012.
  • Klir & Yuan (1995) Klir, G. and Yuan, B. Fuzzy sets and fuzzy logic, volume 4. Prentice hall New Jersey, 1995.
  • Kulkarni et al. (2015) Kulkarni, T. D., Whitney, W. F., Kohli, P., and Tenenbaum, J. Deep convolutional inverse graphics network. In Advances in neural information processing systems, pp. 2539–2547, 2015.
  • Lenormand et al. (2013) Lenormand, M., Jabot, F., and Deffuant, G. Adaptive approximate bayesian computation for complex models. Computational Statistics, 28(6):2777–2796, 2013.
  • Levin (2000) Levin, V. Basic concepts of continuous logics. Kybernetes, 29(9/10):1234–1249, 2000.
  • Levine et al. (2017) Levine, M. E., Hripcsak, G., Mamykina, L., Stuart, A., and Albers, D. J. Offline and online data assimilation for real-time blood glucose forecasting in type 2 diabetes. arXiv preprint arXiv:1709.00163, 2017.
  • Mansinghka et al. (2014) Mansinghka, V., Selsam, D., and Perov, Y. Venture: a higher-order probabilistic programming platform with programmable inference. arXiv preprint arXiv:1404.0099, 2014.
  • Marjoram et al. (2003) Marjoram, P., Molitor, J., Plagnol, V., and Tavaré, S. Markov chain monte carlo without likelihoods. Proceedings of the National Academy of Sciences, 100(26):15324–15328, 2003.
  • Marschner & Greenberg (1998) Marschner, S. R. and Greenberg, D. P. Inverse rendering for computer graphics. Citeseer, 1998.
  • Milch et al. (2007) Milch, B., Marthi, B., Russell, S., Sontag, D., Ong, D. L., and Kolobov, A. BLOG: Probabilistic models with unknown objects. Statistical relational learning, pp. 373, 2007.
  • Moody et al. (2001) Moody, G. B., Mark, R. G., and Goldberger, A. L. Physionet: a web-based resource for the study of physiologic signals. IEEE Engineering in Medicine and Biology Magazine, 20(3):70–75, 2001.
  • Murata et al. (2004) Murata, G. H., Hoffman, R. M., Shah, J. H., Wendel, C. S., and Duckworth, W. C. A probabilistic model for predicting hypoglycemia in type 2 diabetes mellitus: The diabetes outcomes in veterans study (doves). Archives of internal medicine, 164(13):1445–1450, 2004.
  • Ranganath et al. (2014) Ranganath, R., Gerrish, S., and Blei, D. Black box variational inference. In Artificial Intelligence and Statistics, pp. 814–822, 2014.
  • Richardson & Domingos (2006) Richardson, M. and Domingos, P. Markov logic networks. Machine learning, 62(1-2):107–136, 2006.
  • Sen et al. (2005) Sen, K., Marinov, D., and Agha, G. Cute: a concolic unit testing engine for c. In ACM SIGSOFT Software Engineering Notes, volume 30, pp. 263–272. ACM, 2005.
  • Sisson et al. (2007) Sisson, S. A., Fan, Y., and Tanaka, M. M. Sequential monte carlo without likelihoods. Proceedings of the National Academy of Sciences, 104(6):1760–1765, 2007.
  • Swendsen & Wang (1986) Swendsen, R. H. and Wang, J.-S. Replica monte carlo simulation of spin-glasses. Physical review letters, 57(21):2607, 1986.
  • Tavaré et al. (1997) Tavaré, S., Balding, D. J., Griffiths, R. C., and Donnelly, P. Inferring coalescence times from dna sequence data. Genetics, 145(2):505–518, 1997.
  • Toni et al. (2008) Toni, T., Welch, D., Strelkowa, N., Ipsen, A., and Stumpf, M. P. Approximate bayesian computation scheme for parameter inference and model selection in dynamical systems. Journal of the Royal Society Interface, 6(31):187–202, 2008.
  • Wegmann et al. (2009) Wegmann, D., Leuenberger, C., and Excoffier, L. Efficient approximate bayesian computation coupled with markov chain monte carlo without likelihood. Genetics, 2009.
  • Weiss & von Haeseler (1998) Weiss, G. and von Haeseler, A. Inference of population history using a likelihood approach. Genetics, 149(3):1539–1546, 1998.
  • Wingate et al. (2011) Wingate, D., Stuhlmüller, A., and Goodman, N. Lightweight implementations of probabilistic programming languages via transformational compilation. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 770–778, 2011.
  • Wood et al. (2014) Wood, F., Meent, J. W., and Mansinghka, V. A new approach to probabilistic programming inference. In Artificial Intelligence and Statistics, pp. 1024–1032, 2014.
  • Zeevi et al. (2015) Zeevi, D., Korem, T., Zmora, N., Israeli, D., Rothschild, D., Weinberger, A., Ben-Yacov, O., Lador, D., Avnit-Sagi, T., Lotan-Pompan, M., et al. Personalized nutrition by prediction of glycemic responses. Cell, 163(5):1079–1094, 2015.