
Probabilistic programs for inferring the goals of autonomous agents

by Marco F. Cusumano-Towner, et al.

Intelligent systems sometimes need to infer the probable goals of people, cars, and robots, based on partial observations of their motion. This paper introduces a class of probabilistic programs for formulating and solving these problems. The formulation uses randomized path planning algorithms as the basis for probabilistic models of the process by which autonomous agents plan to achieve their goals. Because these path planning algorithms do not have tractable likelihood functions, new inference algorithms are needed. This paper proposes two Monte Carlo techniques for these "likelihood-free" models, one of which can use likelihood estimates from neural networks to accelerate inference. The paper demonstrates efficacy on three simple examples, each using under 50 lines of probabilistic code.



1 Introduction

Intelligent systems sometimes need to infer the probable goals of people, cars, and robots, based on partial observations of their motion. These problems are central to autonomous driving and driver assistance [Franke et al., 1998; Urmson et al., 2008; Aufrère et al., 2003], but also arise in aerial robotics, reconnaissance, and security applications [Kumar and Michael, 2012; Liao et al., 2006; Tran and Davis, 2008]. In these settings, knowledge of the beliefs and goals of an agent makes it possible to infer their probable future actions.

Because the mental state of another agent is inherently unobservable and uncertain, it is natural to take a Bayesian approach to inferring it. Probabilistic models can be used to describe how an agent’s latent high-level goals and beliefs about the environment interact to yield its probable actions. Most existing work along these lines has focused on modeling goal-directed behavior using Markov decision processes and related approaches from stochastic control [Baker et al., 2007; Ziebart et al., 2009]. While promising, these approaches involve significant task-specific engineering. They also calculate policies that prescribe actions for every possible state of the world, sometimes in the inner loop of an inference algorithm. This leads to fundamental scaling challenges, even for simple environments and goal priors.

This paper introduces a class of probabilistic programs that formulate goal inference problems as approximate inference in generative models of goal-directed behavior. The proposed approach reflects three contributions: First, agents are assumed to follow paths generated by fast randomized path planning code that can incorporate heuristics drawn from video game engines and robotics. This can scale to larger environments than approaches based on optimal control. Second, hierarchical models for goals and paths are represented as probabilistic programs. This allows one to formulate a broad class of single- and multi-agent problems with common modeling and inference machinery. Ordinary probabilistic programming constructs can handle complex maps, hierarchical goal priors, and partially observed environments. Third, this paper proposes an approach to real-time approximate inference, using neural networks to learn proposals for the internal choices made by any path planners. Together, these contributions lead to a practical proposal for goal inference that has the potential to scale to a broad class of real-world problems and real-time applications. We demonstrate the efficacy of prototype implementations of these algorithms on three simple examples, each written in under 50 lines of probabilistic code.

Figure 1: Each image shows paths from 60 independent runs of agent-path for different settings of its two parameters: the number of local refinement iterations and the number of global restarts (among which the shortest path is selected).

Note that this proposal does not require planning algorithms to be rewritten as probabilistic programs, but instead allows optimized, low-level, or legacy planning codes to be treated as black boxes. This avoids the implementation and performance cost of rewriting an existing path planner in a high-level probabilistic programming language, and exposing the thousands of random choices it might make to generic inference algorithms. One difficulty is that such optimized black-box planners may well make too many internal random choices to have tractable input-output likelihoods. This paper proposes two novel Monte Carlo techniques for these “likelihood-free” models, each extending Metropolis-Hastings: (i) a cascading resimulation algorithm that makes joint proposals to ensure cancellation of the unknown likelihoods, and (ii) a nested inference algorithm that uses estimated likelihoods derived from inference over the internal random choices of the planner. Cascading resimulation is simple to implement, but nested inference enables use of a broad class of Monte Carlo, variational, and neural network mechanisms to handle the intractable likelihoods.

2 Modeling Goal-Directed Behavior Using Randomized Path Planners

This paper defines probabilistic models of goal-directed behavior using randomized path-planning algorithms. Algorithm 1 describes one such planner, called agent-path. This planner can be applied to a broad class of environments with complex obstacles. The planner assumes a bounded two-dimensional space (e.g., a square) and a world map that is a set of polygonal obstacles. The planner takes as input a start location, a goal location, the map, and a sequence of time points, and returns either a sequence of locations along a path from the start to the goal, one for each time point, or ‘no-path-found’. The planner operates by growing a rapidly-exploring random tree (RRT) [LaValle, 1998] from the start location to fill the space, searching for a clear line of sight between the tree and the goal. If a path is found, it is then refined to minimize its length using local optimization. Finally, the agent walks the path at a constant speed, producing the output locations. See Appendix A for more details.
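The final step of the planner, walking a path at constant speed, can be sketched in Python as follows. This is a minimal illustration and not the paper's C implementation: the function name `walk_path`, the convention that the agent leaves the first vertex at time 0, and the choice to hold the agent at the last vertex once it arrives are all assumptions made for the example.

```python
import math


def walk_path(path, speed, times):
    """Return the agent's (x, y) location at each time in `times`.

    `path` is a list of (x, y) vertices. Assumptions (not from the paper):
    the agent is at path[0] at time 0, times are non-negative, and the
    agent stays at path[-1] after reaching the end of the path.
    """
    locations = []
    for t in times:
        remaining = speed * t   # distance still to cover along the polyline
        position = path[-1]     # default: the agent has already arrived
        for (x0, y0), (x1, y1) in zip(path, path[1:]):
            seg = math.hypot(x1 - x0, y1 - y0)
            if seg > 0 and remaining <= seg:
                frac = remaining / seg
                position = (x0 + frac * (x1 - x0), y0 + frac * (y1 - y0))
                break
            remaining -= seg    # move on to the next segment
        locations.append(position)
    return locations
```

For example, walking the path `[(0, 0), (1, 0), (1, 1)]` at unit speed places the agent halfway along the second segment at time 1.5.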

Many variations of this planner are possible, including versions that take into account costs other than path length, and spaces encoding configurations other than geographic position (e.g., configuration spaces of an articulated robot). The two planner parameters, the number of local refinement iterations and the number of global restarts, trade off the cost of planning against the (probable) optimality of the paths (see Figure 1). Figure 2 and Figure 4 show this planner being used as a modeling primitive in the Venture probabilistic programming platform [Mansinghka et al., 2014]. The planner was implemented in C and imported as a foreign modeling primitive into Venture. Venture supports likelihood-free primitives and the design of custom inference strategies, including those of Section 3.

Figure 2: Inferring a simulated drone’s goal from observed motion. (a) shows a Venture model of a single drone that begins at location start and moves to goal. agent_path is a likelihood-free path planner that models the drone’s goal-directed behavior using Algorithm 1. (a) also shows a corresponding Bayesian network, with observed nodes shaded. (b) and (c) show results of goal inference in this model for two different environments, given the same observed path, using Cascading Resimulation Metropolis-Hastings. In Scenario 1, the drone’s goal is likely outside the enclosure, since it did not go directly into the enclosure through the bottom. In Scenario 2, with bottom access to the enclosure blocked, a goal inside the enclosure is somewhat probable, as the drone’s path no longer seems indirect.

Algorithm 1: Model of an agent’s path given destination. Procedure rrt initializes a tree with the start location and then, over a number of tree growth iterations, samples a random point, proposes a new vertex, and extends the tree; it returns ‘no-path-found’ if the iterations finish without finding a path. Procedure plan-path generates several paths and selects the best of them. Procedure agent-path computes an abstract path and then the agent’s locations at the given times.

3 Inference in Probabilistic Programs With Likelihood-Free Primitives

The path planner agent-path of Algorithm 1 can be used in a probabilistic program either by implementing the planner in a probabilistic programming language, or by treating the planner as a primitive random choice. We treat the planner as a random choice, as this allows use of an optimized C implementation of the planner. However, probabilistic programming languages such as Church, Stan, BLOG, and Figaro all require random choices to have tractable marginal likelihoods [Goodman et al., 2012; Carpenter et al., 2016; Milch et al., 2007; Pfeffer, 2009]. Computing the marginal likelihood of agent-path for a given output and given inputs (start, goal, map, and time points) would involve an intractable integral over the (thousands of) internal random choices made in agent-path.

This section introduces two Monte Carlo strategies for inference in probabilistic programs that include random choices with intractable marginal likelihoods, referred to as “likelihood-free” primitives. The first strategy, shown in Algorithm 2, is called Cascading Resimulation Metropolis-Hastings; it makes block proposals to likelihood-free random choices, exploiting cancellation of the unknown likelihoods. The second, shown in Algorithm 3, is called Nested Inference Metropolis-Hastings; it uses Monte Carlo estimates of the unknown likelihoods in place of the likelihoods themselves. Although simple techniques like likelihood-weighting can also be used in the presence of likelihood-free primitives, they tend to work well only when a global proposal that is well-matched to the posterior is available. The algorithms we introduce do not have this limitation.

We first introduce notation. Consider the set of primitive random choices available to a probabilistic program. Each primitive has a set of valid arguments, a set of possible outputs, and a marginal likelihood of each output given the arguments, which is normalized for every setting of the arguments. We do not require evaluation of this marginal likelihood to be computationally tractable.

Following Wingate et al. [2011], for a probabilistic program, we assume there is a name, drawn from some countable set, assigned to every possible random choice. We assume that distinct random choices are assigned unique names within every execution of the program. The set of names used in an execution is some finite set. We require that all random choices with a given name are of the same type. Each unique completed execution of the program can therefore be represented as the finite set of names of those random choices made in the execution, together with the result values. This pair of names and results is called an execution trace of the probabilistic program.

This paper focuses on probabilistic programs where the set of names is the same for all executions; that is, the set of random choices made is not affected by any of those choices. Relaxations of this are left for future work; more general formalizations of probabilistic programs can be found in [Wingate et al., 2011; Mansinghka et al., 2014].

We consider one random choice to depend on another if changing the result of the latter can lead to a change in the inputs of the former, even if all other results are held fixed. We assume that it is possible to construct a directed acyclic dependency graph among random choices, where an edge exists from one choice to another if and only if the second depends on the first in the above sense. The parents of a random choice are the choices it depends on, and its children are the choices that depend on it. The arguments of each random choice are then a (deterministic) function of the results of its parents. We distinguish the random choices with intractable likelihoods (the “likelihood-free” choices) from the random choices that are constrained based on data, which must have tractable likelihoods and are fixed to the observed values. The joint probability density of an execution trace is the product, over all random choices, of the marginal likelihood of each choice’s result given its arguments, where we have omitted the dependence on the set of names because it is the same for all executions.
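The joint density of an execution trace can be sketched as a product over random choices. The notation below is an assumption made for illustration and is not necessarily the paper's: $I$ is the set of names, $x_i$ the result of choice $i$, $t_i$ its type, $\pi(i)$ its parents, $a_i$ the deterministic function that computes its arguments, and $p_t$ the marginal likelihood of a primitive of type $t$.

```latex
p(x) \;=\; \prod_{i \in I} p_{t_i}\!\left( x_i \,;\; a_i\!\left( x_{\pi(i)} \right) \right)
```

Constraining the observed choices to data fixes their $x_i$, so the same product, viewed as a function of the unconstrained results, is the unnormalized posterior.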

3.1 Cascading Resimulation Metropolis-Hastings

How can a probabilistic program cope with complex, likelihood-free primitives? Our core insight is that if the proposal distribution for a random choice is equal to the prior , then the likelihoods will cancel in a Metropolis-Hastings (MH) acceptance ratio and therefore do not need to be explicitly computed. Sampling from the prior is achieved simply by simulating the random choice. A (prototypical) acceptance ratio looks like this:
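As a hedged illustration of this cancellation, consider a three-variable chain in assumed notation (not necessarily the paper's): a goal $g$ with prior $p(g)$ and custom proposal $q(g'; g)$, a likelihood-free path $z$ with intractable density $p(z \mid g)$, and an observation $y$ with tractable likelihood $p(y \mid z)$. Proposing $z'$ by simulating the planner on $g'$ gives the ratio:

```latex
\alpha
= \frac{p(g')\, p(z' \mid g')\, p(y \mid z')}{p(g)\, p(z \mid g)\, p(y \mid z)}
  \cdot
  \frac{q(g; g')\, p(z \mid g)}{q(g'; g)\, p(z' \mid g')}
= \frac{p(g')\, p(y \mid z')\, q(g; g')}{p(g)\, p(y \mid z)\, q(g'; g)}
```

The intractable factors $p(z \mid g)$ and $p(z' \mid g')$ appear once in the target ratio and once, inverted, in the proposal ratio, so neither needs to be evaluated.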

We use blocked proposals in which a change to a likelihood-free choice is proposed from the prior whenever a proposal is made to any of its parents. A likelihood-free choice that is proposed may itself have likelihood-free choices as children, in which case these children are also proposed, generating a cascade of proposals. Algorithm 2 shows the Cascading Resimulation MH transition operator, which extends an initial custom proposal to a random choice (which must not itself be likelihood-free) to also include any likelihood-free random choices in the cascade, such that the intractable likelihoods cancel.
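The blocked proposal can be sketched on a toy model in Python. Everything here is an illustrative assumption rather than the paper's setup: `black_box_planner` is a hypothetical stand-in for a likelihood-free primitive, the goal takes one of a few scalar values with a uniform prior, and the observation has a Gaussian likelihood. The goal is proposed from its prior, the planner is resimulated, and only the tractable observation likelihood enters the acceptance ratio.

```python
import math
import random


def black_box_planner(goal, rng):
    """Hypothetical likelihood-free primitive with many hidden random choices.

    Its output density given `goal` is never evaluated anywhere below.
    """
    return goal + sum(rng.uniform(-0.1, 0.1) for _ in range(20))


def log_obs_density(y, z, sigma=0.5):
    """Tractable Gaussian log-likelihood of observation y given planner output z."""
    return -0.5 * ((y - z) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def cascading_resimulation_mh(y, goals, n_steps, rng):
    """Run a chain over the goal; return the goal visited at each step."""
    goal = rng.choice(goals)
    z = black_box_planner(goal, rng)
    samples = []
    for _ in range(n_steps):
        new_goal = rng.choice(goals)               # proposal equals the uniform prior
        new_z = black_box_planner(new_goal, rng)   # cascade: resimulate the child
        # Prior and planner densities cancel; only the observation term remains.
        log_alpha = log_obs_density(y, new_z) - log_obs_density(y, z)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            goal, z = new_goal, new_z
        samples.append(goal)
    return samples
```

With an observation near one candidate goal, the chain spends almost all of its time at that goal even though the planner's density is never computed.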

Algorithm 2: Single-site Cascading Resimulation Metropolis-Hastings transition. The transition proposes a new value for the chosen random choice, initially leaving all other choices unchanged, and tracks the unnormalized target density for both the previous and the proposed values. Starting from the children of the proposed-to choice, it pops choices in topological order: each likelihood-free choice is proposed from its prior and its children are in turn asked for their likelihoods, while each choice with a tractable likelihood is recorded as visited. Finally, the MH ratio is computed and the proposal is accepted or rejected.

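Algorithm 2 is a Metropolis-Hastings transition over the proposed-to random choice and the likelihood-free choices in its cascade, with target density equal to their local posterior. The proposal density is the product of the custom proposal density for the initial choice and the prior densities of the resimulated likelihood-free choices. In the Metropolis-Hastings acceptance ratio, the prior densities of the resimulated choices cancel, leaving only the custom proposal densities and the tractable likelihoods of the visited choices.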
We illustrate Cascading Resimulation MH in Figure 2, on the task of inferring the goal of a simulated drone in an observed environment.

3.2 Nested Inference Metropolis-Hastings

In some problems, Cascading Resimulation MH will generate many expensive simulations of likelihood-free choices, most of which will be rejected. For these problems, and for real-time applications, we propose an alternative Metropolis-Hastings algorithm, called Nested Inference MH, that uses Monte Carlo estimates of the intractable likelihoods in the acceptance ratio. The likelihood estimates are obtained using auxiliary “nested inference” algorithms, which sample probable values for the internal random choices made by a likelihood-free choice (e.g. a randomized planning algorithm) given its inputs and outputs, and calculate a weight that can be used to form an importance sampling estimate of the unknown likelihood.

Nested Inference MH is based on an interpretation of likelihood-free random choices like agent-path as probabilistic programs in their own right. Let $u$ be an execution trace of a likelihood-free random choice of type $t$. We denote the joint density on execution traces $u$ and return values $y$ of the random choice, given input arguments $x$, by $p_t(u, y; x)$. The marginal likelihood of the random choice is given by the (intractable) integral $p_t(y; x) = \int p_t(u, y; x)\, du$. We denote the conditional trace density for arguments $x$ and output $y$ by $p_t(u; x, y) = p_t(u, y; x) / p_t(y; x)$.

Nested inference assumes the existence of a nested inference algorithm that samples execution traces according to some density $q_t(u; x, y)$ that approximates the conditional density on traces of the likelihood-free choice, i.e., $q_t(u; x, y) \approx p_t(u; x, y)$. We require that $q_t(u; x, y) > 0$ for all $u$ where $p_t(u, y; x) > 0$. Using the nested inference algorithm as an importance sampler, we produce an unbiased importance sampling estimate of the random choice’s intractable likelihood for arguments $x$ and output $y$ by sampling $N$ times from the inference algorithm, $u_k \sim q_t(\cdot\,; x, y)$ for $k = 1, \ldots, N$, as follows:

$$\hat{p}_t(y; x) = \frac{1}{N} \sum_{k=1}^{N} \frac{p_t(u_k, y; x)}{q_t(u_k; x, y)}$$

Nested inference also assumes that the ratio $p_t(u, y; x) / q_t(u; x, y)$ can be evaluated. While in principle the nested inference algorithm can be produced by recoding the likelihood-free primitive in a high-level probabilistic programming language, this is by no means required, nor do we expect it to be the common case. In this paper, we focus on nested inference algorithms that use learned neural networks.
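The importance sampling estimate can be checked on a toy primitive whose marginal likelihood is known in closed form. The sketch below is an assumption-laden illustration, not the paper's planner: the hidden trace is a single value u drawn from Normal(0, 1), the output is y drawn from Normal(u, 1), so the exact marginal is Normal(y; 0, 2), and the nested inference density is a Gaussian with a caller-chosen mean and variance.

```python
import math
import random


def normal_pdf(v, mean, var):
    """Density of a Normal(mean, var) distribution at v."""
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def estimate_likelihood(y, q_mean, q_var, n_samples, rng):
    """Importance-sampling estimate of p(y) for u ~ N(0, 1), y | u ~ N(u, 1).

    Traces u_k are drawn from the nested inference density N(q_mean, q_var).
    """
    total = 0.0
    for _ in range(n_samples):
        u = rng.gauss(q_mean, math.sqrt(q_var))                   # u_k ~ q
        joint = normal_pdf(u, 0.0, 1.0) * normal_pdf(y, u, 1.0)   # p(u_k, y)
        total += joint / normal_pdf(u, q_mean, q_var)             # importance weight
    return total / n_samples
```

When the nested inference density equals the exact conditional, here Normal(y/2, 1/2), every weight equals the true marginal and the estimate has zero variance; using the prior Normal(0, 1) instead still gives an unbiased but noisier estimate.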

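The accuracy of the likelihood estimate is determined by the accuracy of the nested inference algorithm. Specifically, for the $N$-sample estimate $\hat{p}_t(y; x)$ of the likelihood $p_t(y; x)$, formed from traces $u$ drawn from the nested inference density $q_t(u; x, y)$, the variance of the estimate is:

$$\mathrm{Var}\left[\hat{p}_t(y; x)\right] = \frac{p_t(y; x)^2}{N}\, D_{\chi^2}\!\left( p_t(u; x, y) \,\|\, q_t(u; x, y) \right)$$

where $D_{\chi^2}$ denotes the chi-square divergence [Nielsen and Nock, 2014], and where the conditional trace density $p_t(u; x, y)$ and the nested inference density $q_t(u; x, y)$ on the right-hand side represent density functions over $u$, not specific density values. Similarly, we can view $\log \hat{p}_t(y; x)$ for $N = 1$ as a (biased) estimator of $\log p_t(y; x)$, where the bias is:

$$\mathbb{E}\left[\log \hat{p}_t(y; x)\right] - \log p_t(y; x) = -D_{\mathrm{KL}}\!\left( q_t(u; x, y) \,\|\, p_t(u; x, y) \right)$$

where $D_{\mathrm{KL}}$ denotes the Kullback-Leibler (KL) divergence [Kullback and Leibler, 1951].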

3.2.1 Nested Inference Metropolis-Hastings

Algorithm 3 describes a Nested Inference MH transition in which a custom proposal is made to a likelihood-free random choice, using estimated likelihoods produced by a nested inference algorithm. It assumes that all children of the proposed-to choice also have nested inference algorithms themselves. Heterogeneous configurations are also possible.

Algorithm 3: Single-site Nested Inference Metropolis-Hastings transition. The transition proposes a new value for the chosen random choice, leaving other choices unchanged. It runs that choice’s nested inference algorithm repeatedly to estimate the likelihood of the proposed value, and then does the same for each child of the choice. If the proposal is accepted, the stored density estimates for the choice and its children are updated.

Although this transition uses Monte Carlo estimates of likelihoods in the acceptance ratio, it is a standard Metropolis-Hastings transition on an extended state space that includes the result of the proposed-to random choice, traces of the proposed-to random choice, and traces of each child of the proposed-to random choice. The values of other random choices are constant. The target and proposal densities on this extended space are derived in Appendix C. The marginal density of the proposed-to result in the extended target density is the local posterior for the result of that random choice given the values of all other random choices. Single-site Nested Inference MH transitions that propose to different random choices but use the same database of nested-inference likelihood estimates can be composed to form Markov chains that converge to the posterior.
Our use of unbiased likelihood estimates in place of the true likelihoods when computing the Metropolis-Hastings acceptance ratio in Algorithm 3 is closely related to pseudo-marginal MCMC [Andrieu and Roberts, 2009] and particle MCMC [Andrieu et al., 2010]. Indeed, each single-site Nested Inference MH transition can be seen as a compositional variant of a ‘grouped independence MH’ transition [Beaumont, 2003] in which several pseudo-marginal likelihoods (one for each random choice ) are used in the same update. The database of nested-inference likelihood estimates stores the ‘recycled’ pseudo-marginal likelihood estimates from previous transitions.

The convergence rate of a Markov chain based on Nested Inference MH transition operators depends on the accuracy of the nested inference algorithm and on the number of nested inference samples used per likelihood estimate. In the limit of an exact nested inference algorithm, the likelihood estimates are exact, and the algorithm is identical to standard Metropolis-Hastings. If the nested inference algorithm is very inaccurate, it may routinely propose traces that are incompatible with the output of the random choice, resulting in low acceptance rates. Better characterizing how the convergence rate depends on the accuracy of the nested inference algorithms and on the number of samples is an important area for future work.
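A pseudo-marginal chain of this kind can be sketched on a toy problem in Python. The model is an assumption chosen so that the exact posterior is available for comparison: a goal takes one of a few scalar values with a uniform prior, the hidden trace is u drawn from Normal(goal, 1), and the observation is y drawn from Normal(u, 1). The prior on u serves as a deliberately crude nested inference algorithm, and the likelihood estimate for the current state is stored and reused rather than recomputed.

```python
import math
import random


def normal_pdf(v, mean, var):
    """Density of a Normal(mean, var) distribution at v."""
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def estimate_goal_likelihood(y, goal, n_samples, rng):
    """Unbiased estimate of p(y; goal), sampling the hidden trace from its prior."""
    total = 0.0
    for _ in range(n_samples):
        u = rng.gauss(goal, 1.0)          # u_k ~ q, here q is the prior on u
        total += normal_pdf(y, u, 1.0)    # p(u_k, y; goal) / q(u_k) = p(y | u_k)
    return total / n_samples


def nested_inference_mh(y, goals, n_steps, n_samples, rng):
    """Return the fraction of steps the chain spends at each goal."""
    goal = rng.choice(goals)
    estimate = estimate_goal_likelihood(y, goal, n_samples, rng)
    counts = {g: 0 for g in goals}
    for _ in range(n_steps):
        new_goal = rng.choice(goals)      # uniform prior as proposal
        new_estimate = estimate_goal_likelihood(y, new_goal, n_samples, rng)
        # Accept with probability min(1, new_estimate / estimate).
        if rng.random() * estimate < new_estimate:
            goal, estimate = new_goal, new_estimate   # store estimate for reuse
        counts[goal] += 1
    return {g: c / n_steps for g, c in counts.items()}
```

Despite the noisy estimates, the visit frequencies approach the exact posterior, which here is proportional to Normal(y; goal, 2). Raising `n_samples` or improving the nested inference density reduces how often the chain lingers on an overestimated state.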

3.2.2 Learning a nested inference algorithm

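It is possible to learn a nested inference algorithm that approximates the conditional trace density $p_t(u; x, y)$ of a likelihood-free primitive $t$ with arguments $x$, trace $u$, and output $y$. The idea of learned inference for probabilistic generative models goes back at least to Morris [2001] and has also been used in Stuhlmüller et al. [2013] and Kingma and Welling [2013]. We apply this idea to nested inference as follows. Let $q_t(u; x, y, \theta)$ denote a nested inference algorithm that is parameterized by $\theta$; for example, $\theta$ might be the weights of a neural network used as part of the inference algorithm. We establish a training distribution $d(x)$ over the arguments to the primitive $t$, and approximately solve the following optimization problem:

$$\min_{\theta} J(\theta), \qquad J(\theta) = \mathbb{E}_{x \sim d}\, \mathbb{E}_{y \sim p_t(\cdot\,; x)} \left[ D_{\mathrm{KL}}\!\left( p_t(u; x, y) \,\|\, q_t(u; x, y, \theta) \right) \right]$$

The goal is for $q_t(u; x, y, \theta)$ to approximate $p_t(u; x, y)$ well (i.e., have small KL divergence) for typical input arguments $x$. We approximate this objective function by drawing $M$ independent sets of input arguments $x_j$ from the training distribution, and running a traced execution of the likelihood-free random choice (e.g. planner) on each set of arguments, recording the trace $u_j$ and output $y_j$ (see footnote 1):

$$x_j \sim d(\cdot), \qquad (u_j, y_j) \sim p_t(\cdot, \cdot\,; x_j), \qquad j = 1, \ldots, M$$

Footnote 1: This training regime cannot be applied to a true black-box path planner, since a recording of its internal randomness is now necessary. However, such recordings can be produced from a straightforwardly instrumented version of the algorithm. The likelihood estimator for the planner can still be treated as a black box by the Nested Inference MH transition.

We use the resulting dataset $\mathcal{D} = \{(x_j, u_j, y_j)\}_{j=1}^{M}$ to define an approximate objective function

$$\hat{J}(\theta) = -\frac{1}{M} \sum_{j=1}^{M} \log q_t(u_j; x_j, y_j, \theta)$$

that is, up to an additive constant, an unbiased estimate of the original objective function:

$$\mathbb{E}\left[\hat{J}(\theta)\right] = J(\theta) + C$$

where $C$ does not depend on $\theta$. Note that minimizing $\hat{J}(\theta)$ over $\theta$ is equivalent to maximizing the log-likelihood of the data $\mathcal{D}$. Because we use forward simulations to produce $(u_j, y_j)$ jointly from $p_t(u, y; x_j)$, we have one exact conditional sample for each training example.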
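The training recipe can be sketched in Python on a toy primitive where the right answer is known. This is an illustrative assumption, not the paper's neural network: the hidden trace is u drawn from Normal(0, 1), the output is y drawn from Normal(u, 1), there are no input arguments, and the nested inference family is a linear-Gaussian density Normal(a*y + b, s2) whose maximum-likelihood fit has a closed form.

```python
import random


def simulate(rng):
    """Traced forward run of the toy primitive: returns (trace u, output y)."""
    u = rng.gauss(0.0, 1.0)        # internal random choice, recorded as the trace
    y = u + rng.gauss(0.0, 1.0)    # output of the primitive
    return u, y


def fit_gaussian_proposal(pairs):
    """Maximum-likelihood fit of q(u; y) = Normal(a*y + b, s2) to (u, y) pairs.

    Maximizing the log-likelihood of the recorded traces reduces to ordinary
    least squares for (a, b), with s2 the mean squared residual.
    """
    n = len(pairs)
    mean_u = sum(u for u, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((u - mean_u) * (y - mean_y) for u, y in pairs) / n
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
    a = cov / var_y
    b = mean_u - a * mean_y
    s2 = sum((u - (a * y + b)) ** 2 for u, y in pairs) / n
    return a, b, s2
```

For this toy model the exact conditional is Normal(y/2, 1/2), so a fit on enough forward simulations should recover a slope near 0.5, an intercept near 0, and a variance near 0.5. A neural network plays the role of the linear map when the conditional is not Gaussian.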

4 Example Applications

We have implemented four example applications, designed to illustrate the flexibility of our framework:

  1. Inferring the probable goal of a simulated drone. This example shows that small changes to the environment, such as including an additional doorway, can yield large changes in the inferred goals.

  2. Inferring the probable goal of a simulated drone with a more complex planner. Specifically, we model the drone as following a multi-part path produced by a planner that first chooses a waypoint uniformly at random and then recursively solves the two path planning problems induced by the choice of waypoint. This example shows (a) applicability of the framework to more complex models of goal-directed behavior, and (b) that Nested Inference MH with a learned neural network can outperform Cascading Resimulation MH.

  3. Inferring whether or not two people walking around tables in a room are headed for the same goal or different goals. This example demonstrates applicability to simple hierarchical models for goals and also demonstrates applicability to real-world (as opposed to synthetic) data.

  4. Jointly inferring a simulated agent’s goals and its beliefs about an obstacle in the map whose location, size, and orientation is unknown to the probabilistic program. This example is described in the appendix due to space constraints.

Figure 3: Comparison of three Metropolis-Hastings (MH) strategies for goal inference in a model that uses the agent-waypoint-path planner, which models an agent’s motion using an unknown waypoint. (a), (b) and (c) show 960 independent approximate posterior goal samples (red) obtained using each strategy for similar run-times, given known map, start location (orange), and observations (white). Cascading Resimulation MH (CR) and Resimulation Nested Inference MH (RNI) do not give accurate inferences in real-time because they propose the waypoint from the prior. Neural Nested Inference MH (NNI) uses a neural network to propose the waypoint and gives accurate results in real-time (median 115 ms per sample). (d) shows estimated KL divergences from gold-standard samples to each of the strategies as the number of MH transitions are varied. Circles in (d) show the amount of computation used for (a,b,c).

4.1 Example 1: Sensitivity of Goal Inference to Small Map Changes

Figure 2 shows a comparison of goal inference in two different maps given the same observations. The map for the scenario on the left has an enclosure with two openings, one on the top and one on the bottom, while the map for the scenario on the right has a single opening. In the map on the left, the inferred goal samples fall outside the enclosure, because if the drone intended to go inside the enclosure, it could have taken a much shorter path. In the map on the right, a significant fraction of goal samples fall inside the enclosure, as relatively efficient paths into the enclosure go through the partial trajectory that has been observed so far. Samples shown are the final states of 480 independent replicates of a Markov chain initialized from the prior, with Cascading Resimulation MH transitions (Algorithm 2) using the prior as the proposal. Planner parameters are , , , , , .

4.2 Example 2: Handling Path Planners With Waypoints via Nested Inference

Next, we used a model where the agent may choose a waypoint and separately plan a path to the waypoint and a path from the waypoint to the goal (agent-waypoint-path, Algorithm 4). Unlike the simpler agent-path model, which typically samples from a small number of modes concentrated at efficient routes from the start to the goal, agent-waypoint-path yields paths that are unpredictable without knowledge of the waypoint. The two tuning parameters of plan-path are omitted for simplicity. We consider the same goal inference task as in Example 1 but with the alternative planner. Cascading Resimulation MH performs poorly on this task, because the prior is a poor proposal for the internal random choices of agent-waypoint-path.

Algorithm 4: Pseudo-code for a likelihood-free primitive that models observed motion of an agent with known goal but optional unknown waypoint. Procedure agent-waypoint-path picks a waypoint and decides whether to use it. If the waypoint is used, it plans a path from the start to the waypoint and a path from the waypoint to the goal, and concatenates the two paths; otherwise it plans a path directly from the start to the goal. It then computes the locations at the given times, adds noise to the locations, and returns the noisy agent locations.
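The structure of agent-waypoint-path can be sketched in Python. Several pieces are assumptions for illustration only: `straight_line_planner` is a hypothetical obstacle-free stand-in for plan-path, the waypoint is drawn uniformly from the unit square, and the probability `p_use` of using the waypoint and the noise level `noise_sd` are made-up defaults.

```python
import math
import random


def straight_line_planner(start, goal):
    """Hypothetical stand-in for plan-path: a two-vertex path, no obstacles."""
    return [start, goal]


def locations_at_times(path, speed, times):
    """Walk `path` at constant `speed` from time 0; hold at the end on arrival."""
    out = []
    for t in times:
        remaining, pos = speed * t, path[-1]
        for (x0, y0), (x1, y1) in zip(path, path[1:]):
            seg = math.hypot(x1 - x0, y1 - y0)
            if seg > 0 and remaining <= seg:
                f = remaining / seg
                pos = (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
                break
            remaining -= seg
        out.append(pos)
    return out


def agent_waypoint_path(start, goal, speed, times, rng, p_use=0.5, noise_sd=0.01):
    """Return noisy locations plus the internal choices (waypoint, use flag)."""
    waypoint = (rng.random(), rng.random())   # pick waypoint
    use_waypoint = rng.random() < p_use       # use waypoint?
    if use_waypoint:
        first = straight_line_planner(start, waypoint)   # start to waypoint
        second = straight_line_planner(waypoint, goal)   # waypoint to goal
        path = first + second[1:]                        # concatenate paths
    else:
        path = straight_line_planner(start, goal)        # start to goal
    locs = locations_at_times(path, speed, times)
    noisy = [(x + rng.gauss(0.0, noise_sd), y + rng.gauss(0.0, noise_sd))
             for x, y in locs]                           # add noise to locations
    return noisy, (waypoint, use_waypoint)
```

Returning the internal choices alongside the output is what makes this a traced execution, which is the form needed to train a nested inference algorithm for the waypoint.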

Algorithm 5 shows a nested inference algorithm for agent-waypoint-path that uses a neural network to propose the waypoint and whether the waypoint is used, given the goal and observations, and then executes the rest of the planner, conditioned on these two choices. The network was trained on 10,000 runs of agent-waypoint-path with random goal input and fixed world map and start location. The nested inference algorithm splits the trace of agent-waypoint-path into the waypoint choices and the random choices made within executions of plan-path. Because the plan-path choices are simulated from the planner itself, their densities cancel in the density ratio that Nested Inference MH uses when estimating the planner likelihoods, so the simplified ratio no longer involves the internal choices of plan-path. To evaluate this ratio, we separately evaluate the density of add-noise and the density of the neural network’s stochastic outputs.

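Algorithm 5: Using a neural network for nested inference in the agent-waypoint-path path planner. The waypoint is sampled from the neural network. If the waypoint is used, paths are planned from the start to the waypoint and from the waypoint to the goal; otherwise a path is planned directly from the start to the goal.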

We compared three strategies for goal inference: Nested Inference MH using Algorithm 5 and , Cascading Resimulation MH, and Nested Inference MH using a “resimulation” nested inference algorithm and . Figure 3 shows that neural Nested Inference MH converges faster than the other strategies. Planner parameters were the same as in Example 1. All inference strategies were implemented using a custom Python inference library. Integration of Nested Inference MH with Venture is left for future work.

4.3 Example 3: Modeling Real-World Human Motion

The Venture program of Figure 4(c) defines a model with two agents whose destinations may or may not be the same. The environment (world) and the start locations of the agents are known. The is_common_goal flag determines whether the agents share the same goal destination. The paths of both agents are modeled using agent-path. The corresponding Bayesian network is shown in Figure 4(e). We collected video of two collaborators walking in a scene containing tables, for two conditions: one in which they meet at a common location, and one where they diverge. For the common-goal condition we constructed short and extended sequences of observed locations (Figure 4(a) and (b)). We used Cascading Resimulation MH for inference, initialized from the prior, with a joint prior proposal over all latent variables. We ran 60 chains of transitions each, and rendered the final states in Figures 4(a-b). The speed for each individual was set to their average speed along the observed path. The estimated probabilities of is_common_goal=True for the short and extended sequences are 0.63 and 0.82 respectively. This trend qualitatively matches human judgments, shown in Figure 4(d) (the model was not calibrated to match human judgments). See Appendix B for additional results.
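The hierarchical goal prior in this model can be sketched in Python. This is not the Venture program of Figure 4(c): the prior probability `p_common` of a shared goal and the uniform goal prior over the unit square are assumptions made for the example, and path generation is left out.

```python
import random


def sample_two_agent_goals(rng, p_common=0.5):
    """Sample (is_common_goal, goal_a, goal_b) from a simple hierarchical prior."""
    is_common_goal = rng.random() < p_common
    goal_a = (rng.random(), rng.random())
    # With a common goal both agents share one destination; otherwise the
    # second agent's goal is drawn independently.
    goal_b = goal_a if is_common_goal else (rng.random(), rng.random())
    return is_common_goal, goal_a, goal_b


def common_goal_probability(samples):
    """Fraction of samples in which the two agents share a goal."""
    return sum(1 for flag, _, _ in samples if flag) / len(samples)
```

Applied to posterior samples rather than prior samples, `common_goal_probability` is the kind of summary reported above as the estimated probability of is_common_goal=True.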

Figure 4: Inferring whether or not two people are headed to the same destination. (c) shows a Venture model of two people. (e) shows a Bayesian network representation of the model with observed variables shaded. (a) and (b) show the map, two pairs of observed trajectories, and approximate posterior samples obtained from Cascading Resimulation MH. Each sample with is_common_goal=True is rendered as a single yellow circle. Each sample with is_common_goal=False is rendered as two separate magenta and blue circles. In (a) inference is uncertain about goal locations, and the estimated probability of is_common_goal=True is 0.63. In (b) inference gives a concentrated common goal region with estimated common-goal probability 0.82. Some probability mass is reserved for the people walking past one another to uncertain destinations. (d) shows judgments from 30 human responders of the likelihood over time that the individuals have different destinations, for the video sequence spanned by the frames in (a,b). The human judgments qualitatively agree with the automated inferences.

5 Discussion

This paper introduced a class of probabilistic programs for formulating goal inference as approximate inference in probabilistic generative models of goal-directed behavior. The technical contributions are (i) a probabilistic programming formulation that makes complex goal and map priors easy to specify; (ii) the use of randomized path planning algorithms as the backbone of generative models; and (iii) the introduction of Monte Carlo techniques that can handle the intractable likelihoods of these path planners. The experiments showed that it is possible for short probabilistic programs to make meaningful inferences about goal-directed behavior.

From the standpoint of robotics, autonomous driving, or reconnaissance, the examples in this paper are quite preliminary. More experiments are needed to explore the accuracy of approximate inference in these models, as well as the accuracy of the models themselves, especially on real-world problems. The probabilistic programming formulation makes it easy to explore variations of models, environments, and inference strategies.

The problem of inferring the mental states of autonomous agents is central to probabilistic artificial intelligence. It may also be a natural application for structured generative models and for probabilistic programming, but only if sufficiently fast and flexible inference schemes can be developed. We hope this paper helps to encourage the use of probabilistic programming for building intelligent software that can draw meaningful inferences about goal-directed behavior.


Acknowledgments

The authors would like to thank Feras Saad for obtaining human judgment data, and Leslie Kaelbling, Erin Bartuska, and Feras Saad for helpful conversations. Tree and car 3D models in figures are from [Lohmüller, 2016]. This research was supported by DARPA (PPAML program, contract number FA8750-14-2-0004), IARPA (under research contract 2015-15061000003), the Office of Naval Research (under research contract N000141310333), the Army Research Office (under agreement number W911NF-13-1-0212), and gifts from Analog Devices and Google. MCT is supported by the Department of Defense (DoD) through the National Defense Science & Engineering Graduate Fellowship (NDSEG) Program.


References

  • Andrieu and Roberts [2009] Christophe Andrieu and Gareth O Roberts. The pseudo-marginal approach for efficient Monte Carlo computations. The Annals of Statistics, pages 697–725, 2009.
  • Andrieu et al. [2010] Christophe Andrieu, Arnaud Doucet, and Roman Holenstein. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3):269–342, 2010.
  • Aufrère et al. [2003] Romuald Aufrère, Jay Gowdy, Christoph Mertz, Chuck Thorpe, Chieh-Chih Wang, and Teruko Yata. Perception for collision avoidance and autonomous driving. Mechatronics, 13(10):1149–1161, 2003.
  • Baker et al. [2007] Chris L Baker, Joshua B Tenenbaum, and Rebecca R Saxe. Goal inference as inverse planning. In Proceedings of the Cognitive Science Society, volume 29, 2007.
  • Beaumont [2003] Mark A Beaumont. Estimation of population growth or decline in genetically monitored populations. Genetics, 164(3):1139–1160, 2003.
  • Carpenter et al. [2016] Bob Carpenter, Andrew Gelman, Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Michael A Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. Stan: A probabilistic programming language. Journal of Statistical Software, 20, 2016.
  • Franke et al. [1998] Uwe Franke, Dariu Gavrila, Steffen Gorzig, Frank Lindner, F Puetzold, and Christian Wohler. Autonomous driving goes downtown. IEEE Intelligent Systems and Their Applications, 13(6):40–48, 1998.
  • Goodman et al. [2012] Noah Goodman, Vikash Mansinghka, Daniel M Roy, Keith Bonawitz, and Joshua B Tenenbaum. Church: a language for generative models. arXiv preprint arXiv:1206.3255, 2012.
  • Kingma and Welling [2013] Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
  • Kullback and Leibler [1951] Solomon Kullback and Richard A Leibler. On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79–86, 1951.
  • Kumar and Michael [2012] Vijay Kumar and Nathan Michael. Opportunities and challenges with autonomous micro aerial vehicles. The International Journal of Robotics Research, 31(11):1279–1291, 2012.
  • LaValle [1998] Steven M LaValle. Rapidly-exploring random trees: A new tool for path planning. 1998.
  • Liao et al. [2006] Lin Liao, Dieter Fox, and Henry Kautz. Location-based activity recognition. 2006.
  • Lohmüller [2016] Friedrich A. Lohmüller. Descriptions and examples for the POV-Ray raytracer, 2016. Accessed: 2017-05-17.
  • Mansinghka et al. [2014] Vikash Mansinghka, Daniel Selsam, and Yura Perov. Venture: a higher-order probabilistic programming platform with programmable inference. arXiv preprint arXiv:1404.0099, 2014.
  • Milch et al. [2007] Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L Ong, and Andrey Kolobov. BLOG: Probabilistic models with unknown objects. Statistical relational learning, page 373, 2007.
  • Morris [2001] Quaid Morris. Recognition networks for approximate inference in bn20 networks. In Proceedings of the Seventeenth conference on Uncertainty in artificial intelligence, pages 370–377. Morgan Kaufmann Publishers Inc., 2001.
  • Nielsen and Nock [2014] Frank Nielsen and Richard Nock. On the chi square and higher-order chi distances for approximating f-divergences. IEEE Signal Processing Letters, 21(1):10–13, 2014.
  • Pfeffer [2009] Avi Pfeffer. Figaro: An object-oriented probabilistic programming language. Charles River Analytics Technical Report, 137, 2009.
  • Stuhlmüller et al. [2013] Andreas Stuhlmüller, Jacob Taylor, and Noah Goodman. Learning stochastic inverses. In Advances in neural information processing systems, pages 3048–3056, 2013.
  • Tran and Davis [2008] Son Tran and Larry Davis. Event modeling and recognition using Markov logic networks. Computer vision–ECCV 2008, pages 610–623, 2008.
  • Urmson et al. [2008] Chris Urmson, Joshua Anhalt, Drew Bagnell, Christopher Baker, Robert Bittner, MN Clark, John Dolan, Dave Duggins, Tugrul Galatali, Chris Geyer, et al. Autonomous driving in urban environments: Boss and the urban challenge. Journal of Field Robotics, 25(8):425–466, 2008.
  • Wingate et al. [2011] David Wingate, Andreas Stuhlmüller, and Noah D Goodman. Lightweight implementations of probabilistic programming languages via transformational compilation. In AISTATS, pages 770–778, 2011.
  • Ziebart et al. [2009] Brian D Ziebart, Nathan Ratliff, Garratt Gallagher, Christoph Mertz, Kevin Peterson, J Andrew Bagnell, Martial Hebert, Anind K Dey, and Siddhartha Srinivasa. Planning-based prediction for pedestrians. In Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on, pages 3931–3936. IEEE, 2009.

Appendix A Planner Details

We now describe details of the planner omitted from the main text, including the procedures simplify-path, refine-path, and walk-path, which are defined in Algorithm 6. Paths are represented as sequences of points, with line segments connecting consecutive points. A path begins at the start location and ends at the goal location. To be valid with respect to a map, no point in the path may lie within an obstacle (polygon) of the map, and no line segment between two adjacent path points may intersect an obstacle of the map.

procedure simplify-path(…)
      ▷ Initialize simplified path
      for … to … do
            if not … then
                  ▷ Point is needed, keep it
            else
                  ▷ Point is not needed, skip it
      ▷ Add goal to simplified path
procedure refine-path(…)
      for … to … do
            for … to … do    ▷ Iterate over path dims.
                  ▷ Change path dim.
                  if … then
                        ▷ Accept
procedure walk-to(…)
      ▷ Path dist. from start traveled so far
      ▷ Desired path distance from start
      for … to … do
            ▷ Dist. to next point
            if … then
                  …
      ▷ Once reached goal, stay forever
procedure walk-path(…)
      for … to … do
            …
Algorithm 6 Additional details of the agent-path model of goal-directed behavior.
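
To make the structure of simplify-path, walk-to, and walk-path concrete, the following Python sketch follows the comments that survive in Algorithm 6. It is an illustration under stated assumptions, not the authors' implementation: the names `simplify_path`, `walk_to`, `walk_path`, and `segment_clear_of_disk` are hypothetical, the validity test is passed in as a callable, and a single disk stands in for the polygonal obstacles of the map. The refine-path procedure is omitted because its acceptance test is not recoverable from the listing.

```python
import math

def simplify_path(path, line_is_free):
    """Drop intermediate points that are not needed to avoid obstacles.

    `path` is a list of at least two (x, y) points from start to goal.
    `line_is_free(a, b)` is an assumed callable that returns True when the
    segment from a to b does not intersect any obstacle.
    """
    simplified = [path[0]]  # initialize simplified path with the start
    for i in range(1, len(path) - 1):
        # Point i is needed only if the shortcut from the last kept point
        # to point i + 1 would hit an obstacle.
        if not line_is_free(simplified[-1], path[i + 1]):
            simplified.append(path[i])
    simplified.append(path[-1])  # add goal to simplified path
    return simplified

def walk_to(path, distance):
    """Return the location reached after moving `distance` >= 0 along `path`."""
    traveled = 0.0  # path distance from start traveled so far
    for a, b in zip(path, path[1:]):
        seg = math.dist(a, b)  # distance to next point
        if seg > 0.0 and traveled + seg >= distance:
            frac = (distance - traveled) / seg
            return (a[0] + frac * (b[0] - a[0]), a[1] + frac * (b[1] - a[1]))
        traveled += seg
    return path[-1]  # once the goal is reached, stay there forever

def walk_path(path, speed, times):
    """Locations along `path` at each time, assuming constant speed."""
    return [walk_to(path, speed * t) for t in times]

def segment_clear_of_disk(a, b, center=(0.5, 0.0), radius=0.3):
    """Assumed stand-in obstacle test: True if segment a-b misses a disk."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    t = 0.0
    if seg_len_sq > 0.0:
        t = ((center[0] - a[0]) * dx + (center[1] - a[1]) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))
    closest = (a[0] + t * dx, a[1] + t * dy)
    return math.dist(closest, center) >= radius
```

For example, with the disk obstacle above, the path `[(0.0, 0.0), (0.25, 0.0), (0.5, 0.6), (1.0, 0.0)]` loses its second point, because the start can see the third point directly, while the third point is kept because the straight line from start to goal crosses the disk.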

Appendix B Additional Experiments

B.1 Jointly Inferring the Belief and Goal of an Agent

The Venture program of Figure 6(a) defines a model in which the belief of an agent about its environment, upon which the agent’s motion plan depends, is uncertain. The environment contains two static objects (known_objects): a tree and a central divider wall that divides the square into a left and a right side. There are passageways between the left and right sides that go above and below the divider. However, the agent has knowledge of (or belief in) an additional obstacle wall (obstacle), and the agent plans its path to the destination (goal) taking this additional obstacle into account. Figure 6(a) also shows a Bayesian network representation of this model. We seek to infer both the agent’s goal and the agent’s beliefs about the location, orientation, and size of the obstacle.

We used Cascading Resimulation Metropolis-Hastings (Algorithm 2) with a single repeated transition operator based on an independent joint proposal to goal () and to the unknown parameters of obstacle (start post location, orientation, and length, proposed from the prior). We initialized from the prior. Parameters of the planner agent-path were , , , , , . We ran several independent Markov chains of iterations each, on a synthetic dataset in which the agent takes a path from the right to the left of the map by going below the divider. The final states of four such chains are visualized in Figure 6(b). For this dataset, the goal destination of the agent is revealed with certainty because the agent reaches and stops in the upper left corner. The obstacle inferences indicate that the agent believes the upper route to its goal is blocked, because otherwise the agent would have taken the shorter upper route to its goal. However, the specific details of how the obstacle blocks the upper passageway remain uncertain.
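
The inference scheme described above can be sketched on a toy problem. The Python code below is a hedged illustration of our reading of cascading resimulation with proposals from the prior, not the Venture implementation: the one-dimensional goal, the Gaussian "planner" noise, the observation noise, and all function names are assumptions chosen so that the example is self-contained. When the latent variables are proposed from the prior and the planner's random choices are resimulated from the model, the prior and proposal terms cancel, so the acceptance ratio reduces to a ratio of observation likelihoods.

```python
import math
import random

def log_normal_pdf(x, mean, std):
    """Log density of a normal distribution."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))

def sample_goal(rng):
    """Assumed prior: goal uniform on the unit interval."""
    return rng.random()

def simulate_plan(goal, rng):
    """Assumed stand-in for a randomized planner with no tractable density."""
    return goal + rng.gauss(0.0, 0.05)

def log_obs_likelihood(obs, plan):
    """Assumed tractable observation model given the simulated plan."""
    return log_normal_pdf(obs, plan, 0.05)

def cascading_resimulation_mh(obs, num_iters, rng):
    """Return the goal value held by the chain after each iteration."""
    goal = sample_goal(rng)           # initialize from the prior
    plan = simulate_plan(goal, rng)
    log_lik = log_obs_likelihood(obs, plan)
    samples = []
    for _ in range(num_iters):
        new_goal = sample_goal(rng)             # independent proposal from the prior
        new_plan = simulate_plan(new_goal, rng)  # resimulate downstream choices
        new_log_lik = log_obs_likelihood(obs, new_plan)
        # Prior and proposal terms cancel; only the likelihood ratio remains.
        accept_prob = math.exp(min(0.0, new_log_lik - log_lik))
        if rng.random() < accept_prob:
            goal, plan, log_lik = new_goal, new_plan, new_log_lik
        samples.append(goal)
    return samples
```

Running `cascading_resimulation_mh(0.7, 5000, random.Random(0))` and discarding an initial portion of the chain should give goal samples concentrated near the observed value. Because every proposal is drawn from the prior, most proposals are rejected when the posterior is much narrower than the prior.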

B.2 Goal Inference in a Driving Scenario

Figure 7 shows an application of the multi-agent common-goal model of Figure 4 to a driving scenario. We show 60 independent replicates of 3000 iterations of Cascading Resimulation Metropolis-Hastings each. The results illustrate that this model can be used with varied environments.

B.3 Real-World Human Motion, Alternate Sequence

We extended the experiment described in Section 4.3 and shown in Figure 4 by running Cascading Resimulation Metropolis-Hastings on an alternate sequence of observed person locations in which the individuals diverge toward separate goal destinations. The inferences, shown in Figure 5, match expectations, with all samples indicating is_common_goal = False. Samples were obtained from the final state of independent Markov chains, with initialization from the prior, followed by iterations of Cascading Resimulation Metropolis-Hastings.

Figure 5: Inferring whether or not two people are headed to the same destination, as in Figure 4, but for a different sequence of observed locations. The final frame shows approximate posterior inference samples obtained from cascading resimulation Metropolis-Hastings. Inference gives low probability of a common goal for this sequence (there were no is_common_goal = True samples).
Figure 6: Inferring the belief of an agent about the location and shape of an obstacle in its environment, from observations of the agent’s motion. (a) shows a Venture model of the agent’s belief, goal, and resulting motion and a Bayesian network representation of the model with observed variables shaded. (b) shows approximate posterior samples of goal and obstacle obtained with Cascading Resimulation Metropolis-Hastings, for a data set in which the goal is disambiguated to lie in the upper left corner. The obstacle samples in (b) indicate that inference in the model concluded that the agent believes that there is an obstacle blocking the upper route to its goal. Otherwise, the agent would have taken the shorter, upper route.
Figure 7: A synthetic application of the common-goal inference problem from Figure 4 to a different scenario inspired by autonomous driving. The panels show approximate posterior inference samples obtained with independent runs of Cascading Resimulation MH. Each sample with is_common_goal = True is rendered as a single yellow sphere. Each sample with is_common_goal = False is rendered as two separate spheres, one blue and one red. Left: Inference indicates a probability of approximately 0.5 that the cars are both headed for the center of the map. Right: The red car has stopped, revealing its goal, and the blue car has continued, indicating that the two cars do not share the same destination.

B.4 Inference With Waypoint Planner

Figure 8 compares the waypoints and paths proposed by Nested Inference MH using a neural nested inference algorithm against those proposed by Cascading Resimulation MH, on an illustrative example data set. Cascading Resimulation uses the prior as its proposal, and the poor quality of this proposal results in unnecessary rejections and slow convergence. The neural network, in contrast, proposes waypoints near the bend in the path.

The KL divergence estimates of Figure 3(d) were obtained by binning 960 independent reference samples (30,000 transitions of Cascading Resimulation MH, initialized from the prior) and binning 960 independent approximate inference samples for each inference algorithm evaluated. The world unit square was binned into 25 squares (5-by-5), and a discrete distribution was estimated for each sampler by counting the number of samples falling into each bin, adding a pseudocount of to each bin, and normalizing. The KL divergence was then computed from the reference sampler histogram to the histogram of each approximate inference algorithm. For each inference strategy (Cascading Resimulation MH, Neural Nested Inference MH with , Resimulation Nested Inference MH with and Resimulation Nested Inference MH with ), the number of MH transitions was varied over several orders of magnitude, and the final state in each chain was recorded, to obtain samples for each inference algorithm evaluated. The numbers of MH transitions used to obtain the samples shown in Figures 3(a), 3(b), and 3(c) are 10, 1, and 10, respectively. Figure 9 shows additional samples comparing Nested Inference Metropolis-Hastings using a neural nested inference algorithm with Cascading Resimulation Metropolis-Hastings.
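
The histogram-based estimate described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for Figure 3(d): the function names are hypothetical, the pseudocount is left as a parameter because its value is not given here, and the direction KL(reference ‖ approximate) is one reading of the description.

```python
import math

def bin_counts(samples, bins_per_side=5):
    """Count (x, y) samples from the unit square in a 5-by-5 grid of bins."""
    counts = [0] * (bins_per_side * bins_per_side)
    for x, y in samples:
        # Clamp so that points on the upper boundary fall in the last bin.
        col = min(int(x * bins_per_side), bins_per_side - 1)
        row = min(int(y * bins_per_side), bins_per_side - 1)
        counts[row * bins_per_side + col] += 1
    return counts

def smoothed_distribution(counts, pseudocount):
    """Add a pseudocount to every bin and normalize to a distribution."""
    total = sum(counts) + pseudocount * len(counts)
    return [(c + pseudocount) / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def estimate_kl(reference_samples, approx_samples, pseudocount):
    """Histogram estimate of KL(reference || approximate).

    `pseudocount` must be positive so that every bin has nonzero mass.
    """
    p = smoothed_distribution(bin_counts(reference_samples), pseudocount)
    q = smoothed_distribution(bin_counts(approx_samples), pseudocount)
    return kl_divergence(p, q)
```

The pseudocount keeps every bin probability positive, so the estimate stays finite even when one sampler places no samples in a bin that the other sampler visits. With identical sample sets the estimate is exactly zero, and it grows as the two histograms disagree.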

Figure 8: Proposed waypoints and paths produced by Nested Inference MH with a neural nested inference algorithm, and Cascading Resimulation MH, when evaluating the MH acceptance ratio for a proposed goal in the center of the enclosure that is the ground truth goal. Because the neural nested inference algorithm generates reasonable proposed waypoints, the proposed paths have a high probability of being consistent with the observed data. Because Cascading Resimulation proposes the waypoint and path from the prior for each proposed goal, the paths proposed are unlikely to be consistent with the observations, resulting in a high MH rejection rate, even when the proposed goal is the ground truth goal, as is the case here.
Figure 9: Additional comparisons of approximate posterior goal inferences using neural Nested Inference Metropolis-Hastings (NNI, top row) and Cascading Resimulation Metropolis-Hastings (CR, bottom row). The drone starts in the lower left corner (orange). Observations of the drone’s location are shown in white. Red dots are independent approximate posterior samples of the drone’s goal. The neural Nested Inference MH strategy converges to a qualitatively correct distribution within 100-300ms (indicating real-time performance), whereas Cascading Resimulation MH requires 10-30 seconds to produce similarly accurate inferences (also see Figure 3(d)).

Appendix C Nested Inference Derivations

The variance of the likelihood estimate with is:

The bias of the log likelihood estimate with is: