1 Introduction
Intelligent systems sometimes need to infer the probable goals of people, cars, and robots, based on partial observations of their motion. These problems are central to autonomous driving and driver assistance [Franke et al., 1998; Urmson et al., 2008; Aufrère et al., 2003], but also arise in aerial robotics, reconnaissance, and security applications [Kumar and Michael, 2012; Liao et al., 2006; Tran and Davis, 2008]. In these settings, knowledge of the beliefs and goals of an agent makes it possible to infer their probable future actions.
Because the mental state of another agent is inherently unobservable and uncertain, it is natural to take a Bayesian approach to inferring it. Probabilistic models can be used to describe how an agent’s latent high-level goals and beliefs about the environment interact to yield its probable actions. Most existing work along these lines has focused on modeling goal-directed behavior using Markov decision processes and related approaches from stochastic control [Baker et al., 2007; Ziebart et al., 2009]. While promising, these approaches involve significant task-specific engineering. They also calculate policies that prescribe actions for every possible state of the world, sometimes in the inner loop of an inference algorithm. This leads to fundamental scaling challenges, even for simple environments and goal priors.

This paper introduces a class of probabilistic programs that formulate goal inference problems as approximate inference in generative models of goal-directed behavior. The proposed approach reflects three contributions. First, agents are assumed to follow paths generated by fast randomized path-planning code that can incorporate heuristics drawn from video game engines and robotics; this can scale to larger environments than approaches based on optimal control. Second, hierarchical models for goals and paths are represented as probabilistic programs. This allows one to formulate a broad class of single-agent and multi-agent problems with common modeling and inference machinery: ordinary probabilistic programming constructs can handle complex maps, hierarchical goal priors, and partially observed environments. Third, this paper proposes an approach to real-time approximate inference, using neural networks to learn proposals for the internal random choices made by path planners. Together, these contributions lead to a practical proposal for goal inference that has the potential to scale to a broad class of real-world problems and real-time applications. We demonstrate the efficacy of prototype implementations of these algorithms on three simple examples, each written in under 50 lines of probabilistic code.
Note that this proposal does not require planning algorithms to be rewritten as probabilistic programs; instead, it allows optimized, low-level, or legacy planning code to be treated as a black box. This avoids the implementation and performance cost of rewriting an existing path planner in a high-level probabilistic programming language and of exposing the thousands of random choices it might make to generic inference algorithms. One difficulty is that such optimized black-box planners may well make too many internal random choices to have tractable input-output likelihoods. This paper proposes two novel Monte Carlo techniques for these “likelihood-free” models, each extending Metropolis-Hastings: (i) a cascading resimulation algorithm that makes joint proposals to ensure cancellation of the unknown likelihoods, and (ii) a nested inference algorithm that uses estimated likelihoods derived from inference over the internal random choices of the planner. Cascading resimulation is simple to implement, but nested inference enables use of a broad class of Monte Carlo, variational, and neural network mechanisms to handle the intractable likelihoods.
2 Modeling Goal-Directed Behavior Using Randomized Path Planners
This paper defines probabilistic models of goal-directed behavior using randomized path-planning algorithms. Algorithm 1 describes one such planner, called agent-path, which can be applied to a broad class of environments with complex obstacles. The planner assumes a bounded two-dimensional space (e.g., the unit square $[0,1]^2$) and a world map $m$ that is a set of polygonal obstacles. The planner takes as input a start location $s$, a goal location $g$, the map $m$, and a sequence of time points $t_1, \ldots, t_T$, and returns either a sequence of locations $z_1, \ldots, z_T$ on a path from $s$ to $g$, where $z_i$ is the agent's location at time $t_i$, or ‘no-path-found’. The planner operates by growing a rapidly-exploring random tree (RRT) [LaValle, 1998] from the start location to fill the space, searching for a clear line of sight between the tree and the goal. If a path is found, it is then refined to minimize its length using local optimization. Finally, the agent walks the path at a constant speed, producing the output locations $z_1, \ldots, z_T$. See Appendix A for more details.
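The last stage of the planner, walking the refined path at constant speed to produce one location per requested time point, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the planner of Algorithm 1: the function name `walk_path`, its `speed` argument, the convention that the walk starts at time 0, and the choice to keep the agent at the goal once it arrives are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def walk_path(path: Sequence[Point], times: Sequence[float], speed: float) -> List[Point]:
    """Location at each time when walking the polyline `path` at constant `speed`.

    Hypothetical helper: the walk starts at time 0, times before 0 map to the start,
    and the agent stays at the last point once it has reached it.
    """
    # Cumulative arc length at each vertex of the polyline.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))

    locations = []
    for t in times:
        d = min(max(speed * t, 0.0), cum[-1])  # distance travelled, clamped to the path
        i = 0
        while i < len(path) - 2 and cum[i + 1] < d:  # find the segment containing d
            i += 1
        seg_len = cum[i + 1] - cum[i] if len(path) > 1 else 0.0
        if seg_len == 0.0:  # single-point path or repeated vertex
            locations.append(tuple(path[i]))
            continue
        frac = (d - cum[i]) / seg_len
        (x0, y0), (x1, y1) = path[i], path[i + 1]
        locations.append((x0 + frac * (x1 - x0), y0 + frac * (y1 - y0)))
    return locations
```

Parametrizing the path by cumulative arc length turns constant speed into a linear map from time to distance, so each query only needs to locate the segment that contains the distance travelled.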
Many variations of this planner are possible, including versions that take into account costs other than path length, and spaces encoding configurations other than geographic position (e.g., configuration spaces of an articulated robot). The planner's parameters trade off the cost of planning against the (probable) optimality of the resulting paths (see Figure 1). Figures 2 and 4 show this planner being used as a modeling primitive in the Venture probabilistic programming platform [Mansinghka et al., 2014]. The planner was implemented in C and imported into Venture as a foreign modeling primitive. Venture supports likelihood-free primitives and the design of custom inference strategies, including those of Section 3.
3 Inference in Probabilistic Programs With Likelihood-Free Primitives
The path planner agent-path of Algorithm 1 can be used in a probabilistic program either by implementing the planner in a probabilistic programming language, or by treating the planner as a primitive random choice. We treat the planner as a random choice, as this allows use of an optimized C implementation of the planner. However, probabilistic programming languages such as Church, Stan, BLOG, and Figaro all require random choices to have tractable marginal likelihoods [Goodman et al., 2012; Carpenter et al., 2016; Milch et al., 2007; Pfeffer, 2009]. Computing the marginal likelihood of agent-path for given outputs (the observed locations) and inputs (the start location, goal location, map, and time points) would involve an intractable integral over the (thousands of) internal random choices made in agent-path.
This section introduces two Monte Carlo strategies for inference in probabilistic programs that include random choices with intractable marginal likelihoods, referred to as “likelihood-free” primitives. The first strategy, shown in Algorithm 2, is called Cascading Resimulation Metropolis-Hastings; it makes block proposals to likelihood-free random choices, exploiting cancellation of the unknown likelihoods. The second, shown in Algorithm 3, is called Nested Inference Metropolis-Hastings; it uses Monte Carlo estimates of the unknown likelihoods in place of the likelihoods themselves. Although simple techniques like likelihood weighting can also be used in the presence of likelihood-free primitives, they tend to work well only when a global proposal that is well-matched to the posterior is available. The algorithms we introduce do not require such a global proposal.
We first introduce notation. Let $\mathcal{P}$ be the set of primitive random choices available to a probabilistic program (e.g., $\mathcal{P}$ includes the planner agent-path). For $p \in \mathcal{P}$, let $\mathcal{X}_p$ denote the set of valid arguments for the primitive, let $\mathcal{Y}_p$ denote the set of possible outputs, and let $\ell_p(y; x)$ denote the marginal likelihood of output $y \in \mathcal{Y}_p$ given arguments $x \in \mathcal{X}_p$, where $\int_{\mathcal{Y}_p} \ell_p(y; x)\,dy = 1$ for all $x \in \mathcal{X}_p$. We do not require evaluation of $\ell_p$ to be computationally tractable.
Following Wingate et al. [2011], for a probabilistic program $Q$, we assume there is a name $a \in \mathcal{A}$ assigned to every possible random choice, for some countable set $\mathcal{A}$. We assume that distinct random choices are assigned unique names within every execution of $Q$. The set of names used in an execution is some finite set $A \subset \mathcal{A}$. We require that all random choices with name $a$ are of the same type $p_a \in \mathcal{P}$. Each unique completed execution of $Q$ can therefore be represented as the finite set $A$ of names of those random choices made in the execution, together with the result values. We denote these results $y_A = (y_a)_{a \in A}$. The tuple $(A, y_A)$, which packages the names together with their results, is called an execution trace of the probabilistic program $Q$.
This paper focuses on probabilistic programs where $A$ is the same for all executions, that is, where the set of random choices made is not affected by the values of any of those choices. Relaxations of this are left for future work; more general formalizations of probabilistic programs can be found in [Wingate et al., 2011; Mansinghka et al., 2014].
We consider random choice $a$ to depend on random choice $b$ if changing the result $y_b$ can lead to a change in the inputs of $a$, even if all other results are held fixed. We assume that it is possible to construct a directed acyclic dependency graph among the random choices $A$, where an edge $(b, a)$ exists if and only if random choice $a$ depends on random choice $b$ in the above sense. The parents of a random choice $a$ are denoted by $\mathrm{pa}(a)$. The arguments of each random choice are then a (deterministic) function of the results of the random choices in $\mathrm{pa}(a)$, which are denoted $y_{\mathrm{pa}(a)}$; we write $x_a(y_{\mathrm{pa}(a)}) \in \mathcal{X}_{p_a}$. Let $\mathrm{ch}(a)$ denote the children of choice $a$. Also, let $L \subseteq A$ denote the random choices with intractable likelihoods (the “likelihood-free” choices). Let $O \subseteq A \setminus L$ denote the random choices that are constrained based on data, which must have tractable likelihoods. Let $v_O = (v_a)_{a \in O}$ denote the values we are constraining those random choices to. The joint probability density of an execution trace is:
$$p(y_A) = \prod_{a \in A} \ell_{p_a}\!\left(y_a;\, x_a(y_{\mathrm{pa}(a)})\right), \qquad \text{with } y_a = v_a \text{ for } a \in O, \tag{1}$$
where we have omitted the dependence on $A$ because it is the same for all executions.
3.1 Cascading Resimulation Metropolis-Hastings
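How can a probabilistic program cope with complex, likelihood-free primitives? Our core insight is that if the proposal distribution for a random choice $a \in L$ is equal to its prior $\ell_{p_a}(\cdot\,; x_a(y_{\mathrm{pa}(a)}))$, then the likelihoods cancel in a Metropolis-Hastings (MH) acceptance ratio and therefore do not need to be explicitly computed. Sampling from the prior is achieved simply by simulating the random choice. For example, suppose a custom proposal $q(y'_b; y_b)$ changes a tractable choice $b$ whose only child is $a \in L$, that $a$ is then resimulated from its prior, and that every child of $a$ is tractable. Writing primes for proposed values, $x_b = x_b(y_{\mathrm{pa}(b)})$ (unchanged by the proposal), $x_a = x_a(y_{\mathrm{pa}(a)})$, and $x'_a$, $x'_c$ for arguments recomputed from the proposed values, a (prototypical) acceptance ratio looks like this:
$$\frac{\ell_{p_b}(y'_b; x_b)\,\ell_{p_a}(y'_a; x'_a)\prod_{c \in \mathrm{ch}(a)} \ell_{p_c}(y_c; x'_c)}{\ell_{p_b}(y_b; x_b)\,\ell_{p_a}(y_a; x_a)\prod_{c \in \mathrm{ch}(a)} \ell_{p_c}(y_c; x_c)} \cdot \frac{q(y_b; y'_b)\,\ell_{p_a}(y_a; x_a)}{q(y'_b; y_b)\,\ell_{p_a}(y'_a; x'_a)} = \frac{\ell_{p_b}(y'_b; x_b)\,q(y_b; y'_b)\prod_{c \in \mathrm{ch}(a)} \ell_{p_c}(y_c; x'_c)}{\ell_{p_b}(y_b; x_b)\,q(y'_b; y_b)\prod_{c \in \mathrm{ch}(a)} \ell_{p_c}(y_c; x_c)}$$
The intractable factors $\ell_{p_a}$ appear once in the target ratio and once, inverted, in the proposal ratio, so they cancel.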
We use blocked proposals in which a change to a likelihood-free choice is proposed from the prior whenever a proposal is made to any of its parents. A likelihood-free choice that is proposed may itself have likelihood-free choices as children, in which case these children are also proposed, generating a cascade of proposals. Algorithm 2 shows the Cascading Resimulation MH transition operator, which extends an initial custom proposal to a random choice (which must not be likelihood-free) to also include any likelihood-free random choices in the cascade, such that the intractable likelihoods cancel.
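Which random choices a single Cascading Resimulation MH transition re-proposes can be computed by a graph traversal: start from the choice that receives the custom proposal, follow edges into likelihood-free children recursively, and then collect the children of the visited set that lie outside it. The sketch below assumes the dependency graph is given as a Python dict from each choice name to its children's names; the function name `cascade` and the example names are hypothetical.

```python
from typing import Mapping, Sequence, Set, Tuple


def cascade(b: str, children: Mapping[str, Sequence[str]],
            likelihood_free: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (block, frontier) for a custom proposal to the tractable choice `b`.

    block: b plus every likelihood-free choice reachable from b through likelihood-free
    choices; all of these are re-proposed (likelihood-free ones resimulated from the prior).
    frontier: children of the block lying outside it. Together with b's own term, their
    likelihoods are the only likelihood terms left in the acceptance ratio.
    """
    if b in likelihood_free:
        raise ValueError("the custom proposal must target a choice with a tractable likelihood")
    block = {b}
    stack = [b]
    while stack:  # depth-first walk through likelihood-free descendants
        a = stack.pop()
        for c in children.get(a, ()):
            if c in likelihood_free and c not in block:
                block.add(c)
                stack.append(c)
    # Any likelihood-free child of a block member was added above, so these are all tractable.
    frontier = {c for a in block for c in children.get(a, ()) if c not in block}
    return block, frontier
```

For example, with `children = {"goal": ["path", "goal_hint"], "path": ["refined", "obs_early"], "refined": ["obs_late"]}` and likelihood-free choices `{"path", "refined"}`, a proposal to `"goal"` cascades through both likelihood-free choices, and the acceptance ratio involves only the likelihoods of `"goal"`, `"goal_hint"`, `"obs_early"`, and `"obs_late"`.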
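Let $b \notin L$ denote the random choice that receives the initial custom proposal $q_b(y'_b; y_b)$, let $D \subseteq L$ denote the likelihood-free choices in its cascade (the likelihood-free children of $b$, their likelihood-free children, and so on), and let $B = \{b\} \cup D$. Algorithm 2 is a Metropolis-Hastings transition over the random choices $B$ with target density equal to the local posterior $p(y_B \mid y_{A \setminus B})$, and with proposal density:
$$q(y'_B; y_B) = q_b(y'_b; y_b) \prod_{a \in D} \ell_{p_a}\!\left(y'_a;\, x_a(\tilde{y}_{\mathrm{pa}(a)})\right), \tag{2}$$
where $\tilde{y}$ takes the proposed values for choices in $B$ and the current values elsewhere. Because every likelihood-free factor of the target also appears in the proposal, these factors cancel, and the Metropolis-Hastings acceptance ratio is:
$$\alpha = \min\left\{1,\; \frac{\ell_{p_b}(y'_b; x_b)\, q_b(y_b; y'_b)}{\ell_{p_b}(y_b; x_b)\, q_b(y'_b; y_b)} \prod_{c \in F} \frac{\ell_{p_c}\!\left(y_c;\, x_c(\tilde{y}_{\mathrm{pa}(c)})\right)}{\ell_{p_c}\!\left(y_c;\, x_c(y_{\mathrm{pa}(c)})\right)}\right\}, \tag{3}$$
where $x_b = x_b(y_{\mathrm{pa}(b)})$ is unchanged by the proposal and $F = \left(\bigcup_{a \in B} \mathrm{ch}(a)\right) \setminus B$ is the set of children of the block that lie outside it. Every choice in $F$ has a tractable likelihood, because any likelihood-free child of a choice in $B$ is itself in $D$.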
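As a sanity check of the cancellation argument, the following sketch runs a one-level cascade on a hypothetical one-dimensional toy model: a goal with a $N(0, 1)$ prior, a black-box simulator standing in for the planner, and a Gaussian observation of the simulator's output. The simulator's density is never evaluated; the acceptance ratio uses only the goal prior and the observation likelihood, consistent with the cancellation described in this section. The toy simulator is a sum of eight $N(0, 0.25^2)$ steps only so that the exact answer is available for comparison: its output given the goal has variance $0.5$, so with observation variance $0.25$ the goal posterior has mean $y/1.75$. All names and parameter values are assumptions.

```python
import math
import random


def simulate_path(goal: float, rng: random.Random, n_internal: int = 8, step_sd: float = 0.25) -> float:
    """Hypothetical black-box 'planner': makes many internal random choices.

    The MH transition below only ever runs it forward; its likelihood is never evaluated.
    """
    z = goal
    for _ in range(n_internal):
        z += rng.gauss(0.0, step_sd)
    return z


def log_normal_pdf(v: float, mean: float, sd: float) -> float:
    return -0.5 * ((v - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def cascading_resimulation_mh(y_obs: float, n_iters: int, rng: random.Random,
                              obs_sd: float = 0.5, prop_sd: float = 0.5) -> list:
    # Toy model: goal ~ N(0, 1); z = simulate_path(goal) is likelihood-free;
    # the observation y_obs ~ N(z, obs_sd^2) has a tractable likelihood.
    goal = rng.gauss(0.0, 1.0)  # initialize from the prior
    z = simulate_path(goal, rng)
    samples = []
    for _ in range(n_iters):
        goal_new = goal + rng.gauss(0.0, prop_sd)  # symmetric proposal to the tractable choice
        z_new = simulate_path(goal_new, rng)       # cascade: resimulate the likelihood-free child
        # Simulator densities cancel; only the goal prior and observation likelihood remain.
        log_alpha = (log_normal_pdf(goal_new, 0.0, 1.0) - log_normal_pdf(goal, 0.0, 1.0)
                     + log_normal_pdf(y_obs, z_new, obs_sd) - log_normal_pdf(y_obs, z, obs_sd))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            goal, z = goal_new, z_new
        samples.append(goal)
    return samples
```

Running `cascading_resimulation_mh(1.0, 40_000, random.Random(0))` and discarding a short burn-in should give a sample mean near $1/1.75 \approx 0.57$ for this toy.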
We illustrate Cascading Resimulation MH in Figure 2, on the task of inferring the goal of a simulated drone in an observed environment.
3.2 Nested Inference Metropolis-Hastings
In some problems, Cascading Resimulation MH will generate many expensive simulations of likelihood-free choices, most of which will be rejected. For these problems, and for real-time applications, we propose an alternative Metropolis-Hastings algorithm, called Nested Inference MH, that uses Monte Carlo estimates of the intractable likelihoods in the acceptance ratio. The likelihood estimates are obtained using auxiliary “nested inference” algorithms, which sample probable values for the internal random choices made by a likelihood-free choice (e.g., a randomized planning algorithm) given its inputs and outputs, and calculate a weight that can be used to form an importance sampling estimate of the unknown likelihood.
Nested Inference MH is based on an interpretation of likelihood-free random choices like agent-path as probabilistic programs in their own right. Let $u$ be an execution trace of a likelihood-free random choice of type $p$. We denote the joint density on execution traces and return values of the random choice, given input arguments $x$, by $\pi_p(u, y; x)$. The marginal likelihood of the random choice is given by the (intractable) integral $\ell_p(y; x) = \int \pi_p(u, y; x)\,du$. We denote the conditional trace density for arguments $x$ and output $y$ by $\pi_p(u \mid y; x) = \pi_p(u, y; x) / \ell_p(y; x)$.
Nested inference assumes the existence of a nested inference algorithm that samples execution traces $u$ according to some density $r(u; x, y)$ that approximates the conditional density on traces of the likelihood-free choice, i.e., $r(u; x, y) \approx \pi_p(u \mid y; x)$. We require that $r(u; x, y) > 0$ for all $u$ where $\pi_p(u, y; x) > 0$. Using the nested inference algorithm as an importance sampler, we produce an unbiased importance sampling estimate of the random choice’s intractable likelihood for arguments $x$ and output $y$ by sampling $K$ times from the inference algorithm, as follows:
$$\hat{\ell}_p(y; x) = \frac{1}{K} \sum_{k=1}^{K} \frac{\pi_p(u_k, y; x)}{r(u_k; x, y)}, \qquad u_1, \ldots, u_K \sim r(\cdot\,; x, y) \text{ independently.} \tag{4}$$
Nested inference also assumes that the ratio $\pi_p(u, y; x) / r(u; x, y)$ can be evaluated. While in principle the nested inference algorithm could be produced by recoding the likelihood-free primitive in a high-level probabilistic programming language, this is by no means required, nor do we expect it to be the common case. In this paper, we focus on nested inference algorithms that use learned neural networks.
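A minimal sketch of the estimator (4) for a hypothetical primitive whose likelihood happens to be known, so that the estimate can be checked: the primitive draws an internal choice $u \sim N(x, 1)$ and outputs $y \sim N(u, 1)$, so $\ell(y; x)$ is the $N(x, 2)$ density (variance 2) at $y$, and the exact conditional on traces is $N((x+y)/2, 1/2)$. The nested inference density is assumed to be Gaussian with mean `r_mean` and standard deviation `r_sd`; all function names are hypothetical.

```python
import math
import random


def log_normal_pdf(v: float, mean: float, sd: float) -> float:
    return -0.5 * ((v - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def log_joint(u: float, y: float, x: float) -> float:
    """log pi(u, y; x) for the toy primitive: internal u ~ N(x, 1), output y ~ N(u, 1)."""
    return log_normal_pdf(u, x, 1.0) + log_normal_pdf(y, u, 1.0)


def estimate_likelihood(y: float, x: float, K: int, r_mean: float, r_sd: float,
                        rng: random.Random) -> float:
    """Average of K importance weights pi(u_k, y; x) / r(u_k; x, y), with u_k ~ N(r_mean, r_sd^2)."""
    total = 0.0
    for _ in range(K):
        u = rng.gauss(r_mean, r_sd)
        total += math.exp(log_joint(u, y, x) - log_normal_pdf(u, r_mean, r_sd))
    return total / K
```

With the exact conditional as the nested inference density, every importance weight equals the likelihood, so even $K = 1$ gives the exact value; with the prior $N(x, 1)$ as the nested inference density (a "resimulation" algorithm), the estimate is unbiased but noisy and needs a larger $K$.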
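The accuracy of the likelihood estimate is determined by the accuracy of the nested inference algorithm. Specifically, for $K = 1$ the variance of the estimate is:
$$\mathrm{Var}\!\left[\hat{\ell}_p(y; x)\right] = \ell_p(y; x)^2\; \chi^2\!\left(\pi_p(\cdot \mid y; x) \,\|\, r(\cdot\,; x, y)\right), \tag{5}$$
where $\chi^2(P \,\|\, R) = \int (P(u) - R(u))^2 / R(u)\,du$ denotes the chi-square divergence [Nielsen and Nock, 2014], and where $\pi_p(\cdot \mid y; x)$ and $r(\cdot\,; x, y)$ on the right-hand side represent density functions over $u$, not specific density values. For general $K$, the variance is divided by $K$, since the $K$ weights are independent. Similarly, we can view $\log \hat{\ell}_p(y; x)$ for $K = 1$ as a (biased) estimator of $\log \ell_p(y; x)$, where the bias is:
$$\mathbb{E}\!\left[\log \hat{\ell}_p(y; x)\right] - \log \ell_p(y; x) = -\mathrm{KL}\!\left(r(\cdot\,; x, y) \,\|\, \pi_p(\cdot \mid y; x)\right), \tag{6}$$
where $\mathrm{KL}$ denotes the Kullback-Leibler (KL) divergence [Kullback and Leibler, 1951].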
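The variance and bias formulas (5) and (6) can be checked numerically on a hypothetical Gaussian toy primitive ($u \sim N(x, 1)$, $y \sim N(u, 1)$), using the prior $N(x, 1)$ as the nested inference density. The sketch uses the standard closed forms for the KL and chi-square divergences between Gaussians; the chi-square expression is finite only when $2\sigma_r^2 > \sigma_p^2$. Function names and sample sizes are illustrative assumptions.

```python
import math
import random


def log_normal_pdf(v: float, mean: float, sd: float) -> float:
    return -0.5 * ((v - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def kl_gauss(m_r: float, s_r: float, m_p: float, s_p: float) -> float:
    """KL(R || P) for R = N(m_r, s_r^2) and P = N(m_p, s_p^2)."""
    return math.log(s_p / s_r) + (s_r ** 2 + (m_r - m_p) ** 2) / (2 * s_p ** 2) - 0.5


def chi2_gauss(m_p: float, s_p: float, m_r: float, s_r: float) -> float:
    """chi^2(P || R) = int (P - R)^2 / R for Gaussians; requires 2 s_r^2 > s_p^2."""
    d = 2 * s_r ** 2 - s_p ** 2
    return s_r ** 2 / (s_p * math.sqrt(d)) * math.exp((m_p - m_r) ** 2 / d) - 1.0


def check_estimator_moments(x: float = 0.0, y: float = 1.0, n: int = 200_000, seed: int = 0) -> dict:
    rng = random.Random(seed)
    m_r, s_r = x, 1.0                          # nested inference density r = prior on u
    m_p, s_p = (x + y) / 2, math.sqrt(0.5)     # exact conditional pi(u | y; x)
    log_lik = log_normal_pdf(y, x, math.sqrt(2.0))
    log_w = []
    for _ in range(n):                         # n independent K = 1 estimates
        u = rng.gauss(m_r, s_r)
        log_w.append(log_normal_pdf(u, x, 1.0) + log_normal_pdf(y, u, 1.0)
                     - log_normal_pdf(u, m_r, s_r))
    w = [math.exp(v) for v in log_w]
    mean_w = sum(w) / n
    return {
        "var_empirical": sum((v - mean_w) ** 2 for v in w) / (n - 1),
        "var_formula": math.exp(2 * log_lik) * chi2_gauss(m_p, s_p, m_r, s_r),   # Eq. (5)
        "bias_empirical": sum(log_w) / n - log_lik,
        "bias_formula": -kl_gauss(m_r, s_r, m_p, s_p),                          # Eq. (6)
    }
```

For $x = 0$ and $y = 1$, the formulas give a variance of about $0.0176$ and a log-likelihood bias of about $-0.403$, and the Monte Carlo estimates should agree closely.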
3.2.1 Nested Inference Metropolis-Hastings
Algorithm 3 describes a Nested Inference MH transition in which a custom proposal is made to a likelihood-free random choice, and which uses estimated likelihoods produced by nested inference algorithms. It assumes that all children of this random choice also have nested inference algorithms. Heterogeneous configurations are also possible.
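Although this transition uses Monte Carlo estimates of likelihoods in the acceptance ratio, it is a standard Metropolis-Hastings transition on an extended state space that includes the result $y_a$ of the proposed-to random choice $a$, $K$ traces $U_a = (u_a^1, \ldots, u_a^K)$ of the proposed-to random choice, and $K$ traces $U_c$ of each child $c \in \mathrm{ch}(a)$ (for simplicity, we use the same $K$ for every estimate). Writing $\hat{\ell}_{p}(y; x, U)$ for the importance sampling estimate (4) computed from traces $U$, and $r_a$, $r_c$ for the nested inference densities of $a$ and of each child $c$, the target density on the extended space is:
$$\bar{\pi}(y_a, U_a, U_{\mathrm{ch}(a)}) \propto \hat{\ell}_{p_a}(y_a; x_a, U_a) \prod_{k=1}^{K} r_a(u_a^k; x_a, y_a) \prod_{c \in \mathrm{ch}(a)} \left[\hat{\ell}_{p_c}(y_c; x_c, U_c) \prod_{k=1}^{K} r_c(u_c^k; x_c, y_c)\right], \tag{7}$$
where $x_a = x_a(y_{\mathrm{pa}(a)})$, and $x_c = x_c(y_{\mathrm{pa}(c)})$ depends on $y_a$. For a custom proposal $q_a(y'_a; y_a)$, the proposal density on the extended space is:
$$\bar{q}(y'_a, U'_a, U'_{\mathrm{ch}(a)}) = q_a(y'_a; y_a) \prod_{k=1}^{K} r_a(u_a'^{k}; x_a, y'_a) \prod_{c \in \mathrm{ch}(a)} \prod_{k=1}^{K} r_c(u_c'^{k}; x'_c, y_c), \tag{8}$$
where $x'_c$ is computed using $y'_a$. The $r$ factors cancel in the resulting acceptance ratio, which reduces to
$$\min\left\{1,\; \frac{\hat{\ell}_{p_a}(y'_a; x_a, U'_a)\, q_a(y_a; y'_a) \prod_{c \in \mathrm{ch}(a)} \hat{\ell}_{p_c}(y_c; x'_c, U'_c)}{\hat{\ell}_{p_a}(y_a; x_a, U_a)\, q_a(y'_a; y_a) \prod_{c \in \mathrm{ch}(a)} \hat{\ell}_{p_c}(y_c; x_c, U_c)}\right\}.$$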
The values of other random choices are constant. See Appendix C for derivation. Because each likelihood estimate is unbiased under the density of the traces it is computed from, integrating out the traces shows that the marginal density of the proposed-to choice's result in the extended target density is the local posterior for that result given the values of all other random choices. Single-site Nested Inference MH transitions that propose to different random choices but use the same database of nested-inference likelihood estimates can be composed to form Markov chains that converge to the posterior over the unobserved random choices given the data.

Our use of unbiased likelihood estimates in place of the true likelihoods when computing the Metropolis-Hastings acceptance ratio in Algorithm 3 is closely related to pseudo-marginal MCMC [Andrieu and Roberts, 2009] and particle MCMC [Andrieu et al., 2010]. Indeed, each single-site Nested Inference MH transition can be seen as a compositional variant of a ‘grouped independence MH’ transition [Beaumont, 2003] in which several pseudo-marginal likelihoods (one for the proposed-to random choice and one for each of its children) are used in the same update. The database of nested-inference likelihood estimates stores the ‘recycled’ pseudo-marginal likelihood estimates from previous transitions.
The convergence rate of a Markov chain based on Nested Inference MH transition operators depends on the accuracy of the nested inference algorithm and on the number of importance samples $K$. In the limit of an exact nested inference algorithm (one that samples exactly from the conditional distribution on traces), the likelihood estimates are exact, and the algorithm is identical to standard Metropolis-Hastings. If the nested inference algorithm is very inaccurate, it may routinely propose traces that are incompatible with the output of the random choice, resulting in low acceptance rates. Better characterizing how the convergence rate depends on the accuracy of the nested inference algorithms and on $K$ is an important area for future work.
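The following sketch shows the structure of a single-site Nested Inference MH chain on a hypothetical toy: a likelihood-free choice $z$ with fixed input $x$ (internally $u \sim N(x, 1)$, then $z \sim N(u, 1)$) and a tractable observed child $y \sim N(z, 0.5^2)$, which is one of the heterogeneous configurations mentioned above. The likelihood of $z$ is replaced by an importance sampling estimate built from $K$ traces of an assumed Gaussian nested inference density, and the estimate for the current state is stored and reused rather than recomputed, as in pseudo-marginal MCMC. In this toy the exact posterior of $z$ has mean $x + (y - x) \cdot 2/2.25$, which the test uses for comparison; all names and constants are assumptions.

```python
import math
import random


def log_normal_pdf(v: float, mean: float, sd: float) -> float:
    return -0.5 * ((v - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def estimate_likelihood(z: float, x: float, K: int, rng: random.Random, r_sd: float = 1.0) -> float:
    """Importance-sampling estimate of l(z; x) for the toy primitive u ~ N(x, 1), z ~ N(u, 1),
    using the assumed nested inference density r(u; x, z) = N((x + z) / 2, r_sd^2)."""
    r_mean = 0.5 * (x + z)
    total = 0.0
    for _ in range(K):
        u = rng.gauss(r_mean, r_sd)
        log_w = log_normal_pdf(u, x, 1.0) + log_normal_pdf(z, u, 1.0) - log_normal_pdf(u, r_mean, r_sd)
        total += math.exp(log_w)
    return total / K


def nested_inference_mh(x: float, y_obs: float, n_iters: int, rng: random.Random,
                        K: int = 4, obs_sd: float = 0.5, prop_sd: float = 0.5) -> list:
    z = x
    lik_hat = estimate_likelihood(z, x, K, rng)  # stored estimate for the current state
    samples = []
    for _ in range(n_iters):
        z_new = z + rng.gauss(0.0, prop_sd)      # symmetric proposal, so q cancels
        lik_hat_new = estimate_likelihood(z_new, x, K, rng)
        log_ratio = (math.log(lik_hat_new) + log_normal_pdf(y_obs, z_new, obs_sd)
                     - math.log(lik_hat) - log_normal_pdf(y_obs, z, obs_sd))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            z, lik_hat = z_new, lik_hat_new      # keep the accepted estimate; never refresh it
        samples.append(z)
    return samples
```

Reusing the stored estimate for the current state, instead of re-estimating it at every step, is what makes the chain target the exact posterior despite the noisy likelihoods.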
3.2.2 Learning a nested inference algorithm
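It is possible to learn a nested inference algorithm $r(u; x, y)$ that approximates $\pi_p(u \mid y; x)$. The idea of learned inference for probabilistic generative models goes back at least to Morris [2001] and has also been used in Stuhlmüller et al. [2013] and Kingma and Welling [2013]. We apply this idea to nested inference as follows. Let $r_\theta$ denote a nested inference algorithm that is parameterized by $\theta$; for example, $\theta$ might be the weights of a neural network used as part of the inference algorithm. We establish a training distribution $d(x)$ over the arguments $x \in \mathcal{X}_p$ to the primitive $p$, and approximately solve the following optimization problem:
$$\min_\theta J(\theta), \qquad J(\theta) = \mathbb{E}_{x \sim d}\, \mathbb{E}_{y \sim \ell_p(\cdot\,; x)}\!\left[\mathrm{KL}\!\left(\pi_p(\cdot \mid y; x) \,\|\, r_\theta(\cdot\,; x, y)\right)\right].$$
The goal is for $r_\theta(u; x, y)$ to approximate $\pi_p(u \mid y; x)$ well (i.e., to have small KL divergence) for typical input arguments $x$. We approximate this objective function by drawing $N$ independent sets of input arguments from the training distribution, and running a traced execution of the likelihood-free random choice (e.g., the planner) on each set of arguments, recording the trace $u_i$ and output $y_i$:
$$x_i \sim d(\cdot), \qquad (u_i, y_i) \sim \pi_p(\cdot, \cdot\,; x_i), \qquad i = 1, \ldots, N.$$
(This training regime cannot be applied to a true black-box path planner, since a recording of its internal randomness is now necessary. However, such recordings can be produced from a straightforwardly instrumented version of the algorithm. The likelihood estimator for the planner can still be treated as a black box by the Nested Inference MH transition.)

We use the resulting dataset $\{(x_i, u_i, y_i)\}_{i=1}^{N}$ to define an approximate objective function
$$J_N(\theta) = \frac{1}{N} \sum_{i=1}^{N} \log \frac{\pi_p(u_i \mid y_i; x_i)}{r_\theta(u_i; x_i, y_i)}$$
that is an unbiased estimate of the original objective function:
$$\mathbb{E}\!\left[J_N(\theta)\right] = J(\theta), \tag{9}$$
$$J_N(\theta) = C - \frac{1}{N} \sum_{i=1}^{N} \log r_\theta(u_i; x_i, y_i), \tag{10}$$
where $C = \frac{1}{N} \sum_{i=1}^{N} \log \pi_p(u_i \mid y_i; x_i)$ does not depend on $\theta$. Note that minimizing $J_N$ over $\theta$ is equivalent to maximizing the log-likelihood $\sum_i \log r_\theta(u_i; x_i, y_i)$ of the data. Because we use forward simulations to produce $(u_i, y_i)$ jointly from $\pi_p(u, y; x_i)$, each $u_i$ is an exact sample from $\pi_p(\cdot \mid y_i; x_i)$, so we have one exact conditional sample for each training example.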
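To make the training recipe concrete, the sketch below replaces the neural network with a linear-Gaussian family $r_\theta(u; x, y) = N(a x + b y, s^2)$, so that maximizing the log-likelihood of the simulated traces has a closed form (least squares for $a$ and $b$, then the mean squared residual for $s^2$). The primitive ($u \sim N(x, 1)$, $y \sim N(u, 1)$) and the training distribution $d(x) = \mathrm{Uniform}(-2, 2)$ are hypothetical. For this toy the exact conditional is $N((x + y)/2, 1/2)$, so the fit should recover $a \approx b \approx s^2 \approx 0.5$.

```python
import random


def simulate_traced(x: float, rng: random.Random):
    """Hypothetical instrumented primitive: returns (recorded internal choice u, output y)."""
    u = rng.gauss(x, 1.0)  # internal random choice, recorded in the trace
    y = rng.gauss(u, 1.0)  # output of the primitive
    return u, y


def make_dataset(n: int, rng: random.Random):
    data = []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)  # hypothetical training distribution d(x)
        u, y = simulate_traced(x, rng)
        data.append((x, u, y))
    return data


def fit_linear_gaussian(data):
    """Maximum-likelihood fit of r_theta(u; x, y) = N(a*x + b*y, s2) from (x, u, y) triples.

    For this family, maximizing sum_i log r_theta(u_i; x_i, y_i) is least squares for (a, b)
    (solved here via the 2x2 normal equations) followed by the mean squared residual for s2.
    """
    sxx = sum(x * x for x, _, _ in data)
    sxy = sum(x * y for x, _, y in data)
    syy = sum(y * y for _, _, y in data)
    sxu = sum(x * u for x, u, _ in data)
    syu = sum(y * u for _, u, y in data)
    det = sxx * syy - sxy * sxy
    a = (syy * sxu - sxy * syu) / det
    b = (sxx * syu - sxy * sxu) / det
    s2 = sum((u - a * x - b * y) ** 2 for x, u, y in data) / len(data)
    return a, b, s2
```

A neural network trades this closed form for flexibility, but the objective is the same: the log-likelihood of recorded traces given the arguments and outputs that produced them.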
4 Example Applications
We have implemented four example applications, designed to illustrate the flexibility of our framework:

Inferring the probable goal of a simulated drone. This example shows that small changes to the environment, such as including an additional doorway, can yield large changes in the inferred goals.

Inferring the probable goal of a simulated drone with a more complex planner. Specifically, we model the drone as following a multi-part path produced by a planner that first chooses a waypoint uniformly at random and then recursively solves the two path-planning problems induced by the choice of waypoint. This example shows (a) applicability of the framework to more complex models of goal-directed behavior, and (b) that Nested Inference MH with a learned neural network can outperform Cascading Resimulation MH.

Inferring whether two people walking around tables in a room are headed for the same goal or for different goals. This example demonstrates applicability to simple hierarchical models for goals and also to real-world (as opposed to synthetic) data.

Jointly inferring a simulated agent’s goals and its beliefs about an obstacle in the map whose location, size, and orientation is unknown to the probabilistic program. This example is described in the appendix due to space constraints.
4.1 Example 1: Sensitivity of Goal Inference to Small Map Changes
Figure 2 shows a comparison of goal inference in two different maps given the same observations. The map for the scenario on the left has an enclosure with two openings, one on the top and one on the bottom, while the map for the scenario on the right has a single opening. In the map on the left, the inferred goal samples fall outside the enclosure, because if the drone intended to go inside the enclosure, it could have taken a much shorter path. In the map on the right, a significant fraction of goal samples fall inside the enclosure, as relatively efficient paths into the enclosure go through the partial trajectory that has been observed so far. Samples shown are the final states of 480 independent replicates of a Markov chain initialized from the prior, with Cascading Resimulation MH transitions (Algorithm 2) using the prior as the proposal. Planner parameters are , , , , , .
4.2 Example 2: Handling Path Planners With Waypoints via Nested Inference
Next, we used a model where the agent may choose a waypoint and separately plan a path to the waypoint and a path from the waypoint to the goal (agent-waypoint-path, Algorithm 4). Unlike the simpler agent-path model, which typically samples from a small number of modes concentrated at efficient routes from the start to the goal, agent-waypoint-path yields paths that are unpredictable without knowledge of the waypoint. Two of the parameters of plan-path are omitted for simplicity. We consider the same goal inference task as in Example 1 but with the alternative planner. Cascading Resimulation MH performs poorly on this task, because the prior is a poor proposal for the internal random choices of agent-waypoint-path.
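Algorithm 5 shows a nested inference algorithm for agent-waypoint-path that uses a neural network to propose the waypoint ($w$) and whether the waypoint is used ($\beta$), given the goal and observations, and then executes the rest of the planner, conditioned on $w$ and $\beta$. The network was trained on 10,000 runs of agent-waypoint-path with random goal input and a fixed world map and start location. The nested inference algorithm splits the trace $u$ of agent-waypoint-path into $(w, \beta)$ and $v$ (the random choices made within executions of plan-path), so that $u = (w, \beta, v)$. The density of the nested inference algorithm is then $r(u; x, y) = r_{\mathrm{net}}(w, \beta; x, y)\, \pi(v \mid w, \beta; x)$, where $r_{\mathrm{net}}$ is the density of the network's stochastic outputs and $\pi(v \mid w, \beta; x)$ is the planner's own density over the plan-path choices. In the density ratio $\pi(u, y; x) / r(u; x, y)$, which is used by Nested Inference MH when estimating the planner likelihoods, the plan-path factors cancel, leaving $\pi(w, \beta)\, \pi(y \mid w, \beta, v; x) / r_{\mathrm{net}}(w, \beta; x, y)$. To evaluate this ratio, we separately evaluate the density of add-noise, which gives $\pi(y \mid w, \beta, v; x)$, and the density of the neural network’s stochastic outputs.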
We compared three strategies for goal inference: Nested Inference MH using Algorithm 5 and , Cascading Resimulation MH, and Nested Inference MH using a “resimulation” nested inference algorithm and . Figure 3 shows that neural Nested Inference MH converges faster than the other strategies. Planner parameters were the same as in Example 1. All inference strategies were implemented using a custom Python inference library. Integration of Nested Inference MH with Venture is left for future work.
4.3 Example 3: Modeling Real-World Human Motion
The Venture program of Figure 4(c) defines a model with two agents whose destinations may or may not be the same. The environment (world) and the start locations of the agents are known. The is_common_goal flag determines whether the agents share the same goal destination. The paths of both agents are modeled using agent-path. The corresponding Bayesian network is shown in Figure 4(e). We collected video of two collaborators walking in a scene containing tables, under two conditions: one in which they meet at a common location, and one in which they diverge. For the common-goal condition, we constructed short and extended sequences of observed locations (Figure 4(a) and (b)). We used Cascading Resimulation MH for inference, initialized from the prior, with a joint prior proposal over all latent variables. We ran 60 chains of transitions each, and rendered the final states in Figures 4(a) and 4(b). The speed for each individual was set to their average speed along the observed path. The estimated probabilities of is_common_goal=True for the short and extended sequences are 0.63 and 0.82, respectively. This trend qualitatively matches human judgments, shown in Figure 4(d) (the model was not calibrated to match human judgments). See Appendix B for additional results.
5 Discussion
This paper introduced a class of probabilistic programs for formulating goal inference as approximate inference in probabilistic generative models of goal-directed behavior. The technical contributions are (i) a probabilistic programming formulation that makes complex goal and map priors easy to specify; (ii) the use of randomized path-planning algorithms as the backbone of generative models; and (iii) the introduction of Monte Carlo techniques that can handle the intractable likelihoods of these path planners. The experiments showed that short probabilistic programs can make meaningful inferences about goal-directed behavior.
From the standpoint of robotics, autonomous driving, or reconnaissance, the examples in this paper are quite preliminary. More experiments are needed to explore the accuracy of approximate inference in these models, as well as the accuracy of the models themselves, especially on real-world problems. The probabilistic programming formulation makes it easy to explore variations of models, environments, and inference strategies.
The problem of inferring the mental states of autonomous agents is central to probabilistic artificial intelligence. It may also be a natural application for structured generative models and for probabilistic programming, but only if sufficiently fast and flexible inference schemes can be developed. We hope this paper helps to encourage the use of probabilistic programming for building intelligent software that can draw meaningful inferences about goal-directed behavior.
Acknowledgements
The authors would like to thank Feras Saad for obtaining human judgment data, and Leslie Kaelbling, Erin Bartuska, and Feras Saad for helpful conversations. Tree and car 3D models in figures are from http://www.flohmueller.de/ [Lohmüller, 2016]. This research was supported by DARPA (PPAML program, contract number FA87501420004), IARPA (under research contract 201515061000003), the Office of Naval Research (under research contract N000141310333), the Army Research Office (under agreement number W911NF1310212), and gifts from Analog Devices and Google. MCT is supported by the Department of Defense (DoD) through the National Defense Science & Engineering Graduate Fellowship (NDSEG) Program.
References
Andrieu and Roberts [2009] Christophe Andrieu and Gareth O Roberts. The pseudo-marginal approach for efficient Monte Carlo computations. The Annals of Statistics, pages 697–725, 2009.
Andrieu et al. [2010] Christophe Andrieu, Arnaud Doucet, and Roman Holenstein. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3):269–342, 2010.
Aufrère et al. [2003] Romuald Aufrère, Jay Gowdy, Christoph Mertz, Chuck Thorpe, Chieh-Chih Wang, and Teruko Yata. Perception for collision avoidance and autonomous driving. Mechatronics, 13(10):1149–1161, 2003.
Baker et al. [2007] Chris L Baker, Joshua B Tenenbaum, and Rebecca R Saxe. Goal inference as inverse planning. In Proceedings of the Cognitive Science Society, volume 29, 2007.
Beaumont [2003] Mark A Beaumont. Estimation of population growth or decline in genetically monitored populations. Genetics, 164(3):1139–1160, 2003.
Carpenter et al. [2016] Bob Carpenter, Andrew Gelman, Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Michael A Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. Stan: A probabilistic programming language. Journal of Statistical Software, 20, 2016.
Franke et al. [1998] Uwe Franke, Dariu Gavrila, Steffen Gorzig, Frank Lindner, F Puetzold, and Christian Wohler. Autonomous driving goes downtown. IEEE Intelligent Systems and Their Applications, 13(6):40–48, 1998.
Goodman et al. [2012] Noah Goodman, Vikash Mansinghka, Daniel M Roy, Keith Bonawitz, and Joshua B Tenenbaum. Church: a language for generative models. arXiv preprint arXiv:1206.3255, 2012.
Kingma and Welling [2013] Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
Kullback and Leibler [1951] Solomon Kullback and Richard A Leibler. On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79–86, 1951.
Kumar and Michael [2012] Vijay Kumar and Nathan Michael. Opportunities and challenges with autonomous micro aerial vehicles. The International Journal of Robotics Research, 31(11):1279–1291, 2012.
LaValle [1998] Steven M LaValle. Rapidly-exploring random trees: A new tool for path planning. 1998.
Liao et al. [2006] Lin Liao, Dieter Fox, and Henry Kautz. Location-based activity recognition. 2006.
Lohmüller [2016] Friedrich A. Lohmüller. Descriptions and examples for the POV-Ray raytracer. http://www.flohmueller.de/pov_tut/objects/obj_000e.htm, 2016. Accessed: 2017-05-17.
Mansinghka et al. [2014] Vikash Mansinghka, Daniel Selsam, and Yura Perov. Venture: a higher-order probabilistic programming platform with programmable inference. arXiv preprint arXiv:1404.0099, 2014.
Milch et al. [2007] Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L Ong, and Andrey Kolobov. BLOG: Probabilistic models with unknown objects. Statistical Relational Learning, page 373, 2007.
Morris [2001] Quaid Morris. Recognition networks for approximate inference in BN2O networks. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pages 370–377. Morgan Kaufmann Publishers Inc., 2001.
Nielsen and Nock [2014] Frank Nielsen and Richard Nock. On the chi square and higher-order chi distances for approximating f-divergences. IEEE Signal Processing Letters, 21(1):10–13, 2014.
Pfeffer [2009] Avi Pfeffer. Figaro: An object-oriented probabilistic programming language. Charles River Analytics Technical Report, 137, 2009.
Stuhlmüller et al. [2013] Andreas Stuhlmüller, Jacob Taylor, and Noah Goodman. Learning stochastic inverses. In Advances in Neural Information Processing Systems, pages 3048–3056, 2013.
Tran and Davis [2008] Son Tran and Larry Davis. Event modeling and recognition using Markov logic networks. Computer Vision–ECCV 2008, pages 610–623, 2008.
Urmson et al. [2008] Chris Urmson, Joshua Anhalt, Drew Bagnell, Christopher Baker, Robert Bittner, MN Clark, John Dolan, Dave Duggins, Tugrul Galatali, Chris Geyer, et al. Autonomous driving in urban environments: Boss and the urban challenge. Journal of Field Robotics, 25(8):425–466, 2008.
Wingate et al. [2011] David Wingate, Andreas Stuhlmüller, and Noah D Goodman. Lightweight implementations of probabilistic programming languages via transformational compilation. In AISTATS, pages 770–778, 2011.
Ziebart et al. [2009] Brian D Ziebart, Nathan Ratliff, Garratt Gallagher, Christoph Mertz, Kevin Peterson, J Andrew Bagnell, Martial Hebert, Anind K Dey, and Siddhartha Srinivasa. Planning-based prediction for pedestrians. In Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on, pages 3931–3936. IEEE, 2009.
Appendix A Planner Details
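We now describe details of the planner omitted from the main text, including the procedures simplify-path, refine-path, and walk-path, which are defined in Algorithm 6. Paths are represented as sequences of points $(q_1, \ldots, q_n)$, with line segments connecting adjacent points. The path begins with the start location ($q_1$) and ends with the goal location ($q_n$). To be a valid path with respect to a map $m$ (a set of polygonal obstacles), no point in the path may lie within an obstacle (polygon) of $m$ (i.e., $q_i \notin \omega$ for all $i$ and every obstacle $\omega \in m$), and no line segment between two adjacent path points may intersect an obstacle of $m$ (i.e., $\overline{q_i q_{i+1}} \cap \omega = \emptyset$ for all $i < n$ and every $\omega \in m$).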
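The two validity conditions above can be checked with standard computational-geometry tests: ray casting for point-in-polygon, and orientation signs for segment intersection. This is a sketch, not the code of Algorithm 6; polygons are assumed to be lists of vertices, and the names `is_valid_path`, `point_in_polygon`, and `segments_intersect` are hypothetical. Touching an obstacle edge is treated as a collision, a conservative choice that the text does not specify.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Ray casting: count crossings of a horizontal ray from p (boundary points may go either way)."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        (x0, y0), (x1, y1) = poly[i], poly[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge straddles the ray's height
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def _orient(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle abc: >0 left turn, <0 right turn, 0 collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _on_segment(a: Point, b: Point, c: Point) -> bool:
    """For c collinear with a and b: True if c lies within the bounding box of segment ab."""
    return min(a[0], b[0]) <= c[0] <= max(a[0], b[0]) and min(a[1], b[1]) <= c[1] <= max(a[1], b[1])


def segments_intersect(p: Point, q: Point, a: Point, b: Point) -> bool:
    """True if segments pq and ab cross or touch (collinear overlaps included)."""
    d1, d2 = _orient(a, b, p), _orient(a, b, q)
    d3, d4 = _orient(p, q, a), _orient(p, q, b)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True
    return ((d1 == 0 and _on_segment(a, b, p)) or (d2 == 0 and _on_segment(a, b, q))
            or (d3 == 0 and _on_segment(p, q, a)) or (d4 == 0 and _on_segment(p, q, b)))


def is_valid_path(path: Sequence[Point], world_map: Sequence[Polygon]) -> bool:
    # Condition 1: no path point lies inside an obstacle.
    if any(point_in_polygon(pt, poly) for pt in path for poly in world_map):
        return False
    # Condition 2: no segment between adjacent points meets an obstacle edge. With both
    # endpoints outside, a segment can only enter an obstacle by crossing its boundary.
    return not any(segments_intersect(p, q, poly[i], poly[(i + 1) % len(poly)])
                   for p, q in zip(path, path[1:])
                   for poly in world_map
                   for i in range(len(poly)))
```

For a square obstacle in the middle of the unit square, a straight path through it fails the second condition, while a detour around its top passes both.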
Appendix B Additional Experiments
B.1 Jointly Inferring the Belief and Goal of an Agent
The Venture program of Figure 6(a) defines a model in which the agent’s belief about its environment, upon which the agent’s motion plan depends, is uncertain. The environment contains two static objects (known_objects): a tree and a central divider wall that divides the square into left and right sides. There are passageways between the left and right sides that go above and below the divider. However, the agent has knowledge of (or belief in) an additional obstacle wall (obstacle), and the agent plans its path to the destination (goal) taking this additional obstacle into account. Figure 6(a) also shows a Bayesian network representation of this model. We seek to infer both the agent’s goal and the agent’s beliefs about the location, orientation, and size of the obstacle.
We used Cascading Resimulation Metropolis-Hastings (Algorithm 2) with a single repeated transition operator based on an independent joint proposal to goal () and to the unknown parameters of obstacle (start post location, orientation, and length, proposed from the prior). We initialized from the prior. Parameters of the planner agent-path were , , , , , . We ran several independent Markov chains of iterations each, on a synthetic dataset in which the agent takes a path from the right to the left of the map by going below the divider. The final states of four such chains are visualized in Figure 6(b). For this dataset, the goal destination of the agent is revealed with certainty because the agent reaches and stops in the upper left corner. The obstacle inferences indicate that the agent believes the upper route to its goal is blocked, because otherwise the agent would have taken the shorter upper route to its goal. However, the specific details of how the obstacle blocks the upper passageway remain uncertain.
B.2 Goal Inference in a Driving Scenario
B.3 Real-World Human Motion, Alternate Sequence
We extended the experiment described in Section 4.3 and shown in Figure 4 by running Cascading Resimulation Metropolis-Hastings on an alternate sequence of observed person locations in which the individuals diverge to separate goal destinations. The inferences, shown in Figure 5, confirm the expectations, with all samples indicating is_common_goal = False. Samples were obtained from the final state of independent Markov chains, with initialization from the prior, followed by iterations of Cascading Resimulation Metropolis-Hastings.
B.4 Inference With Waypoint Planner
Figure 8 compares waypoints and paths proposed by Nested Inference MH with a neural nested inference algorithm against waypoints and paths proposed by Cascading Resimulation MH, on an illustrative example data set. Using the prior as the proposal, as Cascading Resimulation MH does, results in unnecessary rejections and slow convergence. The neural network proposes waypoints near the bend in the path.
The KL divergence estimates of Figure 3(d) were obtained by binning 960 independent reference samples (30,000 transitions of Cascading Resimulation MH, initialized from the prior) and binning 960 independent approximate inference samples for each inference algorithm evaluated. The world unit square was divided into 25 square bins (5-by-5), and a discrete distribution was estimated for each sampler by counting the number of samples falling into each bin, adding a pseudocount of to each bin, and normalizing. The KL divergence from the reference sampler's histogram to each approximate inference algorithm's histogram was then computed. For each inference strategy (Cascading Resimulation MH, Neural Nested Inference MH with , Resimulation Nested Inference MH with and Resimulation Nested Inference MH with ), the number of MH transitions was varied over several orders of magnitude, and the final state in each chain was recorded, to obtain samples for each inference algorithm evaluated. The numbers of MH transitions used to obtain the samples shown in Figures 3(a,b,c) are 10, 1, and 10, respectively. Figure 9 shows additional samples comparing Nested Inference Metropolis-Hastings with a neural nested inference algorithm against Cascading Resimulation Metropolis-Hastings.
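The histogram-based KL estimate described above can be sketched as follows. The pseudocount's value is not given here, so it is a parameter whose default of 1.0 is only a placeholder; samples are assumed to lie in the unit square, and the divergence is computed as KL(reference || approximate), which is our reading of the direction described above. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def histogram(samples: Sequence[Tuple[float, float]], n_bins: int = 5,
              pseudocount: float = 1.0) -> List[float]:
    """Normalized counts on an n_bins x n_bins grid over the unit square, with a pseudocount per bin."""
    counts = [[pseudocount] * n_bins for _ in range(n_bins)]
    for x, y in samples:
        i = min(max(int(x * n_bins), 0), n_bins - 1)  # clamp so x == 1.0 lands in the last bin
        j = min(max(int(y * n_bins), 0), n_bins - 1)
        counts[i][j] += 1
    total = sum(sum(row) for row in counts)
    return [c / total for row in counts for c in row]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; q must be positive wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def binned_kl(reference, approximate, n_bins: int = 5, pseudocount: float = 1.0) -> float:
    # The pseudocount keeps every bin of the approximate histogram positive, so the KL is finite.
    return kl_divergence(histogram(reference, n_bins, pseudocount),
                         histogram(approximate, n_bins, pseudocount))
```

With four reference samples in one corner and four approximate samples spread over the corners of a 2-by-2 grid (pseudocount 1), the histograms are (5/8, 1/8, 1/8, 1/8) and (1/4, 1/4, 1/4, 1/4), giving a KL of about 0.313.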
Appendix C Nested Inference Derivations
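The variance of the likelihood estimate with $K = 1$, where $\hat{\ell}_p(y; x) = \pi_p(u, y; x) / r(u; x, y)$ with $u \sim r(\cdot\,; x, y)$, so that $\mathbb{E}[\hat{\ell}_p(y; x)] = \ell_p(y; x)$, is:
$$\begin{aligned} \mathrm{Var}\!\left[\hat{\ell}_p(y; x)\right] &= \int \frac{\pi_p(u, y; x)^2}{r(u; x, y)}\,du - \ell_p(y; x)^2 \\ &= \ell_p(y; x)^2 \left(\int \frac{\pi_p(u \mid y; x)^2}{r(u; x, y)}\,du - 1\right) \\ &= \ell_p(y; x)^2\; \chi^2\!\left(\pi_p(\cdot \mid y; x) \,\|\, r(\cdot\,; x, y)\right), \end{aligned}$$
using $\pi_p(u, y; x) = \ell_p(y; x)\, \pi_p(u \mid y; x)$ and $\chi^2(P \,\|\, R) = \int (P - R)^2 / R\,du = \int P^2 / R\,du - 1$.
The bias of the log likelihood estimate with $K = 1$, for the same single-sample estimate $\hat{\ell}_p(y; x) = \pi_p(u, y; x) / r(u; x, y)$ with $u \sim r(\cdot\,; x, y)$, is:
$$\begin{aligned} \mathbb{E}\!\left[\log \hat{\ell}_p(y; x)\right] - \log \ell_p(y; x) &= \mathbb{E}_{u \sim r}\!\left[\log \frac{\pi_p(u, y; x)}{\ell_p(y; x)\, r(u; x, y)}\right] \\ &= \mathbb{E}_{u \sim r}\!\left[\log \frac{\pi_p(u \mid y; x)}{r(u; x, y)}\right] \\ &= -\mathrm{KL}\!\left(r(\cdot\,; x, y) \,\|\, \pi_p(\cdot \mid y; x)\right). \end{aligned}$$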