Planning around humans is critical for many real-world robotic applications. To effectively reason about human motion, practitioners often leverage predictive models of human motion during robot decision making. These models can be objective-driven where the human seeks to maximize an objective [bai2015intention, baker2007goal, ng2000algorithms, ziebart2009planning, kitani2012activity] or pattern-based where the behavioral structure is learned from data and then used for planning around human arms [amor2014interaction, ding2011human, koppula2013anticipating, lasota2015analyzing, hawkins2013probabilistic], drivers [hawkins2013probabilistic, Schmerling2017, driggs2018robust], and pedestrians [Ma_2017_CVPR, alahi2016social] (see [rudenko2019human] for a detailed survey). Such predictive models often encode the relevant structure in human decision-making as model parameters. However, since modeling the true human behavior and how people make decisions a priori is challenging, predictors maintain a belief distribution over these model parameters [fisac2018general, lasota2017multiple]. This enables the predictor to use observations of human behavior during runtime to update the belief over the model parameters and probabilistically predict the human motion in accordance with the observations.
Unfortunately, generating probabilistic predictions over future human states is typically computationally demanding, made only more demanding if the human behavior may change over time. For example, a human driver might change how aggressive their driving style is over time, which needs to be accounted for during the predictions. Consequently, the predictor must now reason about all possible ways in which the model parameters and the corresponding predicted future human actions may evolve, given the current belief. In practice, however, computing the full probabilistic predictions may not necessarily be required for planning. Instead, identifying a set of states that the human will occupy with sufficiently high probability can enable the robot to avoid collisions quickly and effectively using a variety of motion planning techniques that use deterministic obstacles [yoder2016autonomous, koenig2005fast, bekris2007greedy, janson2018safe].
In this work, we formalize the human prediction problem as a stochastic reachability problem [prandini2006stochastic, abate2008probabilistic, bujorianu2007new, hu2000towards] in the joint state space of the human and the belief over the predictive model parameters. Using this formulation, the full state distribution can be computed via a stochastic forward reachable set [vinod2018stochastic, abate2007computational], which is the set of all states and beliefs that are reachable under the human policy with at least a given probability threshold. However, instead of computing the stochastic reachable set, we present a novel Hamilton-Jacobi (HJ) reachability-based framework for computing a deterministic approximation of the set of probable states. We do so by restricting the human policy to a deterministic set of allowable actions. This not only allows us to predict likely human states with a significantly lower computational complexity, but it also gives rise to a general framework wherein different definitions of allowable human actions can be instantiated to (a) generate predictions and (b) perform predictor analysis.
In particular, for prediction, the allowable human actions are those that have high enough probability with respect to any given state and belief. Because we do not explicitly maintain the state probabilities during prediction, our framework may, on the one hand, lead to conservative predictions. On the other hand, our predictions are not too sensitive to these probabilities and are robust to misspecified belief distributions and model parameters (see Figure 1).
Our framework additionally enables predictor analysis. For example, a robot motion planner might need to know how long it will take the predictor to correct a wrong prior given a hypothesized ground truth value of the human model parameters, or to disambiguate between competing hypotheses. We can readily answer such questions by instantiating the allowable human actions (which we use for computing our reachable sets) to model specific ground truth parameters’ values. For instance, when the model parameter represents potential human goal locations, we can compute how long it would take a predictor to realize that the human is going towards a specific goal in the scene. If we do this for two goals in the scene and take the maximum of those times, the robot now also knows that no matter which of the two goals the human is actually pursuing, the identity of the correct goal will become clear within that amount of time.
Our framework can also be thought of as a marriage between techniques from the belief space planning literature, and optimal control methods. For example, POMDPs are often recast as belief space MDPs, which in turn often admit more computationally efficient solution methods [kurniawati2008sarsop, roy2005finding, pineau2003point]. Similarly to how POMDPs can be recast as equivalent belief space MDPs, we also operate in belief space. But rather than doing both the planning and prediction in the belief space, we only do human motion prediction in the belief space, separate from planning, as is often done in the literature to reduce the computational complexity (e.g. [littman1995learning]). However, we use theoretically-rich continuous-time reachability analysis [mitchell2005time, abate2008probabilistic] for prediction, which allows us to leverage a variety of tools [mitchell2004toolbox, chen2013flow] developed to solve such problems grounded within the optimal control literature. To summarize, our key contributions are:
(1) a Hamilton-Jacobi reachability-based framework for prediction generation and predictor analysis; and
(2) a demonstration of our approach in simulation and on a hardware testbed for safe navigation around humans.
II Problem Statement
We study the problem of safe motion planning for a mobile robot in the presence of a single human agent. In particular, our goal is to compute a control sequence for the robot which moves it from a given start state to a goal state without colliding with the human or the static obstacles in the environment. We assume that both the robot and human states can be accurately sensed at all times. We also assume that a map of the static parts of the environment is known; however, the future states of the human are not known and need to be predicted online. Consequently, we divide the safe planning problem into two subproblems: human motion prediction and robot trajectory planning.
II-A Human Motion Prediction with Online Updates
To predict future human states, we model the human as a dynamical system with a state, a control, and continuous-time dynamics (1) that give the rate of change of the state as a function of the current state and control.
Here, the state could represent the position and velocity of the human, and the dynamics describe how the state evolves over time. To find the likely future states of the human, we couple this dynamics model with a model of how the human chooses their actions. In general, this is a particularly difficult modeling challenge and many models exist in the literature (see [rudenko2019human]). In this work, we primarily consider stochastic control policies (2) that are parameterized by a model parameter and specify a distribution over the human control at the current human state.
In this model, the parameter can represent many different aspects of human decision-making, from how passive or aggressive a person is [sadigh2016information] to the kind of visual cues they pay attention to in a scene [kitani2012activity]. (This formulation is easily amenable to deterministic policies, where the policy is the Dirac delta function.) The specific choice of parameterization is often highly problem specific and can be hand-designed or learned from prior data [ziebart2009planning, finn2016guided]. Nevertheless, the true value of the parameter is most often not known beforehand and instead needs to be estimated after receiving measurements of the true human behavior. Thus, at any time, we maintain a distribution over the parameter, which allows us to reason about the uncertainty in human behavior online based on the measurements of the human control.
We now introduce a running example for illustration purposes throughout the paper.
In this example, we consider a ground vehicle that needs to go to a goal position in a room where a person is walking.
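We consider a planar model of the human in which the state is the human's position, the control is the heading direction, and the dynamics move the human along that heading.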
The model parameter can take two values and indicates which goal location the human is trying to navigate to.
The human policy for any state and goal is given by a Gaussian with mean pointing in the goal direction and a variance representing uncertainty in the human action:
where and for . represents the position of goal .
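The goal-directed Gaussian policy described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the goal positions, the standard deviation `sigma`, and all function names are hypothetical, and truncation of the Gaussian to a single turn of the circle is ignored.

```python
import math

def heading_to(position, goal):
    """Angle (rad) of the straight line from position to goal."""
    return math.atan2(goal[1] - position[1], goal[0] - position[0])

def wrap(angle):
    """Wrap an angle to [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

def policy_density(heading, position, goal, sigma=math.pi / 4):
    """Gaussian density of a heading, centred on the goal direction.

    sigma is an assumed standard deviation (hypothetical value).
    """
    err = wrap(heading - heading_to(position, goal))
    return math.exp(-0.5 * (err / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

g1, g2 = (2.0, 2.0), (2.0, -2.0)   # hypothetical goal positions
x = (0.0, 0.0)                     # current human position
toward_g1 = heading_to(x, g1)      # heading that points at goal 1
like_g1 = policy_density(toward_g1, x, g1)
like_g2 = policy_density(toward_g1, x, g2)
```

A heading that points at the first goal is more likely under the first goal's policy than under the second's, which is what lets observed controls serve as evidence about the goal.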
Since we are uncertain about the true value of the model parameter, we update our belief over it online based on the measurements of the human control. This observed control may be used as evidence to update the robot’s prior belief about the parameter over time via a Bayesian update (4) to obtain the posterior belief.
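For a finite set of parameter values, this update is the standard Bayes rule: the posterior is proportional to the likelihood of the observed control times the prior. A minimal sketch, assuming the likelihoods have already been evaluated from the human policy; the dictionary keys and numbers are hypothetical.

```python
def bayes_update(prior, likelihoods):
    """Posterior over a finite set of parameter values.

    prior:       dict mapping value -> prior probability
    likelihoods: dict mapping value -> P(observed control | state; value)
    """
    unnormalized = {k: likelihoods[k] * prior[k] for k in prior}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero likelihood under every value")
    return {k: v / total for k, v in unnormalized.items()}

prior = {"g1": 0.5, "g2": 0.5}                            # hypothetical prior
posterior = bayes_update(prior, {"g1": 0.4, "g2": 0.1})   # hypothetical likelihoods
```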
where is the desired safety threshold and is a design parameter. When , we drop the subscript in . Using this set of likely human states, our robot will generate a trajectory that at each future time step avoids .
Note that computing this set of likely human states is subtly different from computing the full state distribution. For computing the full distribution, one can explicitly integrate over all possible future values of the model parameter, state, and action trajectories. Alternatively, one can use the belief space to keep track of the belief over time, and compute the human state distribution using the belief. The latter computation can be thought of as branching on future observations, and keeping track of what the belief might be at each future time depending on that observation history and the intrinsic changes in the human behavior. Our insight is that this latter computation can be formulated as a stochastic reachability problem in the joint state space of the human and belief, but that it can be simplified to a deterministic reachability problem with lower computational complexity when we only need to (approximately) compute the set of likely human states.
II-B Robot Motion Planning
We model the robot as a dynamical system with its own state, control, and dynamics. The robot’s goal is to determine a sequence of controls such that it does not collide with the human or the (known) static obstacles, and reaches its goal within a given time horizon. In this work, we solve this planning problem in a receding horizon fashion. Since the future states of the human are not known a priori, at each planning time we instead plan the robot trajectory to avoid the likely states of the human over the upcoming planning interval.
Running example: Our ground robot is modeled as a 3D Dubins’ car whose state is its position and heading, and whose control is its speed and angular speed. The dynamics are the standard Dubins’ car kinematics: the position advances along the current heading at the commanded speed, and the heading changes at the commanded angular speed. At any given time, we use a third-order spline planner to compute the robot trajectory over a finite horizon (for more details, see [bansal2019combining]).
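The Dubins' car kinematics of the running example can be simulated with a forward-Euler step. This is only an illustrative sketch: the step size, the control values, and the function name are assumptions, and the spline planner itself is not reproduced here.

```python
import math

def dubins_step(state, control, dt):
    """One forward-Euler step of the Dubins' car kinematics.

    state   = (x, y, heading)
    control = (speed, angular_speed)
    """
    x, y, phi = state
    v, omega = control
    return (x + dt * v * math.cos(phi),
            y + dt * v * math.sin(phi),
            phi + dt * omega)

state = (0.0, 0.0, 0.0)
for _ in range(10):                    # drive straight for one second
    state = dubins_step(state, (1.0, 0.0), 0.1)

turned = dubins_step((0.0, 0.0, 0.0), (0.0, 0.5), 0.1)   # turn in place
```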
III Reachability-based Motion Prediction
In this section, we first discuss how to cast the full probabilistic human motion prediction problem as a stochastic reachability problem. Next, we discuss how we can obtain a deterministic approximation of this stochastic reachability problem, and solve it using our HJ reachability framework.
III-A Casting prediction as a reachability problem
We cast the problem of predicting a probability distribution over the human states at some future time as one of maintaining a time-dependent distribution over reachable states and beliefs, given a stochastic human model as in (2). Let the current (known) human state and the current belief over the model parameter be given. Different control actions that the human might take next will induce a change in both the human’s physical state and the robot’s belief over the model parameter. This in turn affects what human action distribution the robot will predict for the following timestep, and so on. To simultaneously compute all possible future beliefs over the model parameter and the corresponding likely human states, we consider the joint dynamics (6) of the belief and the human state.
At any joint state, the distribution over the (predicted) human actions is given by the policy (7), which depends on both the human state and the current belief.
To derive the dynamics of the belief in (6), we note that the belief can change either due to new observations (via the Bayesian update in (4)) or due to the change in human behavior (modeled via the parameter) over time. This continuous evolution of the belief can be described by the belief dynamics in (8).
Here, one term represents the intrinsic changes in the human behavior, whereas the other component captures the Bayesian change in the belief due to the observed human control. Note that the time derivative in (8) is taken pointwise over the parameter space.
Typically, the Bayesian update is performed in discrete time when new observations are received; however, in this work, we reason about continuous changes in the belief and the corresponding continuous changes in the human state. We omit a detailed derivation, but intuitively, to relate the continuous-time Bayesian update to the discrete-time version, a rate parameter in (8) can be thought of as the observation frequency. Indeed, as this rate tends to infinity, i.e., observations are received continuously, the belief instantaneously changes to the Bayesian posterior. On the other hand, as the rate tends to zero, i.e., no observations are received, the Bayesian update does not play a role in the dynamics of the belief.
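One way to read this description is as a relaxation of the belief toward the Bayesian posterior at a rate set by the observation frequency, plus an intrinsic drift. The sketch below assumes that form, dp/dt = gamma * (posterior - p) + intrinsic(p), for a two-valued parameter; the exact expression in (8) may differ, and the names `gamma` and `intrinsic` are hypothetical.

```python
def belief_step(p, posterior, gamma, dt, intrinsic=lambda p: 0.0):
    """Forward-Euler step of an ASSUMED belief ODE.

    p, posterior: probability of the first of two parameter values,
                  before and after a Bayesian update on the observed control.
    gamma:        observation rate (0 means observations are ignored).
    intrinsic:    drift modelling changes in the human behaviour itself.
    """
    p_next = p + dt * (gamma * (posterior - p) + intrinsic(p))
    return min(1.0, max(0.0, p_next))   # keep the result a valid probability

ignored = belief_step(0.5, 0.8, gamma=0.0, dt=0.1)    # no observations used
jumped = belief_step(0.5, 0.8, gamma=10.0, dt=0.1)    # gamma * dt = 1
```

With `gamma * dt = 1` a single Euler step lands on the discrete-time posterior, and with `gamma = 0` the belief does not move, matching the two limiting cases in the text.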
Given the joint state at the current time and the control policy in (7), we are interested in computing the following set:
Intuitively, this set represents all possible states of the joint system, i.e., all possible human states and beliefs over the model parameter, that are reachable under the dynamics in (6) for some sequence of human actions. We refer to this set as the Belief Augmented Forward Reachable Set (BA-FRS) from here on. Given a BA-FRS, we can obtain the predicted human states by projecting it onto the human state space. In particular,
where denotes the human state component of . Consequently, the probability of any human state can be obtained as (and otherwise) which can be used to obtain for any .
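On a discretized joint space, the projection amounts to summing the probability of every joint cell that shares the same human state, and the likely-state set follows by thresholding. A minimal sketch under that assumption; the table values and the threshold `eps` are hypothetical.

```python
import numpy as np

def likely_states(joint, eps):
    """Project a joint (state, belief) distribution onto the state space.

    joint[i, j] is the probability of state cell i together with belief bin j.
    Returns the state marginal and a mask of states with probability > eps.
    """
    state_probs = joint.sum(axis=1)
    return state_probs, state_probs > eps

joint = np.array([[0.30, 0.25],
                  [0.20, 0.15],
                  [0.06, 0.04]])        # hypothetical joint distribution
state_probs, mask = likely_states(joint, eps=0.2)
```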
Since the control policy in (7) is stochastic, computing the BA-FRS is a stochastic reachability problem, which can be computationally demanding [abate2007computational]. To overcome this challenge, we instead compute an approximation of the BA-FRS, obtained as the solution of a deterministic reachability problem; this significantly reduces the computational complexity at the expense of more conservative predictions. We next review HJ reachability analysis, which can be used to compute the BA-FRS with modern computational tools [mitchell2004toolbox, chen2013flow], and then discuss how the prediction problem can be cast as a deterministic reachability problem.
III-B Background: Hamilton-Jacobi Reachability
HJ-reachability analysis [lygeros2004reachability, mitchell2005time, bansal2017hamilton] can be used for computing a general Forward Reachable Set (FRS) given a set of starting states. Intuitively, the FRS is the set of all possible states that the system can reach at a given time starting from the states in the initial set under some permissible control sequence. The computation of the FRS can be formulated as a dynamic programming problem which ultimately requires solving for the value function in an initial value Hamilton-Jacobi-Bellman PDE (HJB-PDE).
Here, the PDE involves the time and space derivatives of the value function. The initial condition is the implicit surface function representing the initial set of states. The Hamiltonian encodes the role of system dynamics and control, and is given by an optimization, over the permissible controls, of the inner product between the spatial derivative of the value function and the system dynamics.
Once the value function is computed, the FRS at any time is given by the zero sublevel set of the value function at that time.
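To make the level-set idea concrete, the sketch below computes an FRS for a one-dimensional toy system, dx/dt = u with |u| <= 1, starting from an interval. For this system the Hamiltonian is the maximum over u of V_x * u, i.e. |V_x|, and the PDE is V_t + |V_x| = 0. The toy system, grid, and first-order Godunov upwind scheme are illustrative assumptions and stand in for the toolboxes cited above; the sign convention is the common one in which the set is where the value function is non-positive.

```python
import numpy as np

def frs_1d(radius=0.5, horizon=1.0, n=401, half_width=3.0):
    """FRS of dx/dt = u, |u| <= 1, from the initial set |x| <= radius.

    Solves V_t + |V_x| = 0 with V(x, 0) = |x| - radius (implicit surface
    of the initial set) using a first-order Godunov upwind scheme.
    """
    xs = np.linspace(-half_width, half_width, n)
    dx = xs[1] - xs[0]
    steps = int(np.ceil(horizon / (0.5 * dx)))    # CFL number at most 0.5
    dt = horizon / steps
    V = np.abs(xs) - radius
    for _ in range(steps):
        diff = np.diff(V) / dx
        d_minus = np.concatenate(([diff[0]], diff))    # backward differences
        d_plus = np.concatenate((diff, [diff[-1]]))    # forward differences
        # Godunov approximation of |V_x| for an expanding front
        ham = np.maximum(np.maximum(d_minus, 0.0), -np.minimum(d_plus, 0.0))
        V = V - dt * ham
    return xs, V

xs, V = frs_1d()
reachable = xs[V <= 0.0]    # zero sublevel set at the final time
```

The initial interval of half-width 0.5 should grow at unit speed, so after one time unit the zero sublevel set should be close to the interval of half-width 1.5, up to the grid resolution.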
III-C An HJ Reachability-based framework for prediction
In this section, we build on the reachability formalism for prediction in Sec. III-A to obtain a framework which we will use to both generate faster predictions, as well as to enable planners to answer important analysis questions about Bayesian predictors. Our framework is based on one key idea: rather than using a probability distribution over human actions as in (7), we will use a deterministic set of allowable human actions at every step. Very importantly, this set will be state-dependent, and therefore belief-dependent:
Here, the set is defined through a function that is allowed to depend on both the control and the joint state, together with a threshold on its value. Using a control set rather than a distribution allows us to convert the stochastic reachability problem in Sec. III-A to a deterministic reachability problem, which can be solved using the HJB-PDE at a significantly lower computational complexity.
We now illustrate how different instantiations of the allowable control set in our framework enable both prediction and predictor analysis. Prediction. We generate a predictor using our framework by instantiating the set of allowable human actions to be only those with sufficient probability under the belief, as in (14).
Now, instead of associating future states with probabilities, we maintain a set of feasible states at every time step. Over time, this set still evolves via (6), but now all actions that have too low probability are excluded, and actions that have high probability are all treated as equally likely. However, because of the coupling between future belief and allowable actions, we may approximate via a , using a non-zero . In Sec. IV, we demonstrate how relates to empirically and discuss the relative pros and cons.
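The thresholded control set for prediction can be illustrated on a grid of headings. The sketch assumes the action probability under a belief is the belief-weighted mixture of the per-goal policies, discretized into bins; the goal directions, `sigma`, `delta`, and the function names are hypothetical choices.

```python
import numpy as np

def action_probs(headings, goal_dirs, belief, sigma):
    """Probability of each discrete heading under the current belief."""
    err = headings[None, :] - np.asarray(goal_dirs)[:, None]
    err = (err + np.pi) % (2 * np.pi) - np.pi           # wrap to [-pi, pi)
    per_goal = np.exp(-0.5 * (err / sigma) ** 2)
    per_goal /= per_goal.sum(axis=1, keepdims=True)     # one distribution per goal
    return np.asarray(belief) @ per_goal                # mix by the belief

def allowable_controls(headings, goal_dirs, belief, delta, sigma=np.pi / 8):
    """Headings whose probability under the belief is at least delta."""
    probs = action_probs(headings, goal_dirs, belief, sigma)
    return headings[probs >= delta]

headings = np.linspace(-np.pi, np.pi, 24, endpoint=False)   # 15-degree bins
goal_dirs = [np.pi / 2, -np.pi / 2]                         # hypothetical goal directions
confident = allowable_controls(headings, goal_dirs, [0.95, 0.05], delta=0.05)
uncertain = allowable_controls(headings, goal_dirs, [0.50, 0.50], delta=0.05)
```

Under the confident belief only headings near the first goal survive the threshold, while under the uncertain belief headings toward both goals remain. Because the belief itself evolves along each predicted trajectory, the control set changes from state to state.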
Analysis. Suppose we have a prior (or current belief) over the model parameter; however, the prior might be wrong with respect to some hypothesized ground truth value for the human internal state. A reasonable question to ask in such a scenario would be “how long would it take the robot to realize that the internal state takes this value, i.e., to place enough probability on it in its posterior?” A different instantiation of our framework can be used to answer such questions: we now want to compute the BA-FRS under allowed human actions that model the hypothesized ground truth, and compute how long it takes to attain the desired property on the belief (we discuss this further in Sec. V). Thus, the allowed control set is the set of actions that are sufficiently likely under the hypothesized ground truth value (15).
Overall, by choosing the allowable control set appropriately, we can generate a range of predictors and analyses. The two examples above seemed particularly useful to us, and we detail them in the following sections.
IV A New HJ Reachability-based Predictor
Our reachability-based framework enables us to generate a new predictor by computing an approximation of the BA-FRS. In this section, we analyze the similarities and differences between this predictor and the one obtained by solving the full stochastic reachability problem.
Prediction as an approximation of the stochastic reachable set.
Since the stochastic reachability problem needs to explicitly maintain the state probabilities, its solution is significantly more expensive to compute than the deterministic BA-FRS.
This advantage in computational complexity is achieved at the expense of losing the information about the human state distribution, which can be an important component for several robotic applications.
However, when the full state distribution is not required, as is the case in this paper, the deterministic BA-FRS provides a very good approximation of the stochastic prediction.
In fact, it can be shown that .
For simplicity, consider the case when the intrinsic behavior of the human does not change over time, i.e., the intrinsic-change term in (8) is zero.
Since the model parameter takes only two possible values in this case, the joint state space is three dimensional.
In particular, the joint state consists of the planar human position and the probability that the belief assigns to the first goal.
The probability of the second goal is one minus that of the first, so we do not need to explicitly maintain it as a state.
The resulting action distribution can be used to compute the set of allowable controls for different thresholds as per (14).
We use the Level Set Toolbox [mitchell2004toolbox] to compute the BA-FRS, starting from the initial joint state.
The corresponding likely human states for different initial priors and thresholds are shown in Fig. 3 in magenta.
For comparison purposes, we also compute the stochastic prediction (teal), as well as the “naive” FRS obtained using all possible human actions (gray).
The probability threshold for the stochastic prediction is picked to capture 95% area of the set.
As evident from Fig. 3, the BA-FRS prediction is an over-approximation of the stochastic prediction, but at the same time it is not overly conservative, unlike the naive FRS. This is primarily because even though the proposed framework does not maintain the full state distribution, it still discards the unlikely controls during the BA-FRS computation. It is also interesting to note that the BA-FRS is not too sensitive to the initial prior for low thresholds. This property of the BA-FRS allows the predictions to be robust to incorrect priors, as we explain later in this section.
We also show the full 3D BA-FRS, as well as the projected sets over time for a fixed initial prior, in Fig. 2. When the threshold is high, both the belief and the human states are biased over time towards the goal favored by the prior: only the actions that steer the human towards that goal are initially contained in the control set for the BA-FRS computation. Moreover, propagating the current belief under these actions further reinforces the belief that the human is moving towards that goal. As a result, the beliefs in the BA-FRS shift towards a higher probability on that goal over time. On the other hand, when the threshold is low, the human actions under the other goal are also contained in the control set, which leads the belief and the human state to shift in both directions (towards either goal).
Prediction as robust to incorrect priors and misspecified models. The stochastic prediction depends heavily on the prior. When the initial prior for the human motion prediction is not accurate enough, using the stochastic prediction for planning might lead to unsafe behavior, as it can be too optimistic. This issue is particularly exacerbated when the true (unknown) parameter of the human is not within the support of the parameter values considered in the model, i.e., when the model is misspecified. For example, the exact goal of the human may not be any of the goals specified in the model. In such scenarios, a full Bayesian inference may fail to assign sufficient probabilities to the likely states of the human, which can lead to unsafe situations. On the other hand, using the full FRS, i.e., the set of all states the human can reach under any possible control, will ensure safety but can impede robot plan efficiency.
In such situations, the proposed framework presents a good middle-ground alternative to the two approaches: it does not rely heavily on the exact probability of an action while computing the FRS, since it leverages action probabilities only to distinguish between likely and unlikely actions. Yet, it still uses a threshold to discard highly unlikely actions under the current belief, ensuring the obtained FRS is not too conservative. This allows our framework to perform well in situations where the initial prior is not fully accurate but accurate enough to distinguish likely actions from unlikely actions. In particular, suppose the prior at the current time is accurate enough to distinguish the set of likely actions from the unlikely actions for the true human behavior, even though we do not know the true probability distribution of the actions. In such a case, it can be shown that any human state that is reachable under a control sequence consisting of controls that are at least threshold-likely will be contained within the BA-FRS.
Consider the scenario where the actual human goal lies midway between the two modeled goals (see Fig. 4).
Thus, the current model does not capture the true goal of the human.
Even though the human walks straight towards the actual goal, a full Bayesian framework fails to assign sufficient probabilities to the likely human states because of its over-reliance on the model.
Ultimately, this leads to a collision between the human and the robot.
In contrast, since our framework uses the model only to distinguish likely actions from unlikely actions, it recognizes that moving straight ahead is a likely human action.
This is also evident from Fig. 3, where the states straight ahead of the human are contained within the BA-FRS even for a relatively high threshold of 0.2.
As a result, using the deterministic BA-FRS for planning leads to safe navigation around the human.
These results are confirmed in our hardware experiments performed on a TurtleBot 2 testbed. As shown in Fig. 1, we demonstrate these scenarios for navigating around a human participant. We measured human positions at 200Hz using a VICON motion capture system and used on-board TurtleBot odometry sensors for the robot state measurement. As discussed, our framework allows us to be robust to misspecified goals while not being overly conservative.
V Prediction Analysis
An interesting question that our framework can answer is how long the predictor has to observe the human in order to determine the true human behavior for some prior. For simplicity, consider the scenario where the model parameter can take two possible values; however, the true human parameter is unknown. We also have an initial prior over the parameter. Then we may pose the following questions: (1) What are the minimum and maximum possible times it will take to determine, with sufficiently high probability, that the parameter takes the first value? (2) What are the corresponding sequences of observations? Thus, we want to know what are the most and the least informative sets of observations that the human can provide under the first value. Similar questions can also be posed for the second value. The overall minimum and maximum times to determine the true human behavior are then obtained by combining the corresponding times for the two values. Once the overall minimum and maximum times are available, the robot trajectory can be planned to safeguard against both possible parameter values for the duration of the overall maximum time. After that duration, we will be able to determine the true human behavior, so it is sufficient to safeguard against the likely states of the human under the belief from then on.
As discussed in Sec. III-C, an instantiation of the proposed reachability-based framework can be used to determine the minimum and maximum times. In particular, given the initial human state and the initial prior, we can compute the BA-FRS with the control set in (15) and the corresponding Hamiltonian, which maximizes over the allowed controls. Then, the minimum time can be obtained as the earliest time at which a target belief is contained in the BA-FRS for some human state. Here, the target belief is a distribution that assigns a sufficiently high probability to the hypothesized parameter value. Intuitively, this computation gives the minimum time it will take to reach the target belief if the true human parameter is indeed the hypothesized value and the human is giving us the most informative samples to discern their behavior. We can similarly compute the minimum time for the second value by computing a similar BA-FRS under the likely controls for that value.
Similarly, the maximum time can be computed using a similar procedure, but instead of maximizing over the control in the Hamiltonian, we minimize over the control. This computation corresponds to finding the control sequence that is least informative for inferring the hypothesized value, given by the minimizer in the Hamiltonian. Intuitively, the minimizing control is the control observation that least differentiates between the two parameter values. Similarly, the maximizing control in the computation of the minimum time is the control observation that differentiates most between the two parameter values. This is closely related to prior work on legibility [dragan2013legibility] and deception [dragan2014analysis]: given a fixed horizon, our framework computes a sequence of controls that is maximally informative or maximally ambiguous cumulatively across all the time steps, which in general is nontrivial to compute.
Running example: Consider the planar pedestrian dynamics as before, but with the following human policy:
Here, the model parameter indicates whether the human behaves rationally or irrationally. The human walks straight with a small variance under the rational model and moves in a random direction under the irrational model, approximating an irrational human. We compute the minimum and maximum times to realize that the human is rational, starting from a high initial prior on irrational behavior. We assume that we can confidently conclude that the human is rational when all human trajectories reach a sufficiently high belief in rational behavior. The obtained minimum and maximum times are 3.2 s and 11.2 s, respectively. We also compute the control sequences that correspond to these times. The optimal control sequence for the minimum time is to walk straight, since that is the most likely action under the rational behavior compared to the irrational behavior. On the other hand, for the maximum time, the optimal control sequences consist of an angle of 15 degrees, which is the least likely action that is above the probability threshold (0.3 for this example) under the rational behavior.
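A discrete-time analogue of this analysis can be sketched by repeatedly applying a Bayesian update with the most or the least informative heading that is still sufficiently likely under rational behavior. Every number below (bin width, standard deviation, prior, target belief, threshold) is a hypothetical choice, so the step counts are not expected to reproduce the times reported above. Because the likelihoods in this toy model do not depend on the human state, repeating a single heading is optimal; in general the HJ computation optimizes over the whole control sequence.

```python
import numpy as np

headings = np.deg2rad(np.arange(-180, 180, 15))            # 24 discrete headings
rational = np.exp(-0.5 * (headings / np.deg2rad(15)) ** 2)
rational /= rational.sum()                                 # walks roughly straight
irrational = np.full(headings.size, 1.0 / headings.size)   # uniform random heading

def steps_to_learn(prior, target, delta, most_informative, max_steps=1000):
    """Number of Bayesian updates until P(rational) >= target.

    Observations are restricted to headings with P(u | rational) >= delta.
    Returns (steps, heading in degrees); steps is None if never reached.
    """
    allowed = np.flatnonzero(rational >= delta)
    ratio = rational[allowed] / irrational[allowed]         # likelihood ratios
    pick = allowed[np.argmax(ratio) if most_informative else np.argmin(ratio)]
    p = prior
    for step in range(1, max_steps + 1):
        num = rational[pick] * p
        p = num / (num + irrational[pick] * (1.0 - p))      # Bayes rule
        if p >= target:
            return step, float(np.rad2deg(headings[pick]))
    return None, float(np.rad2deg(headings[pick]))

fastest, u_fast = steps_to_learn(0.1, 0.9, 0.2, most_informative=True)
slowest, u_slow = steps_to_learn(0.1, 0.9, 0.2, most_informative=False)
```

With these assumed numbers the most informative observation is walking straight and the least informative admissible one is a 15-degree heading, mirroring the qualitative structure of the example.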
VI Conclusion and Future Work
When robots operate in complex environments around humans, they often employ probabilistic predictive models to reason about human behavior. While a full probabilistic prediction may be tractable in some cases, it can be extremely difficult for agents whose intent and preferences are evolving over time. In this work, we formulate the human prediction problem as a stochastic reachability problem, and then present a Hamilton-Jacobi reachability-based framework that not only can compute an approximation of likely human states at a significantly lower computational complexity, but can also perform predictor analysis. We demonstrate that the proposed framework also provides more robust predictions when the human behavior model is inaccurate, and demonstrate our approach in simulation and hardware. Extending our methodology to multi-robot, multi-human settings as well as to other online learning-based settings are exciting future directions.