Autonomous robots, such as service robots and self-driving vehicles, are often required to coexist in environments with other dynamic objects. For robots to safely navigate and plan in an anticipatory manner in dynamic environments, accurate motion predictions for other nearby objects are needed. Many recent developments [Alahi2016SocialLH, Gupta2018SocialGS, wzhiKTM]
in motion prediction have relied on learning-based approaches, whereby future positions, or probabilities over future positions, are learned from the motion trajectories of previously observed dynamic objects. Learning-based approaches to motion prediction typically extract patterns from the training data; these patterns are then used to extrapolate future positions from new observations, without explicitly considering the feasibility of the predictions.
However, there are often rules for motion trajectories that are known a priori, but are not explicitly enforced during extrapolation by learning-based models. In particular, we address the problem of incorporating constraints that arise from the environment's structure, such as obstacles, into motion predictions. In most environments, motion trajectories are far from unstructured: they depend closely on the occupancy structure of the environment.
To address the shortcomings of purely learning-based methods, we view motion prediction as a combined probabilistic trajectory learning and constrained trajectory optimisation problem. Constrained optimisation methods allow constraints to be imposed directly onto the distribution of trajectories, without incorporating the collected data. We propose a novel framework that incorporates structural constraints into probabilistic motion prediction problems. Our framework leverages the ability of learning-based models to extract patterns from data, as well as the ability of constrained trajectory optimisation approaches to explicitly specify constraints, giving better generalisability and higher-quality extrapolations. Specifically, our contributions consist of: (1) a method to learn a distribution over trajectories from observations that is amenable to being optimised; (2) an approach to enforce constraints on distributions of trajectories, particularly collision chance constraints when provided an occupancy map.
II Related Work
Motion Trajectory Prediction. Motion prediction is a problem central to robot autonomy. The simplest methods of motion prediction are physics-based models, such as constant velocity/acceleration models [Schubert2008ComparisonAE]. More complex physics-based models often fuse multiple dynamic models, or select a best-fit model from a set [Kooij2018ContextBasedPP]. With recent developments in machine learning, learning-based methods that learn motion patterns [Alahi2016SocialLH, Zhi2019SpatiotemporalLO, Zhang2019SRLSTMSR, Amirian2019SocialWL, Gupta2018SocialGS] have gained popularity. Methods of learning entire trajectories, similar to our learning component but with relaxed dependencies, are outlined in [wzhiKTM, zhiOct]. Further attempts to incorporate static obstacles do so as part of the learning pipeline [Sadeghian_2019_CVPR], and then attempt to extrapolate behaviour. Approaches of this kind cannot guarantee that valid solutions satisfy specified chance constraints, and there have not been approaches that directly incorporate occupancy maps, which are typically built by robots from depth sensor data.
Trajectory Optimisation. Trajectory optimisation has been successfully applied in motion planning, by directly expressing obstacles as constraints to solve for collision-free paths between start and goal points. Popular methods include Trajopt [Schulman2013FindingLO], CHOMP [Zucker2013CHOMPCH], STOMP [Kalakrishnan2011STOMPST], and GPMP2 [Dong2016MotionPA]. Trajopt utilises sequential quadratic programming to find locally optimal trajectories compliant with constraints. CHOMP exploits obstacle gradients for efficient optimisation. Further work on gradient-based trajectory optimisation for planning can be found in [Marinho2016FunctionalGM, Francis2019FastSF]. Although trajectory optimisation is a popular choice to obtain collision-free trajectories for motion-planning, there has yet to be extensive work in utilising trajectory optimisation for motion prediction. The optimisation required in this work is also different in that we optimise to obtain distributions of trajectories, rather than a single trajectory as required in motion planning.
Combined Learning and Optimisation. There have been previous explorations of combining learning and trajectory optimisation. [Toussaint1] introduces a framework which combines a known analytical cost function with black-box functions in reinforcement learning for manipulation. [Rana2017TowardsRS] introduces CLAMP, a framework that combines learning from demonstration with optimisation-based motion planning. Theoretical aspects of using learning in conjunction with optimisation, in a Bayesian framework known as posterior regularisation, have also been investigated in [PReg]. All of these methods aim to allow more structure to be encoded in learning problems. Similarly, our proposed framework allows for the encoding of structure, but specifically for the trajectory prediction problem, where we are required to perform learning and optimisation on distributions of trajectories with respect to environment occupancy.
III Continuous-time Motion Trajectories
Our framework operates over continuous motion trajectories, capable of being queried at arbitrary resolution, without forward simulating. In this section, we briefly introduce the continuous-time motion trajectory representation used in this work. We restrict ourselves to the 2-D case.
Similar to methods described in [Francis2019FastSF, Marinho2016FunctionalGM, wzhiKTM], we represent motion trajectories as continuous functions $\xi:[0,T]\to\mathbb{R}^{2}$, where $T$ represents the prediction time horizon. We construct our trajectories as dot products between weight vectors $\mathbf{w}_{x},\mathbf{w}_{y}\in\mathbb{R}^{M}$ and reproducing kernel features $\boldsymbol{\phi}(t)$. We limit our discussion to squared exponential radial basis function (RBF) kernels, though the methods in this study generalise to various kernels with minimal modifications. RBFs result in smooth trajectories, and have shown good empirical convergence properties for functional trajectory optimisation [Marinho2016FunctionalGM]. A trajectory, $\xi$, can be expressed as:

$$\xi(t)=\big(x(t),\,y(t)\big)=\big(\mathbf{w}_{x}^{\top}\boldsymbol{\phi}(t),\;\mathbf{w}_{y}^{\top}\boldsymbol{\phi}(t)\big),\tag{1}$$

where the basis functions are given as $\boldsymbol{\phi}(t)=[\phi_{1}(t),\ldots,\phi_{M}(t)]^{\top}$, with $\phi_{m}(t)=\exp\big(-\gamma(t-t_{m})^{2}\big)$. The coordinates of a dynamic object at time $t$ are given as $x(t)$ and $y(t)$; $\mathbf{w}_{x}$ and $\mathbf{w}_{y}$ are weight parameters; $t_{1},\ldots,t_{M}$ are pre-defined time points at which the RBF kernels are centred; and $\gamma$ is a length-scale hyper-parameter which determines the influence of a time point on neighbouring times on the trajectory.
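As a minimal illustration of eq. 1, the following sketch evaluates a continuous trajectory from RBF weights; the kernel centres, length-scale, and weight values are hypothetical, chosen only for the example.

```python
import numpy as np

# Hypothetical configuration: M = 5 RBF kernels centred uniformly on [0, 1].
centers = np.linspace(0.0, 1.0, 5)
gamma = 10.0
w_x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])  # illustrative weight vector for x
w_y = np.array([0.0, 0.2, 0.4, 0.6, 0.8])  # illustrative weight vector for y

def rbf_features(t):
    """Evaluate the RBF basis functions phi(t) at a scalar time t."""
    return np.exp(-gamma * (t - centers) ** 2)

def trajectory(t):
    """Continuous trajectory xi(t) = (w_x . phi(t), w_y . phi(t)), cf. eq. 1."""
    phi = rbf_features(t)
    return w_x @ phi, w_y @ phi

# The trajectory can be queried at arbitrary times without forward simulation.
x, y = trajectory(0.5)
```

Because the representation is a continuous function of time, positions (and, later, distributions over positions) can be queried at any resolution.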
Motion data typically comes in the form of discrete sequences of timestamped coordinates $\{(t_{n},x_{n},y_{n})\}_{n=1}^{N}$. To construct a best-fit continuous trajectory from a discrete sequence of timestamped coordinates, we solve the ridge regression problem:

$$\min_{\mathbf{w}_{x},\mathbf{w}_{y}}\;\sum_{n=1}^{N}\Big[\big(x_{n}-\mathbf{w}_{x}^{\top}\boldsymbol{\phi}(t_{n})\big)^{2}+\big(y_{n}-\mathbf{w}_{y}^{\top}\boldsymbol{\phi}(t_{n})\big)^{2}\Big]+\lambda\big(\|\mathbf{w}_{x}\|^{2}+\|\mathbf{w}_{y}\|^{2}\big),\tag{2}$$

where $\lambda$ is a regularisation hyper-parameter, and the set of uniform times $t_{1},\ldots,t_{M}$, selected a priori, gives the centres at which the kernels are fixed. Evaluating the closed-form ridge regression solution results in weight vectors $\mathbf{w}_{x}$ and $\mathbf{w}_{y}$, which form the trajectory $\xi$. We denote the stacked vector of $\mathbf{w}_{x}$ and $\mathbf{w}_{y}$ as $\mathbf{w}$. As future motion trajectories are inherently uncertain, the motion prediction problem often requires predicting a distribution over the future trajectories a dynamic object may follow. We denote distributions of trajectories as $\Xi$, with the vector of probabilistic weight parameters $\mathbf{W}$, and the parameters of the distribution over $\mathbf{W}$ denoted as $\boldsymbol{\theta}$. $\Xi(t)$ gives the distribution over coordinate positions at time $t$. Every sample function of $\Xi$ is a trajectory $\xi$, and samples of the vector of random variables $\mathbf{W}$ give $\mathbf{w}$. Denoting the random weight vectors associated with $\mathbf{w}_{x}$ and $\mathbf{w}_{y}$ as $\mathbf{W}_{x}$ and $\mathbf{W}_{y}$, distributions of trajectories can be constructed as $\Xi(t)=\big(\mathbf{W}_{x}^{\top}\boldsymbol{\phi}(t),\,\mathbf{W}_{y}^{\top}\boldsymbol{\phi}(t)\big)$. In practice, to obtain sample trajectories $\xi$, we can also sample $\mathbf{w}$ from the distribution of $\mathbf{W}$, and take dot products with $\boldsymbol{\phi}(t)$.
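The ridge regression problem of eq. 2 has a closed-form solution. A sketch of the fit is given below; the straight-line data, kernel centres, and hyper-parameter values ($\gamma$, $\lambda$) are illustrative only.

```python
import numpy as np

def fit_trajectory_weights(times, coords, centers, gamma, lam):
    """Closed-form ridge regression (cf. eq. 2): fit RBF weights w_x, w_y
    to a discrete sequence of timestamped coordinates of shape (N, 2)."""
    Phi = np.exp(-gamma * (times[:, None] - centers[None, :]) ** 2)  # (N, M)
    A = Phi.T @ Phi + lam * np.eye(len(centers))
    return np.linalg.solve(A, Phi.T @ coords)  # (M, 2): columns are w_x, w_y

# Hypothetical data: noiseless straight-line motion sampled at 20 times.
times = np.linspace(0.0, 1.0, 20)
coords = np.stack([2.0 * times, -1.0 * times], axis=1)
centers = np.linspace(0.0, 1.0, 8)   # kernel centres fixed a priori
W = fit_trajectory_weights(times, coords, centers, gamma=20.0, lam=1e-6)

# The fitted weights reconstruct the trajectory at unseen, arbitrary times.
phi = np.exp(-20.0 * (0.37 - centers) ** 2)
x_hat, y_hat = phi @ W
```

The fitted continuous trajectory can then be evaluated between the original timestamps, which is what allows arbitrary-resolution querying.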
IV Motion Prediction with Environmental Constraints
IV-A Problem Overview
Our goal is to reason about the possible future trajectories of dynamic objects, based on the immediate past trajectories of the objects, along with occupancy information from the environment. In this work, we limit our discussion to trajectories in 2-D, which usually arise from the tracking of humans and vehicles in video data. We assume we have a dataset containing coordinates of each object up to a given time, along with its coordinates thereafter, and an occupancy map of the environment. Examples from a real-world dataset, along with a prediction given by a learning-based model, are shown in fig. 3. Given the observed historical coordinates, we aim to extrapolate the distribution of trajectories representing likely future motion. We require the distribution of trajectories to comply with the environment structure indicated in the occupancy map, such that the probability of hypothesising infeasible trajectories is constrained to a small value.
IV-B Framework Overview
An overview of our framework is shown in fig. 2. We learn from our training data to predict a vector of random weight variables $\mathbf{W}$ with distribution parameters $\boldsymbol{\theta}$, producing a prior distribution of trajectories, $\Xi$, as described in eq. 1. Our aim is to find an alternative random vector $\mathbf{W}^{*}$, defined by optimised parameters $\boldsymbol{\theta}^{*}$, which satisfies the enforced constraints while remaining close to the learned distribution. The optimisation objective measures closeness to the learned distribution, while constraints are provided through the obstacle cost operator, $\mathcal{C}_{obs}$, which requires an occupancy map of the environment. The desired random vector $\mathbf{W}^{*}$ defines a constraint-compliant distribution of trajectories, $\Xi^{*}$, via eq. 1, provided the pre-defined RBF features $\boldsymbol{\phi}(t)$. Individual constraint-compliant future trajectories, $\xi^{*}$, can then be realised from $\Xi^{*}$.
To find the constraint-compliant parameters $\boldsymbol{\theta}^{*}$, we formulate the optimisation problem in a similar manner to Bayesian posterior regularisation methods [PReg]. The desired parameter set $\boldsymbol{\theta}^{*}$ is the solution to the constrained optimisation problem:

$$\boldsymbol{\theta}^{*}=\operatorname*{arg\,min}_{\boldsymbol{\theta}'}\;\mathrm{KL}\big(p_{\boldsymbol{\theta}'}(\mathbf{W}')\,\big\|\,p_{\boldsymbol{\theta}}(\mathbf{W})\big)\quad\text{subject to}\quad\mathcal{C}_{obs}(\Xi')\leq\epsilon,\tag{4}$$

where $\mathrm{KL}(\cdot\|\cdot)$ is the Kullback-Leibler (KL) divergence [BishopBook], which gives us a definition of "close-ness" between our predicted and optimised distributions, and $\Xi'$ is the trajectory distribution induced by $\boldsymbol{\theta}'$. $\mathcal{C}_{obs}$ is the obstacle cost operator, which maps distributions of trajectories to a scalar cost value. The cost is interpreted as the time-averaged probability of generating a trajectory that collides with obstacles, and we designate $\epsilon$ to be the allowed limit of this obstacle cost. Intuitively, our optimisation problem returns the closest distribution of trajectories, as defined by the KL divergence term, to that given by our predictive model, subject to the obstacle collision constraint. By separating the learning and optimisation, we disentangle the complexities of the learning model and the structural constraints. Our learned prior also provides a "warm-start" initial solution to the optimisation. In contrast to learning-based approaches, the constraints in our optimisation problem are enforced in all valid solutions. The main challenges in solving this trajectory optimisation problem lie in how to obtain the vector of random variables $\mathbf{W}$, which defines our prior trajectory distribution $\Xi$, and how to derive the obstacle cost operator $\mathcal{C}_{obs}$ required for optimisation. These are addressed in the following subsections.
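To make the structure of eq. 4 concrete, the following toy sketch solves a KL-objective, chance-constrained problem for a single one-dimensional Gaussian with SciPy's SLSQP solver. The prior parameters, the obstacle region x >= 1, and the limit of 0.05 are hypothetical stand-ins for the learned mixture distribution and the occupancy-map cost operator.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical learned prior over one coordinate: N(0.8, 0.5^2).
mu0, s0 = 0.8, 0.5
eps = 0.05  # allowed collision probability (stand-in for the cost limit)

def kl_gauss(mu, s):
    """KL( N(mu, s^2) || N(mu0, s0^2) ): the 'close-ness' objective."""
    return np.log(s0 / s) + (s**2 + (mu - mu0) ** 2) / (2 * s0**2) - 0.5

def collision_prob(mu, s):
    """Probability mass inside a hypothetical obstacle region x >= 1."""
    return norm.sf(1.0, loc=mu, scale=s)

# Minimise the KL divergence subject to the chance constraint (cf. eq. 4).
res = minimize(
    lambda p: kl_gauss(p[0], p[1]),
    x0=[mu0, s0],                        # warm-start at the learned prior
    method="SLSQP",
    bounds=[(None, None), (1e-3, None)],
    constraints=[{"type": "ineq",
                  "fun": lambda p: eps - collision_prob(p[0], p[1])}],
)
mu_opt, s_opt = res.x
```

The solver adjusts both the mean and the variance, returning the KL-closest Gaussian whose probability mass inside the obstacle region does not exceed the allowed limit.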
IV-C Learning Distributions of Trajectories
In this subsection, we outline a method of learning a mapping from observed historical trajectories to prior distributions of future trajectories, parameterised by $\boldsymbol{\theta}$. The resulting prior distributions of trajectories are used directly in the optimisation problem of eq. 4. We model the distribution of trajectories, $\Xi$, as a mixture of multi-output stochastic processes. To capture the correlation between x and y coordinates, the distribution over weight parameters is assumed to be a mixture of matrix normal distributions [Dawid1981SomeMD]. A matrix normal distribution is the generalisation of the normal distribution to matrix-valued random variables, where each sample from a matrix normal distribution is a matrix. We work with the matrix of random variables defined by stacking the weight parameters of the coordinate dimensions, denoted as $\mathbf{W}$.
We assume that there are $K$ mixture components, similar to a Gaussian mixture model [BishopBook], with each component being a matrix normal distribution with an associated mixture weight $\alpha_{k}$. The matrix normal components have location matrices $\mathbf{M}_{k}$, and positive-definite scale matrices $\mathbf{U}_{k}$ and $\mathbf{V}_{k}$, as parameters [MLEMatN].
We utilise a neural network to learn a mapping from a fixed-length vectorised coordinate sequence, $\mathbf{o}$, that encodes an observed partial trajectory, to component weights $\alpha_{k}$, along with location matrices $\mathbf{M}_{k}$, lower-triangular matrices $\mathbf{L}_{k}$, and diagonal matrices $\mathbf{D}_{k}$. Each $\mathbf{U}_{k}$ is positive-definite and is constructed by taking the product $\mathbf{L}_{k}\mathbf{L}_{k}^{\top}$ of lower-triangular matrices. The dependencies between different times on the trajectory are largely captured by the RBF features in the time domain, $\boldsymbol{\phi}(t)$, as described in section III; hence, it is sufficient for $\mathbf{V}_{k}$ to be a positive-definite diagonal matrix, $\mathbf{D}_{k}$. We obtain the input vector $\mathbf{o}$ by taking a 10-time-step window of the latest observed coordinates, and vectorising it to obtain a vector of 20 dimensions. The neural network used is relatively light-weight, consisting of 4 ReLU-activated dense layers. In this work, our main goal is to outline a representation of the future trajectories outputted by the network, such that the result can be directly optimised. Our method does not preclude alternative schemes for encoding the conditioned-on history trajectories as network inputs, such as features from LSTM networks [LSTM] or the trajectory features outlined in [wzhiKTM].
We are provided a dataset of history and future trajectories, given by pairs of timestamped coordinate sequences. After flattening each history sequence to obtain $\mathbf{o}$, and applying eq. 2 to each trajectory future to obtain weight matrices $\mathbf{w}$, we have pairs denoted as $\{(\mathbf{o}^{(i)},\mathbf{w}^{(i)})\}_{i=1}^{D}$. We learn a mapping from $\mathbf{o}$ to the parameters $\boldsymbol{\theta}=\{\alpha_{k},\mathbf{M}_{k},\mathbf{U}_{k},\mathbf{V}_{k}\}_{k=1}^{K}$ with the fully-connected neural network, using the negative log-likelihood of the weight matrices under the mixture distribution as the loss function:

$$\mathcal{L}=-\sum_{i=1}^{D}\log\Big(\sum_{k=1}^{K}\alpha_{k}\,\mathcal{MN}\big(\mathbf{w}^{(i)};\mathbf{M}_{k},\mathbf{U}_{k},\mathbf{V}_{k}\big)\Big),$$

and by applying activation functions to the raw network outputs: a positive activation (softplus) on the diagonal entries of $\mathbf{L}_{k}$ and $\mathbf{D}_{k}$, and a softmax over the component weights, where the remaining raw outputs give the strictly lower-triangular elements of $\mathbf{L}_{k}$ directly. The positive activation guarantees the validity of the scale matrices, by enforcing strictly positive diagonal entries; the softmax function enforces $\sum_{k=1}^{K}\alpha_{k}=1$. After we train the neural network, provided a vectorised historical trajectory, $\mathbf{o}$, the neural network gives the corresponding random vector of weights of our prior trajectory distribution, $\mathbf{W}=\operatorname{vec}(\cdot)$, where $\operatorname{vec}(\cdot)$ is the vectorise operator. The prior distribution of trajectories is then given by the dot product of the trajectory weights and the RBF features, $\Xi(t)=\big(\mathbf{W}_{x}^{\top}\boldsymbol{\phi}(t),\,\mathbf{W}_{y}^{\top}\boldsymbol{\phi}(t)\big)$, as described in eq. 1.
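The role of the activation functions can be sketched as follows; the raw output values are hypothetical, standing in for an actual forward pass of the network, and softplus/softmax are used as the positivity and normalisation activations.

```python
import numpy as np

def softplus(z):
    """Positive activation: maps raw outputs to strictly positive values,
    used here for the diagonal entries of the scale matrices."""
    return np.log1p(np.exp(z))

def softmax(z):
    """Maps raw outputs to mixture weights that are positive and sum to 1."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical raw network outputs for K = 3 mixture components.
raw_alpha = np.array([0.2, -1.0, 0.5])
raw_diag = np.array([-2.0, 0.0, 3.0])  # raw diagonal entries of one D_k

alpha = softmax(raw_alpha)   # valid mixture weights: positive, sum to 1
diag = softplus(raw_diag)    # strictly positive diagonal entries
```

With positive diagonals, the products of the lower-triangular factors are guaranteed positive-definite, so every network output parameterises a valid matrix normal mixture.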
IV-D Constraints on Distributions of Trajectories
After obtaining the weights, $\mathbf{W}$, of a prior distribution of trajectories via learning, our main challenge is to design a cost operator, $\mathcal{C}_{obs}$, that encodes our desired constraints. Specifically, we wish to design $\mathcal{C}_{obs}$ to account for collisions of distributions of trajectories, provided an occupancy map. $\mathcal{C}_{obs}$ maps a distribution of smooth trajectories, $\Xi$, to a real-valued scalar cost, which quantifies how collision-prone a given $\Xi$ is. $\mathcal{C}_{obs}$ differs from the cost operators in trajectory optimisation motion planning problems, such as those in [Zucker2013CHOMPCH, Marinho2016FunctionalGM, Francis2019FastSF], in that the input is a distribution over trajectories rather than a single trajectory. Our optimisation can therefore adjust not only the trajectory mean, but also the trajectory variances.

Like trajectory optimisation methods [Zucker2013CHOMPCH], we assume the cost operator takes the form of an average cost over the prediction horizon $[0,T]$, where the cost at a given point in time is the probability of a collision at that time. $\Xi$ contains the weight variables $\mathbf{W}$, with a distribution parameterised by $\boldsymbol{\theta}$. We can then interpret $\mathcal{C}_{obs}$ as the average probability of collision over the time horizon, defined as:

$$\mathcal{C}_{obs}(\Xi)=\frac{1}{T}\int_{0}^{T}\int_{\mathbb{R}^{2}}p^{occ}(\mathbf{x})\,p_{\Xi(t)}(\mathbf{x})\,\mathrm{d}\mathbf{x}\,\mathrm{d}t.\tag{12}$$
The probability of collision at some time and location is computed by taking the product of the probability of the trajectory reaching that coordinate at the given time and the probability that the coordinate is occupied. We are provided with an occupancy map, and the probability of a coordinate of interest, $\mathbf{x}\in\mathbb{R}^{2}$, being occupied is denoted as $p^{occ}(\mathbf{x})$. To evaluate the cost, we could marginalise over all trajectory parameters, but this may be difficult due to the high dimensionality of $\mathbf{W}$; instead, we work in the 2-D world space, where the environmental constraints are defined. As $\Xi$ is constructed by taking dot products of the weight vectors and feature vectors, the trajectory coordinate distribution in world space at time $t$ is:

$$\Xi(t)\sim\sum_{k=1}^{K}\alpha_{k}\,\mathcal{N}\big(\boldsymbol{\mu}_{k}(t),\boldsymbol{\Sigma}_{k}(t)\big).\tag{14}$$

By the equivalences between matrix normal and multivariate normal distributions [Dawid1981SomeMD], we have a mixture of two-dimensional multivariate normal distributions in world space. This is the probability distribution over trajectory coordinates at time $t$. The mixture means and covariances of each component are defined as $\boldsymbol{\mu}_{k}(t)=\mathbf{M}_{k}^{\top}\boldsymbol{\phi}(t)$ and $\boldsymbol{\Sigma}_{k}(t)=\big(\boldsymbol{\phi}(t)^{\top}\mathbf{U}_{k}\boldsymbol{\phi}(t)\big)\mathbf{V}_{k}$. Substituting eq. 14 into the inner integral of eq. 12 gives the cost at a given time:

$$p_{t}^{coll}(\Xi)=\sum_{k=1}^{K}\alpha_{k}\int_{\mathbb{R}^{2}}p^{occ}(\mathbf{x})\,\mathcal{N}\big(\mathbf{x};\boldsymbol{\mu}_{k}(t),\boldsymbol{\Sigma}_{k}(t)\big)\,\mathrm{d}\mathbf{x}.\tag{15}$$
Note that $\boldsymbol{\mu}_{k}(t)$ and $\boldsymbol{\Sigma}_{k}(t)$ are dependent on $\boldsymbol{\theta}$. During the optimisation, we assume that each mode can be optimised independently, and that the importance of each mode, represented by $\alpha_{k}$, is accurately captured by the learning model and does not change during trajectory optimisation. We then define the set of parameters to optimise as $\boldsymbol{\theta}^{*}=\{\mathbf{M}_{k},\mathbf{U}_{k},\mathbf{V}_{k}\}_{k=1}^{K}$. The integral defined in eq. 15 may be intractable to evaluate analytically, so we aim to find an approximation to it. The coordinate distribution at a given time in world space, $\Xi(t)$, is a mixture of bi-variate Gaussian distributions. We use Gauss-Hermite quadrature, which is suitable for numerically integrating exponential-class functions, to approximate eq. 15. Quadrature methods approximate an integral by taking a weighted sum of integrand values sampled at computed points, called abscissae. We introduce the change of variables $\mathbf{x}=\sqrt{2}\,\mathbf{A}_{k}\mathbf{z}+\boldsymbol{\mu}_{k}(t)$, where the Cholesky factorisation gives $\boldsymbol{\Sigma}_{k}(t)=\mathbf{A}_{k}\mathbf{A}_{k}^{\top}$. Using Gauss-Hermite quadrature schemes, we have:

$$\int_{\mathbb{R}^{2}}p^{occ}(\mathbf{x})\,\mathcal{N}\big(\mathbf{x};\boldsymbol{\mu}_{k}(t),\boldsymbol{\Sigma}_{k}(t)\big)\,\mathrm{d}\mathbf{x}\approx\frac{1}{\pi}\sum_{i=1}^{n}\sum_{j=1}^{n}w_{i}\,w_{j}\;p^{occ}\big(\sqrt{2}\,\mathbf{A}_{k}[z_{i},z_{j}]^{\top}+\boldsymbol{\mu}_{k}(t)\big),$$

where the values $z_{i}$, $z_{j}$ are abscissae obtained from the roots of Hermite polynomials, and $w_{i}$, $w_{j}$ are the associated quadrature weights. Details of abscissae and weight calculations, as well as a review of multi-dimensional Gauss-Hermite quadrature, can be found in [GaussHermite]. An example of the abscissae used when estimating a multi-variate Gaussian is shown in fig. 4. Next, we evaluate the outer integral of eq. 12 using rectangular numerical integration over $[0,T]$. With $\mathcal{C}_{obs}$ defined, we are in a position to solve the combined learning and constrained optimisation problem, defined earlier in eq. 4, using a non-linear optimisation solver, such as the sequential least squares optimiser SLSQP [slsqp]. Analytical gradients with respect to occupancy are available when using a continuously differentiable occupancy map, such as those introduced in [HilbertMap, icra_wzhi2019]. By utilising the equivalence between matrix normal distributions and multivariate Gaussians [MLEMatN], the KL divergence between the components of the optimised and prior weight distributions can be computed [BishopBook]. After obtaining the solution of the optimisation, we have the optimised parameters $\boldsymbol{\theta}^{*}$ of the distribution of $\mathbf{W}^{*}$. By taking dot products between $\mathbf{W}^{*}$ and $\boldsymbol{\phi}(t)$, we can construct the constraint-compliant distribution over future trajectories, $\Xi^{*}$. Individual realised trajectories can then be generated from $\Xi^{*}$.
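A sketch of the Gauss-Hermite approximation of eq. 15 for one mixture component is given below, assuming a smooth Gaussian bump as a stand-in for the occupancy probability; the mean, covariance, and obstacle location are illustrative.

```python
import numpy as np

def gauss_hermite_expectation(f, mu, Sigma, deg=10):
    """Approximate E[f(X)] for X ~ N(mu, Sigma) in 2-D with Gauss-Hermite
    quadrature, using the change of variables x = mu + sqrt(2) A z,
    where Sigma = A A^T is a Cholesky factorisation."""
    z, w = np.polynomial.hermite.hermgauss(deg)  # abscissae and weights
    A = np.linalg.cholesky(Sigma)
    total = 0.0
    for zi, wi in zip(z, w):
        for zj, wj in zip(z, w):
            x = mu + np.sqrt(2.0) * A @ np.array([zi, zj])
            total += wi * wj * f(x)
    return total / np.pi  # normalisation of the 2-D Hermite weight

# Hypothetical smooth occupancy field: a Gaussian 'obstacle' bump at (1, 0).
def p_occ(x):
    return np.exp(-np.sum((x - np.array([1.0, 0.0])) ** 2))

# Illustrative component mean and covariance at one time step.
mu = np.array([0.0, 0.0])
Sigma = np.array([[0.3, 0.1], [0.1, 0.4]])
p_coll = gauss_hermite_expectation(p_occ, mu, Sigma)
```

For this smooth stand-in, the quadrature estimate can be checked against the closed-form Gaussian convolution, and a degree of around 10 per dimension is already accurate.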
Although we have focused on collision constraints with respect to environment structure, the optimisation also allows for constraints on velocity or acceleration. The continuous nature of our trajectories allows time derivatives of trajectories to be obtained analytically. Specifically, the derivative reproducing property ensures that derivatives of smooth kernels also correspond to valid kernels [DerivativeRKHS]. We can reason about the $n$-th order derivatives of displacement by replacing $\boldsymbol{\phi}(t)$ with $\frac{\mathrm{d}^{n}\boldsymbol{\phi}}{\mathrm{d}t^{n}}(t)$.
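Since the RBF features are smooth in time, their analytical derivatives yield velocity profiles from the same weight vector. A small sketch with illustrative centres, length-scale, and random weights, checked against a finite difference:

```python
import numpy as np

# Hypothetical RBF configuration and weights.
centers = np.linspace(0.0, 1.0, 6)
gamma = 15.0
w_x = np.random.default_rng(0).normal(size=6)

def phi(t):
    """RBF features phi(t)."""
    return np.exp(-gamma * (t - centers) ** 2)

def dphi(t):
    """Analytical time derivative of the RBF features:
    d/dt exp(-gamma (t - t_m)^2) = -2 gamma (t - t_m) exp(-gamma (t - t_m)^2)."""
    return -2.0 * gamma * (t - centers) * phi(t)

# The same weights give both the position and the velocity profile.
velocity_x = w_x @ dphi(0.5)

# Sanity check against a central finite difference of the position.
h = 1e-6
fd = (w_x @ phi(0.5 + h) - w_x @ phi(0.5 - h)) / (2 * h)
```

Higher-order derivatives, for acceleration constraints, follow the same pattern with further differentiation of the features.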
V Experimental Evaluation
We empirically evaluate (1) the ability of our learning component to learn probabilistic representations of future trajectories that provide good prior trajectory distributions; (2) the ability of the proposed framework to enforce collision constraints, and its effect on trajectory quality.
V-A Datasets and Metrics
We run our experiments on one simulated dataset and two real-world datasets:
Simulated dataset (Simulated): This contains simulated trajectories on a floor-plan;
THÖR dataset [thorDataset2019] (Indoor): This dataset contains real trajectories of humans walking in a room with three large obstacles in its centre.
Lankershim dataset (Traffic): This dataset contains real vehicle data along a long boulevard. We use a particularly busy intersection of the boulevard.
Throughout our experiments, we ensure that the conditioned-on observed trajectory contains 10 timesteps. The prediction horizons, $T$, for the simulated, indoor, and traffic datasets are 15, 10, and 20 timesteps respectively. We split long trajectories, such that we have 200 pairs of partial trajectories in the simulated dataset, 871 pairs in the indoor dataset, and 6175 pairs in the traffic dataset. Trajectories shorter than the prediction horizon were filtered out. A train-to-test ratio of 80:20 was used for each dataset. For our learning model, the number of mixture components $K$ and the hyper-parameters were set separately for the simulated, indoor, and traffic datasets. Maps for the simulated and indoor datasets are available, and we construct a map for the traffic dataset from the road positions. Our occupancy map is represented as a continuous Hilbert Map [HilbertMap], giving smooth occupancy probabilities over coordinates, along with occupancy gradients.
Metrics used in the quantitative evaluation include: (i) Average displacement error (ADE): the average Euclidean distance between the expectation of the nearest distribution component of trajectories and the ground truth trajectory; (ii) Final displacement error (FDE): the Euclidean distance at the end of the time horizon between the expectation of the nearest distribution component of trajectories and the ground truth trajectory; (iii) Average likelihood (AL): to take into account the uncertain nature of motion prediction, for probabilistic models we record the likelihood of drawing the ground truth trajectory, averaged over the timesteps. Note that likelihoods are difficult to interpret and are not comparable across different datasets. Lower ADE and FDE, and higher AL, indicate better performance. For the evaluation of collisions, we also record the percentage of constraint-violating trajectory distributions in the test set.
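For reference, the ADE and FDE metrics can be computed as follows; the toy prediction, offset from the ground truth by a fixed displacement, is illustrative only.

```python
import numpy as np

def ade(pred, truth):
    """Average displacement error: mean Euclidean distance over timesteps.
    pred and truth have shape (T, 2)."""
    return float(np.mean(np.linalg.norm(pred - truth, axis=1)))

def fde(pred, truth):
    """Final displacement error: Euclidean distance at the horizon end."""
    return float(np.linalg.norm(pred[-1] - truth[-1]))

# Toy example: a prediction offset from the ground truth by (0.3, 0.4),
# i.e. a constant displacement of 0.5 at every timestep.
truth = np.stack([np.linspace(0.0, 1.0, 10), np.zeros(10)], axis=1)
pred = truth + np.array([0.3, 0.4])
```

For probabilistic predictions, these metrics are evaluated on the expectation of the nearest mixture component, as described above.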
V-B Learning Continuous-Time Trajectory Distributions as Priors
The trajectory distribution learning component of our framework provides a prior distribution for the optimisation. We examine whether the learning component is able to learn distributions of future trajectories from data, comparing its performance with several benchmark models. The benchmark models are: (1) Kernel trajectory maps (KTMs) [wzhiKTM]; (2) a constant velocity (CV) model; and (3) a naive neural-network (NN) model, where we learn a mapping between past trajectories and future trajectories by minimising the mean squared error loss, using ReLU-activated fully-connected layers whose output dimension is determined by the trajectory time horizon.
We see that the learning component of our framework is able to learn high-quality distributions of trajectories. Figure 5 shows examples of extrapolated trajectory distributions in red, and the observed trajectory in blue. The learned distribution is relatively close to the ground truth trajectory (green), and captures the uncertain nature of motion prediction; however, a non-negligible number of sampled trajectories collide with the environment. Table I summarises the performance of our method against the benchmarks. Our learning model is comparable to the method proposed in [wzhiKTM], while outperforming the constant velocity and naive neural network models. While the KTM method performs slightly better on the traffic dataset, it is purely learning-based and lacks mechanisms to enforce constraints. We note that our method encodes the observed trajectory by simply flattening the input coordinates, as described in section IV-C. More sophisticated methods of encoding the observed trajectory (such as with LSTM networks [LSTM]), independent of our outlined representation of output trajectory distributions, could improve the performance of the learning model.
V-C Enforcing Occupancy Compliance via Trajectory Optimisation
After obtaining a prior from our learned model, we apply constraints on the prediction via trajectory optimisation. We evaluate whether collision constraints can be enforced, and their effect on trajectory quality. We define the constraint such that the time-averaged probability of collision is less than or equal to 0.05, i.e. $\epsilon=0.05$. The SLSQP optimiser was able to solve the optimisation quickly, in under half a second with a Python implementation. We obtain trajectory distribution priors from our learned model, and select all of the observations for which the learned prior trajectory distribution is constraint-violating. We evaluate the performance of these trajectory priors, and compare them with the trajectory distribution predictions after enforcing constraints. The quantitative results are listed in table II. We see that, after solving with structural constraints, all predictions are feasible. Additionally, the trajectory distribution quality improves after enforcing constraints for all the datasets considered.
Qualitative results are shown in fig. 6, which shows the abscissae of the largest trajectory distribution component before (top) and after (bottom) trajectory optimisation for an example from each dataset. Our framework is able to ensure that the distributions comply with the environmental structure. Notably, the trajectory optimisation not only adjusts the mean of the trajectory distribution, but also optimises the variances, giving more robust trajectory distributions. This effect is most prominent in the left sub-figure, where the predicted distribution of trajectories in a simulated hallway recovers a much tighter variance which follows the environment structure. We also observe instances of collision-avoidance with objects in open space (middle sub-figure), as well as increased adherence to road structure (right sub-figure).
VI Conclusion

We introduced a novel framework to learn motion predictions from observations while conforming to constraints, such as obstacles. Our framework consists of a learning component which learns a distribution over future trajectories. This distribution is used as a prior in a trajectory optimisation step which enforces chance constraints on the trajectory distribution. We empirically demonstrate that our framework can learn complex trajectory distributions and enforce compliance with environment structure via optimisation. This results in reduced variance and the avoidance of obstacles by the predicted trajectories, leading to improved prediction quality.