I Introduction
Autonomous robots, such as service robots and selfdriving vehicles, are often required to coexist in environments with other dynamic objects. For robots to safely navigate and plan in an anticipatory manner in dynamic environments, accurate motion predictions for other nearby objects are needed. Many recent developments [Alahi2016SocialLH, Gupta2018SocialGS, wzhiKTM]
in motion prediction have relied on learning based approaches, whereby future positions, or probabilities over future positions, are learned from motion trajectories of previously observed dynamic objects. Learningbased methods to motion prediction typically extract patterns from the training data. These patterns are then used to extrapolate future positions based on new observations without explicitly considering the feasibility of the prediction.
However, there are often rules for motion trajectories that are known a priori, but are not explicitly enforced during extrapolation by learningbased models. In particular, we address the problem of incorporating constraints that arise from the environment’s structure, such as obstacles, into motion predictions. In most environments, motion trajectories are not highly unstructured, and depend closely on the environment occupancy structure.
To address the shortcomings of purely learningbased methods, we view motion prediction as a combined probabilistic trajectory learning and constrained trajectory optimisation problem. Constrained optimisation methods allow for constraints to be imposed directly onto the distribution of trajectories without incorporating the collected data. We propose a novel framework that incorporates structural constraints into probabilistic motion prediction problems. Our framework leverages the ability of learning based models to extract patterns from data, as well as the ability of constrained trajectory optimisation approaches to explicitly specify constraints, giving better generalisability and higher quality extrapolations. Specifically, our contributions consist of: (1) a method to learn a distribution over trajectories from observations that is amenable to being optimised; (2) an approach to enforce constraints on distributions of trajectories, particularly collision chance constraints when provided an occupancy map.
Ii Related Work
Motion Trajectory Prediction. Motion prediction is a problem central to robot autonomy. The most simple methods of motion prediction are physicsbased models, such as constant velocity/acceleration models [Schubert2008ComparisonAE]. More complex physicsbased models often fuse multiple dynamic models, or select a best fit model from a set [Kooij2018ContextBasedPP]. Recent developments in machine learning have led to learningbased methods that learn motion patterns [Alahi2016SocialLH, Zhi2019SpatiotemporalLO]
from a dataset of observed trajectories. In particular, longshort term memory (LSTM) and generative adversarial network (GAN)based models
[Alahi2016SocialLH, Zhang2019SRLSTMSR, Amirian2019SocialWL, Gupta2018SocialGS] have gained popularity. Methods of learning entire trajectories, similar to our learning component, but with relaxed dependencies, is outlined in [wzhiKTM, zhiOct]. Further attempts to incorporate static obstacles in learning attempt to do so as a part of the learning pipeline [Sadeghian_2019_CVPR] and attempts to extrapolate behaviour. Approaches of this kind cannot guarantee that valid solutions satisfy specified chance constraints, and there have not been approaches that directly incorporate occupancy maps, which are typically built by robots from depth sensor data.Trajectory Optimisation. Trajectory optimisation has been successfully applied in motion planning, by directly expressing obstacles as constraints to solve for collisionfree paths between start and goal points. Popular methods include Trajopt [Schulman2013FindingLO], CHOMP [Zucker2013CHOMPCH], STOMP [Kalakrishnan2011STOMPST], and GPMP2 [Dong2016MotionPA]. Trajopt utilises sequential quadratic programming to find locally optimal trajectories compliant with constraints. CHOMP exploits obstacle gradients for efficient optimisation. Further work on gradientbased trajectory optimisation for planning can be found in [Marinho2016FunctionalGM, Francis2019FastSF]. Although trajectory optimisation is a popular choice to obtain collisionfree trajectories for motionplanning, there has yet to be extensive work in utilising trajectory optimisation for motion prediction. The optimisation required in this work is also different in that we optimise to obtain distributions of trajectories, rather than a single trajectory as required in motion planning.
Combined Learning and Optimisation. There has been previous explorations of combining learning and trajectory optimisation. [Toussaint1]
introduce a framework which combines a known analytical cost function with black box functions in reinforcement learning for manipulation.
[Rana2017TowardsRS] introduces CLAMP, a framework that combines learning from demonstration with optimisation based motion planning. Theoretical aspects of using learning in conjunction with optimisation, in a Bayesian framework known as posterior regularisation has also been investigated in [PReg]. All of these methods aim to add more structure to be encoded in learning problems. Similarly our proposed framework allows for encoding of structure, but specifically for the trajectory prediction problem, where we are required to do learning and optimisation on distributions of trajectories with respect to environment occupancy.Iii Continuoustime Motion Trajectories
Our framework operates over continuous motion trajectories, capable of being queried at arbitrary resolution, without forward simulating. In this section, we briefly introduce the continuoustime motion trajectory representation used in this work. We restrict ourselves to the 2D case.
Similar to methods described in [Francis2019FastSF, Marinho2016FunctionalGM, wzhiKTM], we represent motion trajectories as functions .
represents the prediction time horizon. To we construct our trajectories as dot products between weight vectors and
reproducing kernels. We limit our discussion to squared exponential radial basis function (RBF) kernels, though methods in this study generalise to various kernels with minimal modifications. Using RBFs result in smooth trajectories, and have shown good empirical convergence properties for functional trajectory optimisation
[Marinho2016FunctionalGM]. A trajectory, , can be expressed as:(1) 
where the basis functions are given as, . The coordinates of a dynamic object at time, , is given as and , , are weight parameters, , and are predefine time points at which RBF kernels are centered. is a lengthscale hyperparameter which determines the impact a time point neighbouring times on the trajectory.
Motion data typically comes in the form of discrete sequences of timestamped coordinates . To construct a bestfit continuous trajectory of discrete sequences from
timestamped coordinates, we solve the ridge regression problem:
(2) 
where is a regularisation hyperparameter, with the set of uniform times selected a priori, as centers where the kernels are fixed. Evaluating the closed form ridge regression solution results in weight vectors and which forms the trajectory . We denote the stacked vector of and as . As future motion trajectories are inherently uncertain, the motion prediction problem often requires predicting a distribution over future trajectories a dynamic object may follow. We denote distributions of trajectories as , with the vector of probabilistic weight parameters, , and the parameters of the distribution over is denoted as . gives the distribution of coordinate position at time . Every sample function of is a trajectory
, and samples of vector of random variables
give . Denoting the parameter distributions associated with and as and , distributions of trajectories can be constructed as . In practice, to obtain we can also sample from the distribution , and take dot products with .Iv Motion Prediction with Environmental Constraints
Iva Problem Overview
Our goal is to reason about the possible future trajectories of dynamic objects, based on the immediate past trajectories of the objects, along with occupancy information from the environment. In this work, we limit our discussion to trajectories in 2D, which usually arise from the tracking of humans and vehicles in video data. We are assumed to have a dataset containing coordinates of an object up to a time, along with its coordinates thereafter, and an occupancy map of the environment. Examples from a realworld dataset are shown in fig. 3, as well as a prediction given by a learningbased model. Given the observed historical coordinates, we aim to extrapolate the distribution of trajectories representing likely future motion thereafter. We require the distribution of trajectories compliant with the environment structure indicated in the occupancy map, such that the probability of hypothesising infeasible trajectories to be constrained to a small value.
IvB Framework Overview
An overview of our framework is shown in fig. 2. We learn from our training data to predict a vector of random variables of weights with distribution parameters , and produce a prior distribution of trajectories, , as described in eq. 1. Our aim is to find an alternative random vector , defined by optimised parameters , which satisfies enforced constraints, while remaining close to the learned distribution . The optimisation objective uses , while constraints are provided through the obstacle cost operator, , which requires an occupancy map of the environment. The desired random vector defines a constraintcompliant distribution of trajectories, , provided predefined RBF features , by eq. 1. Individual constraintcompliant future trajectories, , can then be realised from .
To find the constraincompliant parameters , we formulate the optimisation problem in a similar manner to Bayesian posterior regularisation methods [PReg]. The desired parameters is the solution to the constrained optimisation problem:
(3)  
(4) 
where is the KullbackLeibler (KL) divergence [BishopBook], and gives us a definition of “closeness” between our predicted and optimised distributions. is the obstacle cost operator, which maps distributions of trajectories to a scalar cost value. The cost is interpreted as the timeaveraged probability of generating a trajectory that collides with obstacles. We designate to be the allowed limit of the obstacle cost. Intuitively, our optimisation problem returns the closest distribution of trajectories, as defined by the KL divergence term, to that given by our predictive model, subject to the obstacle collision constraint. By separating the learning and optimisation, we disentangle the complexities of the learning model and the structural constraints. Our learned also provides us with a “warmstart” initial solution to our optimisation. In contrast to learningbased approaches, the constraints in our optimisation problem is enforced in valid solutions. The main challenges to solving this trajectory optimisation problem lies in how to obtain the vector of random variables , which defines our prior trajectory distribution , as well as how to derive the obstacle cost operator required for optimisation. These are addressed in the following subsections.
IvC Learning Distributions of Trajectories
In this subsection, we outline a method of learning a mapping from observed historical trajectories to prior distributions of future trajectories, parameterised by . The resulting prior distributions of trajectories are to be used directly in the optimisation problem eq. 4. We model the distribution of trajectories,
, as mixtures of multioutput stochastic processes. To capture the correlation between x and y coordinates, the distribution over weight parameters are assumed to be mixtures of matrix normal distributions
[Dawid1981SomeMD]. A matrix normal distribution is the generalisation of the normal distribution to matrixvalued random variables, where each sample from a matrix normal distribution gives a matrix. We work with the matrix of random variables defined by vertically stacking the parameters of coordinate dimensions, denoted as :(5) 
We assume that there are
mixture components, similar to a Gaussian mixture model
[BishopBook], with each component being a matrix normal distribution, and associated weight components . Matrix normal distributions have location matrices , and positivedefinite scale matrices , as parameters [MLEMatN].We utilise a neural network to learn a mapping from a fixedlength vectorised coordinate sequences,
, that encodes observed partial trajectories, to component weights , along with location matrices the lower triangular and diagonal matrices. is positivedefinite and will be constructed by taking the product of two lower triangular matrices. The dependence between different times on the trajectory are largely captured the RBF features in the time domain , as described in section III. Hence, it is sufficient for to be a positive definite diagonal matrix. We obtain the input vectorby taking a 10 timestep window of the latest observed coordinates, and vectorising to obtain a vector of 20 dimensions. The neural network used is relatively lightweight, consisting of 4 Reluactivated dense layers with number of neurons:
. In this work, our main goal is to outline a representation of future trajectories outputted by the network, such that the result can be directly optimised. Our method does not preclude alternative encoding schemes of conditioned history trajectories to be inputted into the network, such as features from LSTM networks [LSTM] or trajectory features outlined in [wzhiKTM].Provided a dataset of history and future trajectories, given by pairs of timestamped coordinate sequences. After flattening the history trajectory sequence to obtain , and applying eq. 2 on trajectory futures to obtain weight matrices, we have pairs denoted as . We learn a mapping from to parameters
with the fullyconnected neural network, with the negative loglikelihood of the weight matrices distributions as the loss function:
(6) 
and by applying the activation functions:
(7)  
(8)  
(9)  
(10) 
where are neural network outputs, and , indicate the matrix diagonal, and strictly lower triangular matrix elements respectively. The activation function guarantees the validity of the scale matrices, by enforcing . We also enforce using the function. After we train the neural network, provided a vectorised historical trajectory, , the neural network gives the corresponding random vector of weights of our prior trajectory distribution, , where is the vectorise operator. The prior distributions of trajectories is then the dot product of the trajectory parameters and the RBF features, , as described in eq. 1.
IvD Constraints on Distributions of Trajectories
After obtaining the weights, , of a prior distribution of trajectories via learning, our main challenge is to design a cost operator, , to encode our desired constraints. Specifically we wish to design to account for collisions of distributions of trajectories, provided an occupancy map. maps a distribution of smooth trajectories, , to a realvalued scalar cost, which quantifies how collisionprone a given is. differs from the cost operator in trajectory optimisation motion planning problems, such as those in [Zucker2013CHOMPCH, Marinho2016FunctionalGM, Francis2019FastSF]
, in that the input is a distribution over trajectories rather than a single trajectory. Our optimisation can not only optimise over trajectory mean, but also over trajectory variances.
Like trajectory optimisation methods [Zucker2013CHOMPCH], we assume the cost operator takes the form of average costs over prediction horizon , where the cost at a given point in time, , is the probability of a collision at a given time. contains weight variables with a distribution parameterised by . We can then interpret as the average probability of collision over a time horizon, defined as:
(11)  
(12) 
The probability of collision at some time and location is computed by taking the product of the probability of the trajectory reaching the coordinate at the given time and the probability that the coordinate is occupied. We are provided with an occupancy map and the probability a coordinate of interest, , being occupied is denoted as . To evaluate the cost, we could marginalise over the all trajectory parameters, but this may be difficult due to the high dimensionality of , instead work in the 2D world space, where the environmental constraints are defined. As is constructed by taking the dot product of weight vectors and feature vectors, trajectory coordinate distribution in world space at time is:
(13)  
(14) 
By the equivalences between matrix normal and multivariate normal distributions [Dawid1981SomeMD], we have a mixture of twodimensional multivariate normal distributions in worldspace,
. This is the probability distribution over trajectory coordinates at time
, . The mixture means and covariances of each component are defined as and Substituting eq. 14 into the inner integral of eq. 12, gives the cost operator equation at given time:(15) 
Note that and are dependent on . During the optimisation, we assume that each mode can be optimised independently and that the importance of each mode, represented by , is accurately captured by the learning model and does not change during trajectory optimisation. We then define the set of parameters to optimise as . The integral defined in eq. 15 may be intractable to evaluate analytically, so we aim to find an approximation to it. The coordinate distribution at a given time in world space,
, is a mixture of bivariate Gaussian distributions. We use GaussHermite quadrature, suitable for numerically integrating exponentialclass functions, to approximate
eq. 15. Quadrature methods approximate an integral by taking the weighted sum of integrand values sampled at computed points, called abscissae. We introduce the change of variables, , where the Cholesky factorisation gives . Using GaussHermite quadrature schemes, we have:(16) 
where we find the values of as abscissa obtained from roots of Hermite polynomials, and , are the associated quadrature weights. Details of abscissae and weight calculations, as well as a review on multidimensional GaussHermite quadrature, can be found in [GaussHermite]
. An example of abscissae points when estimating a multivariate Gaussian is shown in
fig. 4. Next, we evaluate the outer integral in eq. 12 using rectangular numerical integration for . With defined, we are in a position to solve the combined learning and constrained optimisation problem, defined earlier in eq. 4, using a nonlinear optimisation solver, such as the sequential least squares optimiser SLSQP [slsqp]. Analytical gradients with respect to occupancy is provided when using a continuous differential occupancy map such as those introduced in [HilbertMap, icra_wzhi2019]. By utilising the equivalence between matrix normal distributions and multivariate Gaussians [MLEMatN], the KL divergence term between and components can be computed [BishopBook]. After obtaining the solution of the optimisation we have the optimised distribution of . By taking dot products between and , we can construct the constraintcompliant trajectory distribution of futures. Individual realised trajectories can then be generated from .Although we have focused on collision constraints with respect to environment structure, the optimisation allows for constraints on velocity or acceleration. The continuous nature of our trajectories allows time derivatives of trajectories to be obtained. Specifically, the derivative reproducing property, ensures the derivatives of smooth kernels also correspond to kernels [DerivativeRKHS]. We can reason about the order derivatives of displacement by replacing with .
V Experimental Evaluation
We empirically evaluate (1) the ability of our learning component to learn probabilistic representations of future trajectories to provide good prior trajectories distributions; (2) the ability of the proposed framework to enforce collision constraints, and its effect on trajectory quality.
Va Datasets and Metrics
We run our experiments on one simulated dataset, and two realworld datasets, including:

[leftmargin=*]

Simulated dataset (Simulated): This contains simulated trajectories on a floorplan;

THÖR dataset [thorDataset2019] (Indoor): This dataset contains real human trajectories walking in a room with three large obstacles in the center of the room.

Lankershim dataset (Traffic): This dataset contains real vehicle data along a long boulevard. We use a particular busy intersection between and .
Throughout each of our experiments, we ensure that the observed trajectory conditioned on contains 10 timesteps. The prediction horizons, , for the simulated, indoor and traffic datasets are 15, 10, and 20 timesteps respectively. We split long trajectories, such that we have 200 pairs of partial trajectories in the simulated dataset, while the indoor dataset contained 871 pairs of partial trajectories, and the traffic dataset contained 6175 pairs. Trajectories that were shorter than the length of the prediction horizon were filtered out. A traintotest ratio of 80:20 was used for each dataset. For our learning model, we set the number of mixture components , hyperparameters , and for the simulated, indoors, and traffic datasets respectively. Maps for the simulated and indoor datasets are available, and we construct a map for the traffic dataset by considering the road positions. Our occupancy map is represented as a continuous Hilbert Map [HilbertMap], giving smooth occupancy probabilities over coordinates, and occupancy gradients.
Metrics used in the quantitative evaluation include: (i) Average displacement error (ADE): The average euclidean distance error between the expectation of the nearest distribution component of trajectories with ground truth trajectory; (iii) Final displacement error (FDE): The euclidean distance error at the end of time horizon between the expectation of the nearest distribution component of trajectories with ground truth trajectory. (iii) Average Likelihood (AL): To take into account the uncertain nature of motion predict, for probabilistic models, we record the likelihood of drawing the trajectory averaged over the timesteps. Note that likelihoods are difficult to interpret and not comparable over different datasets. Lower ADE and FDE, and higher AL indicate better performance. For the evaluation of collisions, we also record the percentage of constraintviolating trajectory distributions in test set.
VB Learning ContinuousTime Trajectory Distributions as Priors
The trajectory distribution learning component of our framework provides a prior distribution for the optimisation. We examine whether the learning component of our framework is able to learn distributions of future trajectories from data, comparing the performance with several benchmark models. The benchmark models are: (1) Kernel trajectory maps (KTMs) [wzhiKTM], (2) Constant velocity (CV) model, and (3) Naive neuralnetwork (NN) model, where we learn a mapping between past trajectories and future trajectories by minimising the mean squared error loss. The neural network comprised of Reluactivated fullyconnect layers with number of neurons: , where gives the trajectory time horizon.
We see that the learning component of our framework is able to learn high quality distributions of trajectories. Figure 5 shows examples of extrapolated trajectory distributions in red, and the observed trajectory in blue. We see that the learned distribution is relatively close to the ground truth green trajectory, and is able to capture the uncertain nature of motion prediction. Although a nonnegligible number of sampled trajectories collide with the environment. Table I summarises the performance results of our method against our benchmarks. We see that our learning model is comparable to the method proposed in [wzhiKTM], while outperforming a constant velocity and naive neural network model. While the KTM method performs slightly better on the traffic dataset, it is purely learningbased and lacks mechanisms to enforce constraints. We note that our method encodes the observed trajectory by simply flattening the input coordinates, as described in section IVC. More sophisticated methods of encoding the observed trajectory (such as with LSTM networks [LSTM]), independent of our outlined output of trajectory distributions, could improve the performance of the learning model.
ADE  FDE  AL  

Simulated  Ours  1.29  2.27  0.12 
KTM [wzhiKTM]  1.44  2.41  0.11  
CV  4.27  11.66    
NNnaive  2.11  6.07    
Indoors  Ours  0.94  1.32  3.32 
KTM  0.65  1.26  2.88  
CV  2.08  4.66    
NNnaive  1.84  3.35    
Traffic  Ours  4.99  7.24  0.04 
KTM  4.98  7.03  0.043  
CV  5.14  10.38    
NNnaive  21.1  39.5   
VC Enforcing Occupancy Compliance via Trajectory Optimisation
After obtaining a prior from our learned model, we apply constraints on the prediction via trajectory optimisation. We will evaluate whether collision constraints can be enforced, and their effect on trajectory quality. We define the constraint such that the timeaveraged probability of collision to be less than or equal to 0.05, . The SLSQP optimiser was able to solve the optimisation quickly, under half a second, with a python implementation. We obtain trajectory distribution priors from our learned model, and select all of the observations where the learned prior trajectory is constraintviolating. We evaluate the performance of the trajectory priors, and compare them with the trajectory distribution predictions after enforcing constraints. The quantitative results are listed in table II. We see after solving with structural constraints, all predictions are feasible. Additionally, the trajectory distribution quality improves after enforcing constraints for all the datasets considered.
ADE  FDE  AL  C.V.P  

Simulated  Unoptimised  1.48  2.87  0.11  40% 
Optimised  1.28  2.33  0.17  0%  
Indoors  Unoptimised  1.62  2.41  3.42  15.4% 
Optimised  1.44  2.38  3.3  0%  
Traffic  Unoptimised  6.56  11.81  0.034  5.7% 
Optimised  5.14  8.95  0.043  0% 
Qualitative results are shown in fig. 6. We see the abscissae points of the largest trajectory distribution component before (top) and after (bottom) the trajectory optimisation of examples in each dataset. Our framework is able to ensure that the distributions comply with environmental structure. Notably, we see that the trajectory optimisation not only adjusts the mean of the trajectory distribution, but also optimises the variances, giving more robust trajectory distributions. This effect is most prominent in the left subfigure, where the predicted distribution of trajectories in a simulated hallway recovers a much tighter variance which follows the environment structure. We also observe instances collisionavoidance with objects in open space (demonstrated in middle subfigure), as well as increased adherence to road structure (right subfigure).
Vi Conclusion
We introduced a novel framework to learn motion predictions from observations while conforming to constraints, such as obstacles. Our framework consists of a learning component which learns a distribution over future trajectories. This distribution is used as a prior in a trajectory optimisation step which enforces chance constraints on the trajectory distribution. We empirically demonstrate that our framework can learn complex trajectory distributions and enforce compliance with environment structures via optimisation. This results in reduced variance and the avoidance of obstacles by the predicted trajectories, leading to improved prediction quality.
Comments
There are no comments yet.