1 Introduction
Autonomous agents may be required to operate in environments with moving objects, such as pedestrians and vehicles in urban areas, for extended periods of time. A probabilistic model that captures the movement of surrounding dynamic objects allows an agent to make more effective and robust plans. This work presents Kernel Trajectory Maps (KTMs), which capture the multimodal, probabilistic, and continuous nature of future paths. Given a sequence of observed waypoints of a trajectory up to a given coordinate, a KTM is able to produce a multimodal distribution over possible future trajectories, represented by a mixture of stochastic processes. Continuous functional trajectories, which are functions mapping queried times to trajectory coordinates, can then be sampled from the output stochastic process.
Early methods to predict future motion trajectories generally extrapolate based on physical laws of motion [1]. Although simple and often effective, these models have the drawback of being unable to make use of other observed trajectories, or account for environment topology. For example, physicsbased methods fail to take into account that trajectories may follow a road that exists in a map. To address this shortcoming, methods have been developed that map the direction or flow of movements in an environment in a probabilistic manner [2, 3, 4]. These methods are able to output distributions over future movement directions or velocities, conditioned on the current queried coordinate. Using these models, one can sequentially forward sample to obtain a trajectory. This forward sampling approach makes the Markov assumption, assuming that the object dynamics only depend on the current position of the object. These approaches discard useful information from the trajectory history, and can accumulate errors from the forward simulation.
Motivated to overcome these aforementioned limitations of past methods, we utilise distance substitution kernels [5, 6] with the Fréchet distance [7, 8, 9] to project trajectory data onto a representative set of trajectories, obtaining high-dimensional projection features. Using a neural network with a single hidden layer on the projection features, we learn a multimodal mixture of stochastic processes. The resulting mixture is also a stochastic process, and can be viewed as a distribution over functions, where each realisation is a continuous functional trajectory. Figure 1 shows observed trajectories and realised trajectory predictions, demonstrating the probabilistic and multimodal nature of KTMs. The probabilistic nature of the output provides an estimate of uncertainty, which can be used for robust planning and decision making. We contribute the KTM, a method that:

is trajectory history aware and captures dependencies over the entire trajectory;

models the output as a mixture of stochastic processes, providing a multimodal distribution over possible trajectories;

represents realised trajectories as continuous functions, allowing them to be queried at arbitrary time resolution.
2 Related Work
Kernel Trajectory Maps (KTMs) learn motion patterns in an environment, and represent sampled outputs as continuous trajectories, i.e. trajectories that can be queried at arbitrarily fine time resolution. Here we briefly revisit literature on modelling motion dynamics and continuous trajectories.
2.1 Motion Modelling
Some of the simplest approaches to model trajectory patterns are kinematic models that make extrapolations based on a sequence of observed coordinates. Popular examples include the constant velocity and constant acceleration models [10]. Some other attempts to understand dynamics take the approach of extending occupancy mapping beyond static environments by building occupancy representations along time [11, 12, 13]. This approach tends to be memory intensive, limiting scalability.
Other recent approaches have incorporated global spatial [2, 3, 4] and temporal information [14, 15]. The authors of [2] propose directional grid maps, a model that learns the distribution of motion directions in each grid cell of a discretised environment. This is achieved by fitting a mixture of von Mises distributions to the motion directions attributed to each cell. A similar method is presented in [3], where a map of velocity distributions in the environment is modelled by semi-wrapped Gaussian mixture models. Both methods are able to capture the uncertainty of motion at a given point coordinate, but require forward sampling to obtain trajectories.
2.2 Continuous Trajectories
Continuous representations of trajectories, often modelled by Gaussian processes [16] or sparse low-rank approximations of Gaussian processes [17], have arisen in previous work on trajectory estimation [18] and motion planning [19, 20, 21]. In this work, we also formulate a method to produce continuous trajectories, but we leverage them for extrapolation, rather than the estimation and interpolation problems addressed in previous works.
3 Methodology
3.1 Problem Formulation and Overview
We work with continuous trajectory outputs and discrete trajectory inputs. Discrete trajectories are ordered sets of waypoint coordinates indexed by time. Continuous trajectories are functions that map queried times to coordinates, and can be discretised by querying at a chosen set of time steps. In this paper, continuous trajectories are defined by weighted combinations of time-dependent features, where a vector of weight parameters determines the trajectory. We shall discuss continuous trajectories in detail in subsection 3.3.
Given a dataset of pairs of trajectories, where each pair consists of an observed input trajectory and a target trajectory, we seek to predict a probability distribution over possible future trajectories beyond the observed waypoints. The predicted continuous trajectory starts from the last time step of the queried discrete trajectory. To find the distribution over future trajectories, we write the marginal likelihood as,

p(Ξ* | ξ*) = ∫ p(Ξ* | w) p(w | ξ*) dw,   (1)

where ξ* is a queried discrete trajectory, Ξ* is a predicted continuous trajectory starting from the last time step of ξ*, and w denotes the weight parameters of the continuous trajectory. To evaluate the marginal likelihood, we learn the distribution over weights and sample realisations of the weights to conduct inference (detailed in subsection 3.5). This learning can be summarised by the steps in the following subsections.
3.2 Generating Projection Features from Discrete Trajectories
In this subsection, we describe the conversion from discrete input trajectories to high-dimensional kernel projections. We make use of distance substitution kernels [22, 5, 6], defined as k(x, x') = κ(d(x, x')) for a kernel function κ and a distance measure d that is symmetric, i.e. d(x, x') = d(x', x), and has zero diagonal, i.e. d(x, x) = 0. In this work, we use the discrete Fréchet distance [8] substituted into a radial basis function (RBF) kernel. The Fréchet distance [7] between two curves P and Q is defined as,

d_F(P, Q) = inf over α, β of max over t ∈ [0, 1] of ‖P(α(t)) − Q(β(t))‖,   (2)

where α and β are parameterisations of the two curves, and range over all continuous and monotone increasing functions. We use a discrete approximation of the Fréchet distance, d_dF, which provides a distance metric between ordered sets of arbitrary length. An algorithm to compute the discrete Fréchet distance is outlined in [8]. We name this kernel the discrete Fréchet (DF) kernel. It is given by:

k_DF(ξ, ξ') = exp(−d_dF(ξ, ξ')² / (2γ²)),   (3)

where ξ and ξ' are discrete trajectories, which can be of different lengths; γ is the length scale parameter of the RBF kernel; d_dF is the discrete Fréchet distance.
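For concreteness, the discrete Fréchet distance (via the dynamic programme of [8]) and its substitution into an RBF kernel can be sketched in Python. This is a minimal illustrative sketch, not the authors' implementation; the function names and the exact RBF parameterisation, exp(−d²/(2γ²)), are our assumptions.

```python
import numpy as np

def discrete_frechet(p, q):
    """Discrete Frechet distance between two waypoint sequences,
    computed with the dynamic programme of Eiter and Mannila (1994)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    n, m = len(p), len(q)
    ca = np.empty((n, m))
    ca[0, 0] = np.linalg.norm(p[0] - q[0])
    for i in range(1, n):
        ca[i, 0] = max(ca[i - 1, 0], np.linalg.norm(p[i] - q[0]))
    for j in range(1, m):
        ca[0, j] = max(ca[0, j - 1], np.linalg.norm(p[0] - q[j]))
    for i in range(1, n):
        for j in range(1, m):
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]),
                           np.linalg.norm(p[i] - q[j]))
    return ca[-1, -1]

def df_kernel(p, q, length_scale=1.0):
    """DF kernel: discrete Frechet distance substituted into an RBF kernel."""
    d = discrete_frechet(p, q)
    return np.exp(-d**2 / (2.0 * length_scale**2))
```

Note that the two trajectories may have different lengths, which is what makes the distance-substitution construction convenient here.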
We project each observed trajectory, via the DF kernel, onto a set of representative trajectories, obtaining for each observation a vector of its kernel values against every representative trajectory. The representative trajectories are selected from the set of all observed input trajectories. An alternative view of this process is placing basis functions over the representative trajectories. The corresponding high-dimensional features over all observations are given by,

k(ξ) = [k_DF(ξ, ξ̄₁), …, k_DF(ξ, ξ̄_M)]ᵀ,   (4)

where ξ̄₁, …, ξ̄_M denote the representative trajectories. To build a KTM, we input the high-dimensional projection features to a simple neural network model. We do not operate on the kernel matrix. This can be viewed as learning combinations of fixed basis functions, and is similar to sparse Gaussian process regression [17] and generalised Nyström kernel regression [23]. Sampling a set of representative trajectories can be done in a similar manner to sampling inducing points for the Nyström method [24], as studied in [25, 26, 27].
Even though randomly selecting a subset of the observed trajectories is sufficient, we outline a quick and simple sampling scheme, similar to the leverage score sampling method [25]. Given a square matrix of the discrete Fréchet distances between all trajectories in a dataset of observations, sort the columns of the matrix by their norms, and uniformly sample, with a fixed step size, the columns of the sorted matrix. The trajectory corresponding to each selected column is added to the representative set. Here we assume that similar trajectories have roughly similar column norms. Hence, our sampling scheme discourages selecting multiple nearly identical representative trajectories, and encourages selecting a more diverse set of representatives.
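The sampling scheme above can be sketched in a few lines; this is a minimal sketch under our reading of the scheme, with an assumed function name:

```python
import numpy as np

def sample_representatives(dist_matrix, step=2):
    """Pick representative trajectories from a square matrix of pairwise
    discrete Frechet distances: sort the columns by their norms, then take
    every `step`-th column. Returns the indices of the selected trajectories."""
    norms = np.linalg.norm(dist_matrix, axis=0)  # one norm per column
    order = np.argsort(norms)                    # columns sorted by norm
    return order[::step]                         # uniform sampling, fixed step size
```

Sorting by column norm places similar trajectories next to each other, so striding over the sorted columns avoids picking near-duplicates.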
The projected feature vectors are high-dimensional representations of our input observations, whereas continuous output trajectories sampled from a KTM are in a concise functional form. The details for constructing continuous functional trajectories are described in the next subsection.
3.3 Constructing Continuous Functional Trajectories
The conversion of target trajectories from ordered sets of coordinates to parameterised functions can be viewed as finding a concise low-dimensional representation of discrete trajectories. We assume that each output trajectory comprises two conditionally independent functions that model the x and y coordinates of the trajectory over time. Both functions give coordinates relative to the last waypoint coordinate of the queried discrete trajectory.
A target trajectory recorded from time t₀ to t_T is represented as a weighted sum of Gaussian basis functions placed at fixed times: each coordinate at time t is modelled as φ(t)ᵀw, where φ(t) represents the features and w_x, w_y are the weights for the two coordinates. The weight parameters are found by solving kernel ridge regression problems with the constraint that at t = t₀ the trajectory is at the origin:

min over w_x of  Σ from t = t₀ to t_T of (x_t − φ(t)ᵀw_x)² + λ‖w_x‖²,   (5a)
s.t.  φ(t₀)ᵀw_x = 0,   (5b)

min over w_y of  Σ from t = t₀ to t_T of (y_t − φ(t)ᵀw_y)² + λ‖w_y‖²,   (6a)
s.t.  φ(t₀)ᵀw_y = 0,   (6b)
where λ is a regularisation coefficient, and φ is a feature map defined by,

φ(t) = [exp(−(t − t̂₁)²/(2ℓ_t²)), …, exp(−(t − t̂_M)²/(2ℓ_t²))]ᵀ,   (7)

where t̂₁, …, t̂_M is a set of M fixed inducing points in time, and ℓ_t is a length scale parameter that controls the width of the Gaussian bases centred at each inducing point in time. Note that φ projects to inducing points in time, whereas the DF kernel projects to representative trajectories. Including equations 5b and 6b as squared penalty terms, with penalty coefficient ρ, in equations 5a and 6a, then taking derivatives and setting them to zero, gives the solution to the minimisation problems,

w = (ΦᵀΦ + λI + ρ φ(t₀)φ(t₀)ᵀ)⁻¹ Φᵀ y,   (8)

where the rows of Φ are φ(t)ᵀ for the observed times, and y stacks the corresponding coordinate values. Solving the minimisation problems yields the weight vectors w_x and w_y, which parameterise the x- and y-coordinate functions respectively. In this work, we define the same set of inducing points for both coordinates, so both weight vectors have one entry per basis function.
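The weight-fitting step can be sketched as follows, assuming, as described above, that the start-at-origin constraint is folded into the ridge objective as a squared penalty. Function names and default coefficients are illustrative, not taken from the paper.

```python
import numpy as np

def gaussian_features(times, inducing_times, ell=1.0):
    """Gaussian bases centred at fixed inducing points in time."""
    times = np.atleast_1d(times)
    diff = times[:, None] - inducing_times[None, :]
    return np.exp(-diff**2 / (2.0 * ell**2))

def fit_trajectory_weights(times, coords, inducing_times,
                           lam=1e-3, rho=1e3, ell=1.0):
    """Fit basis weights for one coordinate by ridge regression, with the
    start-at-origin constraint added as a squared penalty (coefficient rho)."""
    Phi = gaussian_features(times, inducing_times, ell)
    phi0 = Phi[0]  # features at the start time t0
    A = (Phi.T @ Phi
         + lam * np.eye(len(inducing_times))
         + rho * np.outer(phi0, phi0))
    return np.linalg.solve(A, Phi.T @ coords)
```

The same routine is run once for the x coordinate and once for the y coordinate, sharing the inducing points.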
3.4 Learning a Mixture of Stochastic Processes
We extend our functional representation of trajectories to stochastic processes, akin to distributions over functions. To model stochastic processes over the two coordinate functions, we fit distributions over their weight parameters. Namely, we wish to find the probability distribution p(w | ξ), where w is the concatenation of the x and y weight vectors and ξ is a queried trajectory. To permit multiple modes over the mean function, we assume that this distribution can be expressed as a linear sum of individual stochastic processes, which we shall call components. The mixture coefficients are non-negative and sum to one, and each is a function of k(ξ), the projections of ξ via the DF kernel, detailed in subsection 3.2. We have,

p(w | ξ) = Σ from c = 1 to C of α_c(k(ξ)) p_c(w | k(ξ)),   (9)

where C is the number of components and α_c are the mixture coefficients.
In this work, we approximate the probability distribution of the weights of each component, given a queried trajectory ξ, by independent Gaussian distributions. The means, μ, and standard deviations, σ, of the weights of each component are functions of k(ξ). For the d-th weight of the c-th component, we have w_d ~ N(μ_{c,d}(k(ξ)), σ_{c,d}(k(ξ))²). Assuming the weights are independent, the conditional probability over the vector of weights of each component factorises into a product over its elements. We subsequently derive a loss function to learn α_c, μ_{c,d}, and σ_{c,d}, for all components c and weights d. Let us consider a set of N observations of input and target trajectories. At the i-th observation, the input trajectory is projected using the DF kernel to obtain high-dimensional projections k_i, and the weights w_i that parameterise a continuous representation of the target trajectory are found by evaluating equation 8. Assuming that observations are independent and identically distributed, we can write the conditional density as,
p(w₁, …, w_N | k₁, …, k_N) = Π from i = 1 to N of Σ from c = 1 to C of α_c(k_i) Π over d of N(w_{i,d} ; μ_{c,d}(k_i), σ_{c,d}(k_i)²).   (10)
Fitting the conditional probabilities over the weight parameters can be done by maximising equation 10. We define the loss function as the negative log-likelihood,

L = −Σ from i = 1 to N of log p_i,   (11)
p_i = Σ from c = 1 to C of α_c(k_i) Π over d of N(w_{i,d} ; μ_{c,d}(k_i), σ_{c,d}(k_i)²),   (12)
The constraints that the mixture coefficients are non-negative and sum to one can be enforced by applying a softmax activation function to the corresponding network outputs. To enforce positive standard deviations, an exponential activation function is applied to the network outputs corresponding to the standard deviations. By utilising the expressiveness of our projection features, a simple mixture density network [28, 29] with a single hidden layer can then be used to learn the mixture coefficients, means, and standard deviations, by minimising our loss function via stochastic gradient descent (SGD).
3.5 Conducting Inference and Obtaining Trajectory Realisations
After learning the functions for the mixture coefficients, means, and standard deviations as described in subsection 3.4, we have the distribution over weights via equation 9 and the assumption of independent Gaussian-distributed weights. Because a trajectory value is a linear combination of Gaussian-distributed weights, the distribution over coordinates at a vector of queried feature maps is also Gaussian within each component [30], up to the standard deviation of the observation error. Markov chain Monte Carlo methods [31] may be required to evaluate the resulting multimodal distribution, as it requires marginalising out the weights. Like [32] and [33], in this work we focus on the deterministic-observation case, where the observation error is zero. The inference process to sample continuous trajectories is outlined in algorithm 1. Under the assumption of deterministic observations, we randomly sample weights from the learned mixture, and obtain realisations of continuous trajectories by evaluating the weighted sum of basis functions. We can obtain a discrete trajectory by querying a realisation at a chosen set of times.
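Algorithm 1 is not reproduced here, but the sampling procedure it describes can be sketched as follows, under the deterministic-observation assumption. The function name and signature are ours, and the sketch covers one coordinate.

```python
import numpy as np

def sample_trajectories(alpha, mu, sigma, inducing_times, query_times,
                        n_samples=10, ell=1.0, rng=None):
    """Sample continuous-trajectory realisations for one coordinate: pick a
    component by its mixture coefficient, sample that component's weights,
    then evaluate the weighted Gaussian bases at the queried times."""
    if rng is None:
        rng = np.random.default_rng()
    diff = query_times[:, None] - inducing_times[None, :]
    Phi = np.exp(-diff**2 / (2.0 * ell**2))    # Gaussian basis features
    samples = []
    for _ in range(n_samples):
        c = rng.choice(len(alpha), p=alpha)    # sample a mixture component
        w = rng.normal(mu[c], sigma[c])        # sample its Gaussian weights
        samples.append(Phi @ w)                # deterministic-observation case
    return np.array(samples)
```

Because each realisation is a function of the basis features, the same sampled weights can be re-queried at any other time resolution.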
4 Experiments and Discussions
We wish to highlight the benefits KTMs bring, in particular: (1) map-awareness; (2) trajectory-history awareness; (3) multimodal probabilistic predictions, with continuous trajectory realisations.
4.1 Experimental Setup
In the following subsections, we shall run experiments on both simulated and reallife trajectory datasets, including:

Simulated dataset (S): Simulated trajectories of pedestrians crossing a road, similar to the simulated datasets used in [2]

Edinburgh dataset [34] (E): real-world pedestrian trajectories recorded in September

Lankershim dataset [35] (L): a subset of valid vehicle trajectories within a bounded region of x and y coordinates (in metres)
We use a fixed number of mixture components, with separate length scale parameters for the DF kernel and for the Gaussian bases. Gaussian bases are placed evenly at 2.5 time-step intervals for the Edinburgh dataset and 5 for the simulated and Lankershim datasets. Half of the trajectories are used as representative trajectories. To adequately evaluate the ability of KTMs, we ensure representative trajectories are not included in testing. We randomly select 20% of the trajectories outside of the representative set as test examples. To account for stationary vehicles in the Lankershim dataset, we only evaluate trajectories that move more than 20 m in 20 time steps. We train for 80 epochs, then evaluate on the test set. Experiments are repeated 5 times, each with randomly selected test examples. We give quantitative results on the following realised trajectories from the output:

KTM-Weighted Average (KTM-W): a linear combination of the means of the mixture components, weighted by the mixture coefficients;

KTM-Closest (KTM-C): the mean trajectory of the mixture component that is closest to the ground truth. Selecting the trajectory in this manner assumes that the decision of which option, out of the possible trajectories, to take is made correctly. This allows us to evaluate the quality of the predicted trajectory without taking into account the quality of decision-making;

Constant Velocity (CV): the trajectory is generated by a model that assumes the velocity remains constant beyond the observations;

Directional Grid Maps (DGM): the directional grid map [2] is a recent method capable of producing directional predictions. We conduct forward sampling on a DGM, with a step size equal to that of the last observed step.
The metrics used to evaluate the predicted trajectories are: (1) the Euclidean distance (ED) between the end points of the predicted and ground truth trajectories; (2) the discrete Fréchet distance (DF) [8, 9] between the predicted and ground truth trajectories. Continuous trajectories are discretised for these comparisons. All distances are reported in metres.
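As a sketch of the evaluation setup, the constant velocity baseline and the endpoint Euclidean distance metric can be written as follows; the helper names are hypothetical, not from the paper.

```python
import numpy as np

def constant_velocity_predict(observed, horizon):
    """Constant velocity baseline: extrapolate the last observed
    displacement for `horizon` further time steps."""
    observed = np.asarray(observed, dtype=float)
    v = observed[-1] - observed[-2]          # last observed step
    steps = np.arange(1, horizon + 1)[:, None]
    return observed[-1] + steps * v

def endpoint_error(pred, truth):
    """Euclidean distance (ED) between predicted and ground-truth end points."""
    pred, truth = np.asarray(pred, float), np.asarray(truth, float)
    return np.linalg.norm(pred[-1] - truth[-1])
```

The discrete Fréchet distance metric is computed on the same discretised trajectories, using the dynamic programme of [8].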
4.2 MapAwareness
Kernel Trajectory Maps learn to predict trajectories from a dataset of observed trajectories, which contains rich information about the structure of the environment, such as obstacles and designated paths. Methods that learn from a set of observed trajectories are intrinsically map-aware [10], and can account for environment geometry. Dynamics-based models are often map-unaware, and are unable to anticipate future changes in direction due to environmental factors, such as a turning road.
An example of mapawareness is demonstrated in figure 2. We sample realisations from the predicted mixture of a KTM, and compare it against the constant velocity (CV) model and ground truth trajectory. The sharp turn the ground truth trajectory takes is due to the road structure in the dataset, and there is little indication from the behaviour of the observed trajectory. The turn is not captured by the CV model, but is captured by the KTM.
Table 1 contains the quantitative results of the methods described in subsection 4.1. We predict future trajectories over a horizon of 20 time steps. We see that map-aware methods, such as KTM and DGM, tend to outperform the CV model. Notably, the CV model performs strongly on the Lankershim dataset, outperforming all but the KTM-C method, as vehicle trajectories in that dataset are approximately constant velocity over small distances. The Edinburgh dataset contains pedestrian motion trajectories, which are much more unstructured and unrestricted; thus KTM-C and KTM-W perform significantly better than the CV model.
         KTM-C      KTM-W      CV         DGM
(S)  ED  1.3±0.1    1.8±0.2    6.5±0.3    4.4±0.1
     DF  1.4±0.1    1.9±0.1    6.3±0.3    4.4±0.1
(E)  ED  0.7±0.1    0.9±0.1    1.4±0.1    1.1±0.1
     DF  0.8±0.1    0.9±0.1    1.4±0.1    1.1±0.1
(L)  ED  5.8±0.3    11.5±0.2   11.3±0.2   11.4±0.2
     DF  6.3±0.3    11.5±0.5   10.7±0.2   11.4±0.2
4.3 Trajectory History Awareness
Recent attempts to encode multimodal directional distributions in a map [2, 3, 4] largely condition only on the most recent coordinate, and are unable to utilise the full trajectory history of the object. KTMs are trajectory-history aware, as demonstrated by trajectories sampled from a KTM trained on the simulated dataset, shown in figure 4. The predicted trajectories vary significantly, even though the positions of the last observed locations are similar: the latter portions of the two observed trajectories are similar, but their early portions are dissimilar. Methods that condition solely on the most recent coordinate cannot differentiate between the two observed trajectories. By exploiting DF kernels, KTMs give predictions conditioned on the entire trajectory. Although directional flow methods, such as DGM [2], are able to capture the general movement directions of dynamic objects, trajectories can only be obtained by making the Markov assumption and forward sampling. This process is sensitive to errors, and recursive behaviour can also arise: for example, the prediction at a point A may point to B, which in turn has predictions pointing back to A. KTMs allow for realisations of entire trajectories, without forward sampling or Markovian assumptions. The trajectory-history awareness of KTMs explains their strong performance relative to the DGM method, specifically on the simulated dataset, as shown in table 1.
4.4 Multimodal Probabilistic Continuous Outputs
KTMs output mixtures of stochastic processes, corresponding to multimodal distributions over functions. This provides us with a wealth of information, including groups of possible future trajectories with associated uncertainty. Figure 3 illustrates sampling functions from the outputted mixtures. The left and center plots show realisations of the functions and , and the right plot shows the corresponding trajectory. There is clear multimodality in the distribution over future trajectories.
A major benefit of KTMs is that realisations of the output are continuous functional trajectories. These are smooth and continuous, and do not commit to an a priori resolution. We can query any time value to retrieve predicted coordinates at the given time point. The functional representation with Gaussian bases is inherently smooth, allowing us to operate on the derivatives of displacement. This property permits us to constrain certain velocity, acceleration, or jerk values.
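For instance, because each basis is a Gaussian in time, velocities can be obtained by differentiating the bases analytically rather than by finite differences. A minimal sketch, with an assumed function name:

```python
import numpy as np

def trajectory_velocity(w, inducing_times, query_times, ell=1.0):
    """Analytic time-derivative of a trajectory represented with Gaussian
    bases: differentiate each basis, then take the same weighted sum."""
    diff = query_times[:, None] - inducing_times[None, :]
    Phi = np.exp(-diff**2 / (2.0 * ell**2))
    dPhi = -(diff / ell**2) * Phi    # d/dt of each Gaussian basis
    return dPhi @ w
```

Higher derivatives (acceleration, jerk) follow the same pattern, which is what makes derivative constraints straightforward in this representation.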
5 Conclusion
In this paper, we introduced Kernel Trajectory Maps (KTMs), a novel multimodal probabilistic motion prediction method. KTMs are map-aware and condition on the whole observed trajectory. By projecting onto a set of representative trajectories using expressive DF kernels, we can use a simple single-hidden-layer mixture density network to arrive at a mixture of stochastic processes, equivalent to a multimodal distribution over future trajectories. Each realisation of the mixture is a continuous trajectory, and can be queried at any time resolution. We recover whole trajectories without resorting to forward sampling of coordinates. Empirical results show that awareness of the map and of the trajectory history improves performance when compared to a CV model and to the DGM model, which is map-aware but not trajectory-history aware. The multimodal and probabilistic properties of KTMs are also apparent from the experimental results.
References
 Rong Li and Jilkov [2003] X. Rong Li and V. P. Jilkov. Survey of maneuvering target tracking. Part I: Dynamic models. IEEE Transactions on Aerospace and Electronic Systems, 2003.
 Senanayake and Ramos [2018] R. Senanayake and F. Ramos. Directional grid maps: modeling multimodal angular uncertainty in dynamic environments. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2018.
 Kucner et al. [2017] T. P. Kucner, M. Magnusson, E. Schaffernicht, V. H. Bennetts, and A. J. Lilienthal. Enabling flow awareness for mobile robots in partially observable environments. IEEE Robotics and Automation Letters, 2017.
 McCalman et al. [2013] L. McCalman, S. O’Callaghan, and F. Ramos. Multimodal estimation with kernel embeddings for learning motion models. In IEEE International Conference on Robotics and Automation, 2013.
 Haasdonk and Bahlmann [2004] B. Haasdonk and C. Bahlmann. Learning with distance substitution kernels. In DAGMSymposium, Lecture Notes in Computer Science, 2004.
 Schölkopf [2000] B. Schölkopf. The kernel trick for distances. In Advances in Neural Information Processing Systems, 2000.
 Fréchet [1906] M. M. Fréchet. Sur quelques points du calcul fonctionnel. Rendiconti del Circolo Matematico di Palermo (1884–1940), 1906.
 Eiter and Mannila [1994] T. Eiter and H. Mannila. Computing discrete fréchet distance. Technical report, 1994.
 Besse et al. [2016] P. C. Besse, B. Guillouet, J. Loubes, and F. Royer. Review and perspective for distancebased clustering of vehicle trajectories. IEEE Transactions on Intelligent Transportation Systems, 2016.
 Rudenko et al. [2019] A. Rudenko, L. Palmieri, M. Herman, K. M. Kitani, D. M. Gavrila, and K. O. Arras. Human motion trajectory prediction: A survey. CoRR, 2019.

 Arbuckle et al. [2002] D. Arbuckle, A. Howard, and M. Mataric. Temporal occupancy grids: a method for classifying the spatio-temporal properties of the environment. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2002.
 Kucner et al. [2013] T. Kucner, J. Saarinen, M. Magnusson, and A. J. Lilienthal. Conditional transition maps: Learning motion patterns in dynamic environments. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013.
 Wang et al. [2014] Z. Wang, R. Ambrus, P. Jensfelt, and J. Folkesson. Modeling motion patterns of dynamic objects by iohmm. IEEE/RSJ International Conference on Intelligent Robots and Systems, 2014.
 Krajník et al. [2017] T. Krajník, J. P. Fentanes, J. M. Santos, and T. Duckett. Fremen: Frequency map enhancement for longterm mobile robot autonomy in changing environments. IEEE Transactions on Robotics, 2017.
 Molina et al. [2018] S. Molina, G. Cielniak, T. Krajník, and T. Duckett. Modelling and predicting rhythmic flow patterns in dynamic environments. In Towards Autonomous Robotic Systems, 2018.

 Rasmussen [2004] C. E. Rasmussen. Gaussian Processes in Machine Learning. 2004.
 Snelson and Ghahramani [2006] E. Snelson and Z. Ghahramani. Sparse gaussian processes using pseudo-inputs. In Advances in Neural Information Processing Systems, 2006.
 Barfoot et al. [2014] T. Barfoot, C. Tong, and S. Särkkä. Batch continuoustime trajectory estimation as exactly sparse gaussian process regression. In Robotics: Science and Systems Conference, 2014.
 Marinho et al. [2016] Z. Marinho, B. Boots, A. Dragan, A. Byravan, G. J. Gordon, and S. Srinivasa. Functional gradient motion planning in reproducing kernel hilbert spaces. In Robotics: Science and Systems, 2016.
 Francis et al. [2017] G. Francis, L. Ott, and F. Ramos. Stochastic functional gradient for motion planning in continuous occupancy maps. In IEEE International Conference on Robotics and Automation, 2017.
 Mukadam et al. [2018] M. Mukadam, J. Dong, X. Yan, F. Dellaert, and B. Boots. Continuoustime gaussian process motion planning via probabilistic inference. The International Journal of Robotics Research, 2018.
 Woznica et al. [2006] A. Woznica, A. Kalousis, and M. Hilario. Distances and (indefinite) kernels for sets of objects. In International Conference on Data Mining (ICDM), 2006.
 Chen et al. [2016] H. Chen, H. Xia, W. Cai, and H. Huang. Error analysis of generalized nyström kernel regression. In Advances in Neural Information Processing Systems, 2016.
 Williams and Seeger [2001] C. K. I. Williams and M. Seeger. Using the nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems, 2001.
 Drineas and Mahoney [2005] P. Drineas and M. W. Mahoney. On the nyström method for approximating a gram matrix for improved kernelbased learning. Journal of Machine Learning Research, 2005.
 Alaoui and Mahoney [2015] A. E. Alaoui and M. W. Mahoney. Fast randomized kernel ridge regression with statistical guarantees. In Advances in Neural Information Processing Systems, 2015.
 Kumar et al. [2012] S. Kumar, M. Mohri, and A. Talwalkar. Sampling methods for the nystrom method. Journal of Machine Learning Research, 2012.
 Bishop [1994] C. M. Bishop. Mixture density networks. Technical report, Dept. of Computer Science and Applied Mathematics, Aston University, 1994.
 Brando [2017] A. Brando. Mixture density networks (mdn) for distribution and uncertainty estimation. Technical report, Universitat de Barcelona, 2017.
 Bishop [2006] C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). 2006.
 Andrieu et al. [2003] C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan. An introduction to mcmc for machine learning. Machine Learning, 2003.
 De Freitas et al. [2012] N. De Freitas, A. J. Smola, and M. Zoghi. Exponential regret bounds for gaussian process bandits with deterministic observations. In International Coference on International Conference on Machine Learning, 2012.
 Wang et al. [2014] Z. Wang, B. Shakibi, L. Jin, and N. de Freitas. Bayesian multiscale optimistic optimization. In AISTATS, 2014.
 Majecka [2009] B. Majecka. Statistical models of pedestrian behaviour in the forum. Technical report, School of Informatics, University of Edinburgh, 2009.
 Federal Highway Administration [2007] Lankershim boulevard dataset. Technical report, Federal Highway Administration, 2007.