Kernel Trajectory Maps for Multi-Modal Probabilistic Motion Prediction

by   Weiming Zhi, et al.

Understanding the dynamics of an environment, such as the movement of humans and vehicles, is crucial for agents to achieve long-term autonomy in urban environments. This requires the development of methods to capture the multi-modal and probabilistic nature of motion patterns. We present Kernel Trajectory Maps (KTM) to capture the trajectories of movement in an environment. KTMs leverage the expressiveness of kernels from non-parametric modelling by projecting input trajectories onto a set of representative trajectories, to condition on a sequence of observed waypoint coordinates, and predict a multi-modal distribution over possible future trajectories. The output is a mixture of continuous stochastic processes, where each realisation is a continuous functional trajectory, which can be queried at arbitrarily fine time steps.



There are no comments yet.


page 1

page 2

page 3

page 4


OCTNet: Trajectory Generation in New Environments from Past Experiences

Being able to safely operate for extended periods of time in dynamic env...

BiTraP: Bi-directional Pedestrian Trajectory Prediction with Multi-modal Goal Estimation

Pedestrian trajectory prediction is an essential task in robotic applica...

Multi-Modal Trajectory Prediction of NBA Players

National Basketball Association (NBA) players are highly motivated and s...

Personalized Trajectory Prediction via Distribution Discrimination

Trajectory prediction is confronted with the dilemma to capture the mult...

Modeling continuous-time stochastic processes using N-Curve mixtures

Representations of sequential data are commonly based on the assumption ...

MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction

Predicting human behavior is a difficult and crucial task required for m...

Probabilistic Trajectory Prediction with Structural Constraints

This work addresses the problem of predicting the motion trajectories of...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Autonomous agents may be required to operate in environments with moving objects, such as pedestrians and vehicles in urban areas, for extended periods of time. A probabilistic model that captures the movement of surrounding dynamic objects allows an agent to make more effective and robust plans. This work presents Kernel Trajectory Maps (KTM), that capture the multi-modal, probabilistic, and continuous nature of future paths. Given a sequence of observed waypoints of a trajectory up to a given coordinate, a KTM is able to produce a multi-modal distribution over possible future trajectories, represented by a mixture of stochastic processes. Continuous functional trajectories, which are functions mapping queried times to trajectory coordinates, can then be sampled from the output stochastic process.

Early methods to predict future motion trajectories generally extrapolate based on physical laws of motion [1]. Although simple and often effective, these models have the drawback of being unable to make use of other observed trajectories, or account for environment topology. For example, physics-based methods fail to take into account that trajectories may follow a road that exists in a map. To address this shortcoming, methods have been developed that map the direction or flow of movements in an environment in a probabilistic manner [2, 3, 4]. These methods are able to output distributions over future movement directions or velocities, conditioned on the current queried coordinate. Using these models, one can sequentially forward sample to obtain a trajectory. This forward sampling approach makes the Markov assumption, assuming that the object dynamics only depend on the current position of the object. These approaches discard useful information from the trajectory history, and can accumulate errors from the forward simulation.

Motivated to overcome these aforementioned limitations of past methods, we utilise distance substitution kernels [5, 6] with the Fréchet distance [7, 8, 9]

to project trajectory data onto a representative set of trajectories to obtain high dimensional projection features. Using a neural network with a single hidden layer with the projection features, we learn a multi-modal mixture of stochastic processes. The resulting mixture is also a stochastic process, and can be viewed as a distribution over functions, where each realisation is a continuous functional trajectory. Figure


shows observed trajectories and realised trajectory predictions, demonstrating the probabilistic and multi-modal nature of KTMs. The probabilistic nature of the output provides an estimate for uncertainty, which can be used for robust planning and decision making. We contribute the KTM, a method that:

  1. is trajectory history aware and captures dependencies over the entire trajectory;

  2. models the output as a mixture of stochastic process, providing a multi-modal distribution over possible trajectories;

  3. represents realised trajectories as continuous functions, allowing them to be queried at arbitrary time resolution.

Figure 1: Observed waypoints (blue) and predicted trajectories (green with magenta end-points) sampled from KTM outputs. The ground truth trajectory is indicated in red. The probabilistic and multi-modal nature of KTMs is able to capture the complexity of the motion patterns.

2 Related Work

Kernel Trajectory Maps (KTMs) learn motion patterns in an environment, and represent sampled outputs as continuous trajectories. i.e. trajectories that can be queried at arbitrarily fine time resolutions. Here we briefly revisit literature on modelling motion dynamics and continuous trajectories.

2.1 Motion Modelling

Some of the simplest approaches to model trajectory patterns are kinematic models that make extrapolations based on a sequence of observed coordinates. Popular examples include the constant velocity and constant acceleration models [10]. Some other attempts to understand dynamics take the approach of extending occupancy mapping beyond static environments by building occupancy representations along time [11, 12, 13]. This approach tends to be memory intensive, limiting scalability.

Other recent approaches have incorporated global spatial [2, 3, 4] and temporal information [14, 15]. The authors of [2] propose Directional grid maps, a model that learns the distribution of motion directions in each grid cell of a discretised environment. This is achieved by fitting a mixture of von-Mises distributions on the motion directions attributed to each cell. A similar method is also presented in [3]

, where a map of velocity distributions in the environment is modelled by semi-wrapped Gaussian mixture models. Both methods are able to capture the uncertainty of motion at a given point coordinate, but require forward sampling to obtain trajectories.

2.2 Continuous Trajectories

Continuous representations of trajectories, often modelled by a Gaussian processes [16] or a sparse low rank approximations of Gaussian processes [17], have arisen in previous works for trajectory estimation [18] and motion planning [19, 20, 21]

. In this work, we also formulate a method to produce continuous trajectories, and then leverage continuous trajectories for extrapolation, rather than the estimation and interpolation problems addressed in previous works.

3 Methodology

3.1 Problem Formulation and Overview

We work with continuous trajectory outputs, , and discrete trajectories inputs, . Discrete trajectories are an ordered set of waypoint coordinates indexed by time, . Continuous trajectories, , are functions that map time to coordinates. Continuous trajectories can be discretised by querying at time steps, . In this paper, continuous trajectories, , are defined by weighted combinations of features, , where contains the weight parameters. is dependent on the queried time. We shall discuss continuous trajectories in detail in subsection 3.3.

Given a dataset of pairs of trajectories, , where is an observed input trajectory, and

is a target trajectory. We seek to predict a probability distribution over possible future trajectories beyond the given observed waypoints,

where is a queried discrete trajectory, is a predicted continuous trajectory starting from the last time step of . To find the distribution over future trajectories, we write the marginal likelihood as,


To evaluate the marginal likelihood, we learn and sample realisations of weights to conduct inference (detailed in subsection 3.5). This learning can be summarised by the following steps:

  1. Construct high-dimensional feature vectors of observed discrete trajectories, by projecting to a set of representative trajectories, using discrete Fréchet

    [8] kernels (DF-Kernels). (Subsection 3.2)

  2. Concisely represent each trajectory as a continuous function, defined by a vector of weights and predetermined basis functions. (Subsection 3.3)

  3. Train a single hidden layer mixture density network (MDN) model on the projection features, with weight vectors as targets, to obtain . (Subsection 3.4)

3.2 Generating Projection Features from Discrete Trajectories

In this subsection, we describe the conversion from discrete input trajectories to high dimensional kernel projections. We make use of distance substitute kernels [22, 5, 6], which are defined as , for kernel function, , and distance measure that is symmetric, i.e. , and has zero diagonal, i.e. . In this work, we use the discrete Fréchet distance [8]

substituted in a radial basis function (RBF) kernel. The Fréchet distance

[7] between curves is defined as,


where are parameterisations of two curves, and range over all continuous and monotone increasing functions. We use a discrete approximation of the Fréchet distance, which provides a distance metric between ordered sets of arbitrary length. An algorithm to compute the discrete Fréchet distance is outlined in [8]. We name this kernel the discrete Fréchet (DF) kernel. It is given by:


where and are discrete trajectories, which can be of different lengths; is the length scale parameter of the RBF kernel; is the discrete Fréchet distance.

We project each observed trajectory with DF-kernel onto a set of representative trajectories. We obtain , a vector of projections from onto . A set of representative trajectories, , are selected from the set of all observed input trajectories. An alternative view of this process is placing basis functions over representative trajectories. The corresponding high dimensional features over all observations are given by,


To build a KTM, we will input the high-dimension projection features to a simple neural network model. We do not operate on the kernel matrix. This can be viewed as learning combinations of fixed basis functions, and is similar to Sparse Gaussian Process Regression [17], and Generalised Nyström Kernel Regression [23]. Sampling a set of representative trajectories can be done in a similar manner to sampling inducing points for the Nyström Method [24], as studied in [25, 26, 27].

Even though randomly selecting a subset of trajectories from the observed trajectories is sufficient, we outline a quick and simple sampling scheme, similar to the leverage score sampling method [25]. Provided a square matrix of the discrete Fréchet distances between all trajectories in a dataset of observation, , sort the columns of the matrix by its norm, and uniformly sample, with a fixed stepsize, the columns of the matrix. The corresponding trajectory of each column selected is added to the representative set. Here we assume that similar trajectories would have roughly similar norms. Hence, our sampling scheme discourages selecting multiple almost identical representative trajectories, and encourages selecting a more diverse set of representations.

The projected feature vectors generated are high-dimensional representations of our input observations or waypoints. Whereas continuous output trajectories sampled from a KTM are in a concise functional form. The details for constructing continuous functional trajectories are described in the next subsection.

3.3 Constructing Continuous Functional Trajectories

The conversion of target trajectories from ordered sets of coordinates to parameterised functions can be viewed as finding a concise low dimensional representation of discrete trajectories. We assume that each output trajectory comprises of conditionally independent functions, and , that model the coordinates of the trajectory over time . and give coordinates relative to the last waypoint coordinate of the queried discrete trajectory.

A target trajectory recorded from time to , , is represented as weighted sums of projections to Gaussian basis functions placed at fixed times, , where represents the features, and ,

are weights. The weights parameters are found by solving kernel ridge regression problems with constraints that at

, the trajectory is at the origin:

s.t. (5b)
s.t. (6b)

where is a regularisation coefficient, and is a feature map defined by,


where is a set of, , fixed inducing points in time, and is a length scale parameter that controls the width of the Gaussian bases, centered at each inducing point of time. Note that , projects to inducing points in time, and , projects to representative trajectories. By including equations 5b and 6b as squared penalty terms, with penalty coefficient , to equations 5a and 6a, and taking derivatives then setting to zero gives the solution to the minimisation problems,


We can solve the minimisation problem to obtain vector of weights, and , that parameterise the function and respectively. In this work, we define the same set of inducing points for and , so both and are of dimensionality , as there is a weight for each basis.

3.4 Learning a Mixture of Stochastic Processes

We extend our functional representation of trajectories to stochastic processes, akin to distributions over functions. To model stochastic processes and , we fit distributions over the weight parameters of and . Namely, we wish to find the probability distribution, , where is a vector containing both and , is a queried trajectory. We consider the concatenation of vectors and , which has elements. To permit multiple modes over the mean function, assume and can be expressed as a linear sum of individual stochastic processes, which we shall call components. We can express as a linear sum with mixture coefficients , where . Each is a function on , the projections of via the DF-kernel, detailed in subsection 3.2. Defining the shorthand , we have,


In this work, we approximate the probability distribution of weights of each component, given a queried trajectory

, to be independent Gaussian distributions. The mean,

, and standard deviations,

, of the weight of the component are functions of . For brevity, we use the shorthand and . For the weight of the component, we have . Assuming weights are independent, the conditional probability over the vector of weights, of each component , is

. We subsequently derive a loss function to learn

, , and , for all and .

Let us consider the set of observations of input and target trajectories, . At the observation, is projected using the DF-kernel to obtain high dimensional projections, , and weights, , that parameterise each , a continuous representation of , is found by evaluating equation 8. Assuming that observations are independent and identically distributed, we can write the conditional density as,


Fitting the conditional probabilities over weight parameters can be done by maximising 10. We define the loss function as,



can be enforced by applying a softmax activation function,

, where denotes the network outputs of . To enforce , an exponential activation function, , is applied to the network outputs corresponding to standard deviation. By utilising the expressiveness of our projection features, a simple mixture density network [28, 29], with a single hidden layer can then be used to learn the functions of parameters , ,

, by minimising our loss function via Stochastic Gradient Descent (SGD).

3.5 Conducting Inference and Obtaining Trajectory Realisations

After learning the functions , , and as described in subsection 3.4, we have via equation 9, and the assumption of independent Gaussian distributed weights. Given a vector of feature maps, , to evaluate , we have [30], where denotes the standard deviation of the observation error. It is possible to find via

. Markov chain Monte Carlo methods

[31] may be required to evaluate the multi-modal distribution, , as it requires marginalising out , and . Like [32] and [33], in this work, we focus on the deterministic observation case, where .

The inference process to sample continuous trajectories is outlined in algorithm 1. Under the assumption of deterministic observations, we evaluate , by randomly sampling , and obtaining realisations of continuous trajectories by evaluating . We can obtain a discrete trajectory by querying at times, , i.e. .

input : ,, , , ,
output : Realised Continuous Trajectory,
1 begin
       // generate projections with DF-kernel
       Evaluate , , // Find parameters of mixture of SP
       // Sample
       // retrieve continuous trajectory
4 end
Algorithm 1 KTM Inference

4 Experiments and Discussions

We wish to highlight the benefits KTMs bring. In particular: (1) map-awareness; (2) trajectory history awareness; (3) multi-modal probabilistic predictions, with continuous trajectory realisations.

4.1 Experimental Setup

In the following subsections, we shall run experiments on both simulated and real-life trajectory datasets, including:

  1. Simulated dataset (S): Simulated trajectories of pedestrians crossing a road, similar to the simulated datasets used in [2]

  2. Edinburgh dataset [34] (E): Pedestrian trajectories in the real-world on September

  3. Lankershim dataset [35] (L): Subset of valid vehicle trajectories in the region between x-coordinates mm and y-coordinates mm

We use mixture components, and length scale , for the DF-kernel, and for the Gaussian bases. Gaussian bases are placed evenly at 2.5 time step intervals for the Edinburgh dataset and 5 for the simulated and Lankershim datasets. Half the trajectories are used as representative trajectories. To adequately evaluate the ability of KTMs, we ensure representative trajectories are not included in testing. We randomly select 20% of trajectories outside of the representative set as test examples, or

of the total. To account for stationary vehicles, for the Lankershim dataset, we only evaluate trajectories that move more than 20m in 20 time steps. We train for 80 epochs, then evaluate on the test set. Experiments are repeated 5 times, each with randomly selected test examples. We give quantitative results on the following realised trajectories from the output:

  1. KTM-Weighted Average (KTM-W): A linear combination of the mean of each mixture component, weighted by the mixture coefficient;

  2. KTM-Closest (KTM-C): The mean trajectory of the mixture component that is the closest to ground truth. Selecting the trajectory in this manner assumes the decision of which option, out of the four possible trajectories to take, is made correctly. This allows us to evaluate the quality of the predicted trajectory, without taking into account of the quality of decision-making;

  3. Constant Velocity (CV): The trajectory is generated by a model that the velocity remains constant beyond the observations;

  4. Directional Grid Maps (DGM): Directional grid map [2] is a recent method capable of producing directional predictions. We conduct forward sampling on a DGM, with a step size equal to that of the last observed step.

The metrics used to evaluate our trajectories are: (1) Euclidean distance (ED) between the end points of the predicted and ground truth trajectories; (2) Discrete Fréchet distance (DF) [8, 9] between the predicted and ground truth trajectories. Continuous trajectories are discretised for comparisons. The unit used is metres.

4.2 Map-Awareness

Kernel Trajectory Maps learn to predict trajectories from a dataset of observed trajectories, which contain rich information about the structure of the environment, such as obstacles and designated paths. Methods that learn from a set of observed trajectories are intrinsically map-aware [10], and can account for environment geometry. Dynamics based models are often map-unaware, and are not able to anticipate a future changes in direction due to environmental factors, such as a turning road.

An example of map-awareness is demonstrated in figure 2. We sample realisations from the predicted mixture of a KTM, and compare it against the constant velocity (CV) model and ground truth trajectory. The sharp turn the ground truth trajectory takes is due to the road structure in the dataset, and there is little indication from the behaviour of the observed trajectory. The turn is not captured by the CV model, but is captured by the KTM.

Table 1 contains the quantitative results of methods described in subsection 4.1. We predict future trajectories over a horizon of 20 time steps. We see that map-aware methods, such as KTM and DGM, tend to outperform the CV model. Notably the CV model performs strongly for the Lankershim dataset, outperforming all but the KTM-C method, due to vehicle trajectories in that dataset being approximately constant velocity over small distances. The Edinburgh dataset contains pedestrian motion trajectories which are much more unstructured and unrestricted. Thus, KTM-C and KTM-W perform significantly stronger than the CV model.

Figure 2: 1000 sampled trajectories from output mixture (green) anticipate the sharp turn, as shown by the ground truth (red). There is little indication of the turn from the observed waypoints (blue). The CV (yellow) model does not. The end points of the sampled trajectories are marked in magenta
(S) ED 1.30.1 1.80.2 6.50.3 4.40.1
DF 1.40.1 1.90.1 6.30.3 4.40.1
(E) ED 0.70.1 0.90.1 1.40.1 1.10.1
DF 0.80.1 0.90.1 1.40.1 1.10.1
(L) ED 5.80.3 11.50.2 11.30.2 11.40.2
DF 6.30.3 11.50.5 10.70.2 11.40.2
Table 1: The performance of KTMs compared to a baseline CV model and an map-aware DGM model [2], on the Simulated dataset (S), the Edinburgh dataset (E), and the Lankershim dataset (L). We see that KTM-C outperforms the other methods, while KTM-W also gives a strong performance. KTMs benefit from map and trajectory history awareness. Note that the CV model performs well on the Lankershim dataset (L) due to the vehicle trajectories being approximately constant velocity of relatively short time horizons. Results given in meters

4.3 Trajectory History Awareness

Recent attempts to encode multi-modal directional distributions in a map [2, 3, 4] largely condition only on the most recent coordinate, and are unable to utilise the full trajectory history of the object. KTMs are trajectory history aware, as demonstrated by trajectories sampled from a KTM trained on the simulated dataset, shown in figure 4. The predicted trajectories sampled vary significantly, though the positions of the last observed location are similar. Methods that condition solely on the most recent coordinate, can not differentiate between the two observed trajectories. The latter portion of the observed trajectories are similar, but with dissimilar early portions. By exploiting DF-kernels, KTMs give predictions conditioned on the entire trajectory. Although directional flow methods, such as DGM [2], are able to capture the general movement directions of dynamic objects, trajectories can only be obtained by making the Markov assumption and forward sampling. This process is sensitive to errors, and recursive behaviour can also arise. For example, the prediction at A points to B, which in turn has predictions pointing back to A. KTMs allow for realisations of entire trajectories, without forward sampling or making Markovian assumptions. The trajectory history awareness of KTMs explain the strong performance of KTMs relative to the DGM method, specifically on the simulated dataset, as shown in table 1.

Figure 3: 50 realisations of and (left and center respectively), and the corresponding predicted trajectories (right). and give coordinates relative to the last observed coordinate.
Figure 4: Examples from simulated trajectories are shown (left). All trajectories staring at the lower left terminate at the upper right, and all those starting at the lower right terminate at the upper left. The ground truth of two trajectories we query, one starting at the lower left, the other at the lower right, are shown (center). Realised trajectories from a KTM (right) show that the entire history of observed trajectories are considered. Even though the last observed waypoints of both are similar, the KTM predicts visibly different predictions.

4.4 Multi-modal Probabilistic Continuous Outputs

KTMs output mixtures of stochastic processes, corresponding to multi-modal distributions over functions. This provides us with a wealth of information, including groups of possible future trajectories with associated uncertainty. Figure 3 illustrates sampling functions from the outputted mixtures. The left and center plots show realisations of the functions and , and the right plot shows the corresponding trajectory. There is clear multi-modality in the distribution over future trajectories.

A major benefit of KTMs is that realisations of the output are continuous functional trajectories. These are smooth and continuous, and do not commit to an a priori resolution. We can query any time value to retrieve predicted coordinates at the given time point. The functional representation with Gaussian bases is inherently smooth, allowing us to operate on the derivatives of displacement. This property permits us to constrain certain velocity, acceleration, or jerk values.

5 Conclusion

In this paper, we introduce Kernel Trajectory Maps (KTM), a novel multi-modal probabilistic motion prediction method. KTMs are map-aware and condition on the whole observed trajectory. By projecting on a set of representative trajectories using expressive DF-kernels, we can use a simple single hidden layer mixture density network to arrive at a mixture of stochastic processes, equivalent to a multi-modal distribution over future trajectories. Each realisation of the mixture is a continuous trajectory, and can be queried at any time resolution. We recover whole trajectories without resorting to forward sampling coordinates. Empirical results show the awareness of the map and trajectory history improves performance when compared to a CV and map-aware, but not trajectory history aware, DGM model. The multi-modal and probabilistic properties of KTMs are also apparent from the experimental results.


  • Rong Li and Jilkov [2003] X. Rong Li and V. P. Jilkov. Survey of maneuvering target tracking. part i. dynamic models. IEEE Transactions on Aerospace and Electronic Systems, 2003.
  • Senanayake and Ramos [2018] R. Senanayake and F. Ramos. Directional grid maps: modeling multimodal angular uncertainty in dynamic environments. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2018.
  • Kucner et al. [2017] T. P. Kucner, M. Magnusson, E. Schaffernicht, V. H. Bennetts, and A. J. Lilienthal. Enabling flow awareness for mobile robots in partially observable environments. IEEE Robotics and Automation Letters, 2017.
  • McCalman et al. [2013] L. McCalman, S. O’Callaghan, and F. Ramos. Multi-modal estimation with kernel embeddings for learning motion models. In IEEE International Conference on Robotics and Automation, 2013.
  • Haasdonk and Bahlmann [2004] B. Haasdonk and C. Bahlmann. Learning with distance substitution kernels. In DAGM-Symposium, Lecture Notes in Computer Science, 2004.
  • Schölkopf [2000] B. Schölkopf. The kernel trick for distances. In Advances in Neural Information Processing Systems, 2000.
  • Fréchet [1906] M. M. Fréchet. Sur quelques points du calcul fonctionnel. Rendiconti del Circolo Matematico di Palermo (1884-1940), 1906.
  • Eiter and Mannila [1994] T. Eiter and H. Mannila. Computing discrete fréchet distance. Technical report, 1994.
  • Besse et al. [2016] P. C. Besse, B. Guillouet, J. Loubes, and F. Royer. Review and perspective for distance-based clustering of vehicle trajectories. IEEE Transactions on Intelligent Transportation Systems, 2016.
  • Rudenko et al. [2019] A. Rudenko, L. Palmieri, M. Herman, K. M. Kitani, D. M. Gavrila, and K. O. Arras. Human motion trajectory prediction: A survey. CoRR, 2019.
  • Arbuckle et al. [2002] D. Arbuckle, A. Howard, and M. Mataric.

    Temporal occupancy grids: a method for classifying the spatio-temporal properties of the environment.

    In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2002.
  • Kucner et al. [2013] T. Kucner, J. Saarinen, M. Magnusson, and A. J. Lilienthal. Conditional transition maps: Learning motion patterns in dynamic environments. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013.
  • Wang et al. [2014] Z. Wang, R. Ambrus, P. Jensfelt, and J. Folkesson. Modeling motion patterns of dynamic objects by iohmm. IEEE/RSJ International Conference on Intelligent Robots and Systems, 2014.
  • Krajník et al. [2017] T. Krajník, J. P. Fentanes, J. M. Santos, and T. Duckett. Fremen: Frequency map enhancement for long-term mobile robot autonomy in changing environments. IEEE Transactions on Robotics, 2017.
  • Molina et al. [2018] S. Molina, G. Cielniak, T. Krajník, and T. Duckett. Modelling and predicting rhythmic flow patterns in dynamic environments. In Towards Autonomous Robotic Systems, 2018.
  • Rasmussen [2004] C. E. Rasmussen.

    Gaussian Processes in Machine Learning

  • Snelson and Ghahramani [2006] E. Snelson and Z. Ghahramani. Sparse gaussian processes using pseudo-inputs. In Advances in Neural Information Processing Systems, 2006.
  • Barfoot et al. [2014] T. Barfoot, C. Tong, and S. Särkkä. Batch continuous-time trajectory estimation as exactly sparse gaussian process regression. In Robotics: Science and Systems Conference, 2014.
  • Marinho et al. [2016] Z. Marinho, B. Boots, A. Dragan, A. Byravan, G. J. Gordon, and S. Srinivasa. Functional gradient motion planning in reproducing kernel hilbert spaces. In Robotics: Science and Systems, 2016.
  • Francis et al. [2017] G. Francis, L. Ott, and F. Ramos. Stochastic functional gradient for motion planning in continuous occupancy maps. In IEEE International Conference on Robotics and Automation, 2017.
  • Mukadam et al. [2018] M. Mukadam, J. Dong, X. Yan, F. Dellaert, and B. Boots. Continuous-time gaussian process motion planning via probabilistic inference. The International Journal of Robotics Research, 2018.
  • Woznica et al. [2006] A. Woznica, A. Kalousis, and M. Hilario. Distances and (indefinite) kernels for sets of objects. In International Conference on Data Mining (ICDM), 2006.
  • Chen et al. [2016] H. Chen, H. Xia, W. Cai, and H. Huang. Error analysis of generalized nyström kernel regression. In Advances in Neural Information Processing Systems, 2016.
  • Williams and Seeger [2001] C. K. I. Williams and M. Seeger. Using the nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems, 2001.
  • Drineas and Mahoney [2005] P. Drineas and M. W. Mahoney. On the nyström method for approximating a gram matrix for improved kernel-based learning. Journal of Machine Learning Research, 2005.
  • Alaoui and Mahoney [2015] A. E. Alaoui and M. W. Mahoney. Fast randomized kernel ridge regression with statistical guarantees. In Advances in Neural Information Processing Systems, 2015.
  • Kumar et al. [2012] S. Kumar, M. Mohri, and A. Talwalkar. Sampling methods for the nystrom method. Journal of Machine Learning Research, 2012.
  • Bishop [1994] C. M. Bishop. Mixture density networks. Technical report, Dept. of Computer Science and Applied Mathematics, Aston University, 1994.
  • Brando [2017] A. Brando. Mixture density networks (mdn) for distribution and uncertainty estimation. Technical report, Universitat de Barcelona, 2017.
  • Bishop [2006] C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). 2006.
  • Andrieu et al. [2003] C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan. An introduction to mcmc for machine learning. Machine Learning, 2003.
  • De Freitas et al. [2012] N. De Freitas, A. J. Smola, and M. Zoghi. Exponential regret bounds for gaussian process bandits with deterministic observations. In International Coference on International Conference on Machine Learning, 2012.
  • Wang et al. [2014] Z. Wang, B. Shakibi, L. Jin, and N. de Freitas. Bayesian multi-scale optimistic optimization. In AISTATS, 2014.
  • Majecka [2009] B. Majecka. Statistical models of pedestrian behaviour in the forum. Technical report, School of Informatics, University of Edinburgh, 2009.
  • Lan [2007] Lankershim boulevard dataset. Technical report, Federal Highway Administration, 2007.