Understanding movement trends in dynamic environments is critical for autonomous agents to achieve long-term autonomy. This is further highlighted by the increasing interest in developing autonomous agents capable of coexisting and interacting with humans in a safe and helpful manner, for example, service robots and self-driving vehicles. Humans are adept at anticipating how dynamic objects may move based on the layout of the environment, yet this remains non-trivial for robots. Learning to generate likely motion trajectories can allow autonomous agents to anticipate future movement, and to plan better in environments with dynamic objects.
Learning motion trajectories requires the development of predictive models that anticipate complex motions and capture the probabilistic and multi-modal nature of such trajectories. Simple motion prediction approaches, such as constant velocity or constant acceleration models, extrapolate a partially observed trajectory into unseen regions. These methods cannot make use of trajectories previously observed in other environments, and are typically not map-aware. That is, the predicted motion depends exclusively on characteristics of the dynamic object, taking neither established paths nor the environment’s geometry into account. Advances in machine learning have led to the development of methods which learn the general flow of movement [2, 3, 4]. These methods are map-aware, as they learn the behaviour of motion from all trajectories observed in the environment. Map-aware methods which learn entire trajectories instead of flows have also been developed [5]. However, these methods are map-specific, and cannot generalise to environments where trajectories have not been observed.
It is often not possible to observe enough trajectories in a particular environment to build a flow model specific to that environment within a short time. On the other hand, building a static occupancy representation of the environment does not require prior observations of motion trajectories. We are therefore motivated to develop a map-aware motion generation method that can generalise to new environments where no motion has been observed, but whose occupancy is known. We contribute a probabilistic generative model, Occupancy Conditional Trajectory Network (OCTNet), capable of generating motion trajectories in new environments by generalising trajectories previously observed in other environments.
OCTNet is a generative model with the following desired properties:
- It generalises motion patterns observed in previous environments to generate motion trajectories in a new environment, where no motion has been observed;
- It effectively models the probabilistic and multi-modal nature of motion generation, and individual trajectories can be generated by sampling from the model;
- It generates individual trajectories as continuous functions, allowing trajectories to be queried at arbitrary resolution.
II Related Work
OCTNet generates likely motion trajectories in an environment. The generation of likely motion trajectories has been studied for a long time. Early, simple methods to predict motion are often dynamics-based methods, which extrapolate based on the laws of physics [6]. Examples of dynamics-based methods include constant velocity and constant acceleration models, and dynamics-based methods are utilised in [7, 8]. The main limitation of these methods is the difficulty of embedding map knowledge, often obtained by observing past trajectories, into the dynamics model. Dynamics models can only make very short-term predictions, and do not take into account established paths in the environment. Other attempts at modelling motion trajectories include building dynamic occupancy grid maps from occupancy data over time [9, 10, 11, 12]. However, such methods are typically memory intensive, and can only make short-term predictions.
Motivated to overcome these limitations, flow-based methods [2, 13, 14, 4, 3, 15] were developed to capture the directional flow in the environment by learning from past observed trajectories, resulting in map-aware models. These methods extract the movement direction or velocity from past trajectories and then train a mixture of distributions. Motion trajectories can then be generated by starting at an initial location, recursively sampling from the distribution of motion directions, and taking a step in the sampled direction. However, forward sampling directional distributions accumulates errors. To address this issue, Kernel Trajectory Maps (KTMs) [5] were introduced. KTMs model entire trajectories as continuous functions, which avoids the shortcomings of forward sampling. However, like earlier flow-based methods [2, 13, 14, 4, 3, 15], KTMs are limited to extrapolating in environments in which the training trajectories were observed. OCTNet extends ideas from KTMs around generating whole trajectories, and generalises map-aware motion prediction to environments where no trajectories have been observed.
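To make the error-accumulation argument concrete, the sketch below rolls out trajectories by forward sampling. It assumes a single von Mises distribution over headings at each step and a hand-written, hypothetical `mean_heading` field standing in for a learned flow model (the cited flow-based methods learn mixtures instead). It only illustrates the mechanism and is not any of the cited implementations.

```python
import numpy as np

def mean_heading(pos):
    """Hypothetical stand-in for a learned flow model: the most likely heading
    (radians) at a 2-D position. Motion drifts along +x and veers with y."""
    return 0.2 * np.tanh(pos[1])

def forward_sample(start, n_steps, step_len=0.1, kappa=20.0, rng=None):
    """Roll out one trajectory by repeatedly sampling a heading from a von Mises
    distribution centred on the local mean heading, then stepping along it."""
    rng = np.random.default_rng() if rng is None else rng
    path = [np.asarray(start, dtype=float)]
    for _ in range(n_steps):
        theta = rng.vonmises(mean_heading(path[-1]), kappa)
        path.append(path[-1] + step_len * np.array([np.cos(theta), np.sin(theta)]))
    return np.stack(path)

# Many rollouts from the same start: the spread of positions across rollouts
# grows with the step index, because every step adds fresh sampling noise.
rng = np.random.default_rng(0)
rollouts = np.stack([forward_sample([0.0, 0.0], 50, rng=rng) for _ in range(200)])
lateral_spread = rollouts[:, :, 1].std(axis=0)  # std of y at each step index
print(lateral_spread[10], lateral_spread[50])
```

Because each step conditions on the previous, already perturbed position, the per-step sampling noise compounds along the rollout, which is the shortcoming that whole-trajectory models such as KTMs avoid.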
III-A Problem Formulation
This paper addresses the problem of generating likely motion trajectories in a novel environment, based solely on the occupancy representation of the environment.
We assume that we have a dataset containing occupancy representations of $N$ environments and the discrete trajectories observed in each environment. We denote the dataset as $\mathcal{D}=\{(\mathcal{M}_i,\mathcal{T}_i)\}_{i=1}^{N}$, where $\mathcal{M}_i$ is the occupancy representation of the $i$-th environment, and $\mathcal{T}_i=\{\xi_{i,j}\}_{j=1}^{N_i}$ is the set of $N_i$ trajectories collected in the corresponding environment, where each $\xi_{i,j}$ contains trajectory waypoint coordinates.
Our proposed method learns a generative model that is capable of sampling from the probability distribution over possible trajectories, conditioned on an unseen occupancy representation $\mathcal{M}^*$, i.e.:

$$\Xi^*\sim p\left(\Xi\mid\mathcal{M}^*\right),$$

where $\Xi^*$ is the generated trajectory, and $\mathcal{M}^*$ is the queried map.
III-B Trajectory Representations
Trajectories in this paper are either discrete or continuous. Recorded trajectory data typically takes the form of a sequence of discrete waypoint coordinates, whereas our generated trajectories are continuous functions. The continuous function representation allows for querying at arbitrary resolution without additional interpolation.
Discrete trajectories are represented by an arbitrary-length sequence of waypoint coordinates, recorded at fixed time steps. We denote a discrete trajectory $\xi$ with $T$ time steps as $\xi=\{(x_n,y_n)\}_{n=1}^{T}$, where $(x_n,y_n)$ are the x,y-coordinates of the dynamic object at time step $n$.
Continuous trajectories are smooth functions that map from normalised time to coordinates. We define a continuous trajectory $\Xi$ as $\Xi(t)=\left(x(t),y(t)\right)$, where $t\in[0,1]$. The time over which the trajectory was recorded is normalised to lie between $0$ and $1$.
Continuous trajectories can be discretised by querying at uniform intervals between 0 and 1, and retaining the coordinates in sequential order. It is also possible to estimate a continuous trajectory from a discrete trajectory. Details of the employed estimation procedure are given in section III-E.
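A minimal sketch of discretising a continuous trajectory, assuming a continuous trajectory is any Python callable mapping a normalised time $t\in[0,1]$ to an $(x, y)$ pair. The helper names `normalised_times` and `discretise` and the quarter-circle example are hypothetical.

```python
import numpy as np

def normalised_times(n_waypoints):
    """Normalised time of each waypoint of a trajectory recorded at fixed
    time steps: 0 at the first waypoint and 1 at the last."""
    return np.linspace(0.0, 1.0, n_waypoints)

def discretise(trajectory, n_points):
    """Query a continuous trajectory t -> (x, y) at n_points uniformly spaced
    normalised times in [0, 1], keeping the coordinates in sequential order."""
    return np.array([trajectory(t) for t in normalised_times(n_points)])

# Hypothetical continuous trajectory: a quarter circle of radius 2.
def quarter_circle(t):
    return (2.0 * np.cos(0.5 * np.pi * t), 2.0 * np.sin(0.5 * np.pi * t))

coarse = discretise(quarter_circle, 5)    # 5 waypoints
fine = discretise(quarter_circle, 500)    # same function, finer resolution
```

The same function yields both resolutions without any interpolation step, which is the practical benefit of the continuous representation.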
III-C Overview of OCTNet
OCTNet generates likely motion trajectories in new environments by generalising trajectories observed in other environments. Generated trajectories are samples from the distribution over possible trajectories, conditioned on a given occupancy map. We learn a mapping from encodings of occupancy maps to the parameters of the required conditional distribution. Realisations of trajectories can then be sampled from this distribution.
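The training process is illustrated in fig. 2, and can be summarised as:
- Encode each occupancy map $\mathcal{M}_i$ as a feature vector of similarities $\mathbf{s}_i$. Details in section III-D.
- Embed each observed discrete trajectory as a fixed-length weight vector $\mathbf{w}$. Details in section III-E.
- Learn $p(\mathbf{w}\mid\mathbf{s})$ using an MDN. Details in section III-F.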
A brief overview of the generative process is illustrated in fig. 3. After the model has been trained, we can input a feature vector $\mathbf{s}^*$, associated with the occupancy map of a new environment, to obtain $p(\mathbf{w}\mid\mathbf{s}^*)$. Vectors $\mathbf{w}^*$ can be sampled from $p(\mathbf{w}\mid\mathbf{s}^*)$, and each sample can be used to obtain a continuous trajectory $\Xi^*$. We thus obtain a distribution over trajectories, from which continuous trajectories can be generated by sampling. As there are no explicit constraints to prevent trajectories from overlapping with occupied regions, we check each generated trajectory and only output valid ones.
III-D Non-parametric Encoding of Environmental Occupancy
We encode the occupancy representation of a given environment as a vector of similarities between that environment and all environments in the dataset. We define the similarity function in a similar fashion to the dissimilarity measures described in [17]: we substitute the Hausdorff distance into a distance substitution kernel [16] to arrive at our similarity function. The Hausdorff distance measures the distance between two finite sets of points, and is commonly used to compare images [18].
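Given two sets of points $A$ and $B$, where in general $|A|$ and $|B|$ are not required to be equal, the one-sided Hausdorff distance between the two sets is defined as:

$$h(A,B)=\max_{\mathbf{a}\in A}\,\min_{\mathbf{b}\in B}\,\left\|\mathbf{a}-\mathbf{b}\right\|.$$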
The one-sided Hausdorff distance is not symmetric; we enforce symmetry by taking the average of $h(A,B)$ and $h(B,A)$, i.e.:

$$d_H(A,B)=\frac{1}{2}\left(h(A,B)+h(B,A)\right).$$
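We can then define a similarity function between two sets $A$ and $B$, analogous to a distance substitution kernel described in [16], as:

$$k(A,B)=\exp\left(-\frac{d_H(A,B)^2}{2\ell_s^2}\right),$$

where $\ell_s$ is a length scale hyper-parameter. We assume that occupancy representations in the dataset are binarised occupancy grid maps. The $i$-th map from the dataset can be represented as a set of occupied locations, $\mathcal{M}_i=\{\mathbf{m}_{i,1},\ldots,\mathbf{m}_{i,n_i}\}$, where $n_i$ is the number of occupied coordinates. A Gram matrix $\mathbf{G}$ of the similarity function evaluated between each pair of maps is obtained. The $i$-th row of $\mathbf{G}$ is the feature vector $\mathbf{s}_i$ for the $i$-th map $\mathcal{M}_i$; we can write this as:

$$\mathbf{s}_i=\left[k(\mathcal{M}_i,\mathcal{M}_1),\;k(\mathcal{M}_i,\mathcal{M}_2),\;\ldots,\;k(\mathcal{M}_i,\mathcal{M}_N)\right].$$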
This is equivalent to constructing a kernel Gram matrix between all occupancy representations; however, in this work we treat each row of the matrix as a feature vector. For every map $\mathcal{M}_i$ in our dataset, there is a corresponding vector of similarities $\mathbf{s}_i$. The process of encoding occupancy information is non-parametric, as the length of $\mathbf{s}_i$ grows with each new map added to the dataset. However, as it is difficult to obtain a real-world dataset with a large number of occupancy maps and associated motion trajectories, the number of occupancy maps is typically not large. An alternative parametric model would compare against only a subset of occupancy representations, rather than against all other occupancy representations. This is similar to the concept of representative trajectories in [5]. The parametric formulation may increase scalability at the cost of performance.
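The encoding above can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' code: it assumes maps are binary grids with 1 marking occupied cells, uses cell indices as point coordinates, computes distances by brute force, and adopts the squared-exponential form $\exp(-d_H^2/(2\ell_s^2))$ for the similarity. All function names are hypothetical.

```python
import numpy as np

def one_sided_hausdorff(A, B):
    """h(A, B): largest distance from a point of A to its nearest point of B."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # |A| x |B| distances
    return d.min(axis=1).max()

def averaged_hausdorff(A, B):
    """Symmetrised Hausdorff distance: mean of the two one-sided distances."""
    return 0.5 * (one_sided_hausdorff(A, B) + one_sided_hausdorff(B, A))

def similarity(A, B, length_scale):
    """Distance-substitution similarity exp(-d^2 / (2 l^2)); the exact kernel
    form is an assumption made for this sketch."""
    return np.exp(-averaged_hausdorff(A, B) ** 2 / (2.0 * length_scale ** 2))

def occupied_points(grid):
    """Coordinates (row, col) of the occupied cells of a binary occupancy grid."""
    return np.argwhere(np.asarray(grid) > 0).astype(float)

def encode(grid, reference_grids, length_scale=1.0):
    """Feature vector s: similarity between one map and every reference map."""
    P = occupied_points(grid)
    return np.array([similarity(P, occupied_points(g), length_scale)
                     for g in reference_grids])

def feature_matrix(grids, length_scale=1.0):
    """Gram matrix whose i-th row is the feature vector s_i of training map i."""
    return np.stack([encode(g, grids, length_scale) for g in grids])

# Three tiny hypothetical 4 x 4 training maps, each with a different wall.
maps = [np.zeros((4, 4), dtype=int) for _ in range(3)]
maps[0][0, :] = 1   # wall along the top row
maps[1][:, 0] = 1   # wall along the left column
maps[2][0, :3] = 1  # shorter wall along the top row
S = feature_matrix(maps)

# A new, unseen map is encoded against the training maps only.
new_map = np.zeros((4, 4), dtype=int)
new_map[1, :] = 1
s_new = encode(new_map, maps)
```

Passing a subset of the training maps as `reference_grids` to `encode` gives the fixed-length, parametric variant discussed above. For large maps, SciPy's `scipy.spatial.distance.directed_hausdorff` computes the one-sided distance without materialising the full pairwise-distance matrix.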
III-E Embedding Discrete Trajectories
We embed each discrete trajectory as a fixed-length vector by considering its best-fit continuous trajectory. The elements of the vector are the weights of fixed basis functions which reconstruct the continuous trajectory that best fits the discrete trajectory. The process of finding this best-fit continuous trajectory is explained below.
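We define a normalised time parameter $t\in[0,1]$. Trajectories in the dataset can have an arbitrary number of timesteps, and $t$ indicates the proportion of the trajectory's timesteps that have elapsed; $t=0$ and $t=1$ refer to the first and last timesteps respectively in the discrete trajectory. A continuous trajectory can be modelled by functions $x(t)$ and $y(t)$, which map from $t$ to the x and y coordinates of the trajectory. We model $x(t)$ and $y(t)$ by weighted sums of $M$ fixed radial basis functions centred on evenly spaced $t$ values. Suppose we have a discrete trajectory $\xi=\{(x_n,y_n)\}_{n=1}^{T}$; the weights that best fit it can be found by solving a pair of Kernel Ridge Regression (KRR) problems, defined as:

$$\mathbf{w}_x=\arg\min_{\mathbf{w}}\;\left\|\mathbf{x}-\boldsymbol{\Phi}\mathbf{w}\right\|_2^2+\lambda\left\|\mathbf{w}\right\|_2^2=\left(\boldsymbol{\Phi}^\top\boldsymbol{\Phi}+\lambda\mathbf{I}\right)^{-1}\boldsymbol{\Phi}^\top\mathbf{x},$$

$$\mathbf{w}_y=\arg\min_{\mathbf{w}}\;\left\|\mathbf{y}-\boldsymbol{\Phi}\mathbf{w}\right\|_2^2+\lambda\left\|\mathbf{w}\right\|_2^2=\left(\boldsymbol{\Phi}^\top\boldsymbol{\Phi}+\lambda\mathbf{I}\right)^{-1}\boldsymbol{\Phi}^\top\mathbf{y},$$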
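where $\mathbf{x}=[x_1,\ldots,x_T]^\top$ and $\mathbf{y}=[y_1,\ldots,y_T]^\top$ collect the waypoint coordinates, $\mathbf{t}=[t_1,\ldots,t_T]^\top$ contains the normalised time parameters of the waypoints, $\lambda$ is the ridge regularisation parameter, $\mathbf{I}$ is an $M\times M$ identity matrix, and $\boldsymbol{\Phi}\in\mathbb{R}^{T\times M}$ contains the radial basis function values evaluated at $\mathbf{t}$, obtained by:

$$\boldsymbol{\Phi}=\begin{bmatrix}\phi(t_1,\tau_1)&\cdots&\phi(t_1,\tau_M)\\ \vdots&\ddots&\vdots\\ \phi(t_T,\tau_1)&\cdots&\phi(t_T,\tau_M)\end{bmatrix},$$

where $\boldsymbol{\tau}=[\tau_1,\ldots,\tau_M]$ is a vector of $t$ values at which the stationary radial basis functions are centred. In this work, we use the squared exponential basis function, as it is smooth and the default in many kernel-based methods. Hence, our basis function is defined by

$$\phi(t,\tau)=\exp\left(-\frac{(t-\tau)^2}{2\ell_\phi^2}\right),$$

where $\ell_\phi$ is the length scale hyper-parameter of the squared exponential functions. We denote the concatenation of $\mathbf{w}_x$ and $\mathbf{w}_y$ as $\mathbf{w}\in\mathbb{R}^{2M}$, where $M$ is the number of basis functions. Every discrete trajectory can be converted to a corresponding $\mathbf{w}$, and typically $M\ll T$. We can recover functions that map from an arbitrary queried normalised time parameter $t$ to the trajectory coordinates by $x(t)=\boldsymbol{\Phi}(t)\mathbf{w}_x$ and $y(t)=\boldsymbol{\Phi}(t)\mathbf{w}_y$, where $\boldsymbol{\Phi}(t)=[\phi(t,\tau_1),\ldots,\phi(t,\tau_M)]$.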
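A NumPy sketch of the embedding, assuming waypoints are recorded at fixed time steps (so their normalised times are evenly spaced), basis centres evenly spaced on $[0,1]$, and illustrative hyper-parameter values ($M=10$, $\ell_\phi=0.1$, $\lambda=10^{-3}$) that are not taken from the paper. The two ridge problems share the matrix $\boldsymbol{\Phi}^\top\boldsymbol{\Phi}+\lambda\mathbf{I}$, so it is formed once. Function names are hypothetical.

```python
import numpy as np

def rbf_features(t, centres, length_scale):
    """Phi: squared exponential basis functions evaluated at times t (len(t) x M)."""
    t = np.asarray(t, float)[:, None]
    return np.exp(-(t - centres[None, :]) ** 2 / (2.0 * length_scale ** 2))

def embed_trajectory(waypoints, n_basis=10, length_scale=0.1, ridge=1e-3):
    """Fit w = [w_x; w_y] by ridge regression so that Phi(t) w matches the waypoints."""
    waypoints = np.asarray(waypoints, float)
    t = np.linspace(0.0, 1.0, len(waypoints))    # normalised time of each waypoint
    centres = np.linspace(0.0, 1.0, n_basis)      # evenly spaced basis centres
    Phi = rbf_features(t, centres, length_scale)
    A = Phi.T @ Phi + ridge * np.eye(n_basis)     # shared by the x and y problems
    w_x = np.linalg.solve(A, Phi.T @ waypoints[:, 0])
    w_y = np.linalg.solve(A, Phi.T @ waypoints[:, 1])
    return np.concatenate([w_x, w_y])

def reconstruct(w, t, length_scale=0.1):
    """Evaluate x(t) = Phi(t) w_x and y(t) = Phi(t) w_y at arbitrary query times."""
    n_basis = len(w) // 2
    centres = np.linspace(0.0, 1.0, n_basis)
    Phi = rbf_features(t, centres, length_scale)
    return np.stack([Phi @ w[:n_basis], Phi @ w[n_basis:]], axis=1)

# Hypothetical 40-waypoint straight-line trajectory from (0, 0) to (4, 2).
line = np.stack([np.linspace(0.0, 4.0, 40), np.linspace(0.0, 2.0, 40)], axis=1)
w = embed_trajectory(line)                        # 20 numbers instead of 80
dense = reconstruct(w, np.linspace(0.0, 1.0, 1000))
```

Solving the linear system with `np.linalg.solve`, rather than forming the inverse explicitly, is the usual numerically safer way to evaluate the closed-form ridge solution.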
III-F Learning a Mixture of Stochastic Processes
Each occupancy representation is encoded as a vector of similarities $\mathbf{s}_i$, and the trajectories observed in the map are embedded as a collection of fixed-length weight vectors, $\{\mathbf{w}_{i,j}\}_{j=1}^{N_i}$, where each $\mathbf{w}_{i,j}$ represents a trajectory. We aim to train a neural network to generate $p(\mathbf{w}\mid\mathbf{s})$.
Mixture density networks (MDNs) [19] are a class of neural networks capable of representing conditional distributions. We slightly modify the classical MDN described in [19] to learn a mixture of vectors of conditional distributions, corresponding to the conditional distribution of each element in $\mathbf{w}$.
To capture the multi-modality of trajectory distributions, we model the conditional distribution as a mixture of $K$ vectors of distributions, which we call components. This can be written as,

$$p(\mathbf{w}\mid\mathbf{s})=\sum_{k=1}^{K}\alpha_k(\mathbf{s})\,p_k(\mathbf{w}\mid\mathbf{s}),$$

where $p_k$ denotes the $k$-th component, and $\alpha_k(\mathbf{s})$ is the associated component weight. We model each element of the vector $\mathbf{w}$ with a symmetric distribution. We investigate modelling the distribution over each element of the vector with:
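Normal distribution: The normal distribution is the least-informative default distribution, and is also used in the original formulation of MDNs. Under this assumption, we can write each component of the conditional distribution as:

$$p_k(\mathbf{w}\mid\mathbf{s})=\prod_{d=1}^{2M}\frac{1}{\sqrt{2\pi}\,\sigma_{k,d}(\mathbf{s})}\exp\left(-\frac{\left(w_d-\mu_{k,d}(\mathbf{s})\right)^2}{2\sigma_{k,d}(\mathbf{s})^2}\right),$$

where a single component of the mixture is parameterised by a vector of means, $\boldsymbol{\mu}_k=[\mu_{k,1},\ldots,\mu_{k,2M}]$, and a vector of standard deviations, $\boldsymbol{\sigma}_k=[\sigma_{k,1},\ldots,\sigma_{k,2M}]$.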
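Laplace distribution: For many problems, a mixture of Laplace distributions was found to provide marginally better performance than a mixture of normal distributions [20]. Under this assumption, we can write each component of the conditional distribution as:

$$p_k(\mathbf{w}\mid\mathbf{s})=\prod_{d=1}^{2M}\frac{1}{2b_{k,d}(\mathbf{s})}\exp\left(-\frac{\left|w_d-\mu_{k,d}(\mathbf{s})\right|}{b_{k,d}(\mathbf{s})}\right),$$

where the $k$-th component is parameterised by a vector of means, $\boldsymbol{\mu}_k$, and a vector of scale parameters, $\mathbf{b}_k=[b_{k,1},\ldots,b_{k,2M}]$. The variance and scale parameter are related by $\sigma^2=2b^2$.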
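The two component densities can be evaluated in log space, as in the sketch below; summing over the last axis corresponds to the product over the $2M$ elements of $\mathbf{w}$. The function names are hypothetical, and the final lines simply check the relation $\sigma^2=2b^2$ empirically with NumPy's Laplace sampler.

```python
import numpy as np

LOG_2PI = np.log(2.0 * np.pi)

def normal_component_logpdf(w, mu, sigma):
    """log p_k(w | s) for independent normal elements, summed over elements."""
    z = (w - mu) / sigma
    return np.sum(-0.5 * z ** 2 - np.log(sigma) - 0.5 * LOG_2PI, axis=-1)

def laplace_component_logpdf(w, mu, b):
    """log p_k(w | s) for independent Laplace elements, summed over elements."""
    return np.sum(-np.abs(w - mu) / b - np.log(2.0 * b), axis=-1)

# Empirical check of the variance / scale relation sigma^2 = 2 b^2.
rng = np.random.default_rng(0)
b = 0.5
samples = rng.laplace(loc=0.0, scale=b, size=200_000)
print(samples.var(), 2 * b ** 2)  # the two values should be close
```

The absolute-value term in the Laplace log-density penalises large deviations linearly rather than quadratically, so the Laplace variant has heavier tails than the normal variant at equal variance.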
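We can then write the negative log-likelihood loss function over $N$ maps, and the $N_i$ trajectories observed in the environment corresponding to the $i$-th map in the dataset, as:

$$\mathcal{L}=-\sum_{i=1}^{N}\sum_{j=1}^{N_i}\log\left(\sum_{k=1}^{K}\alpha_k(\mathbf{s}_i)\,p_k(\mathbf{w}_{i,j}\mid\mathbf{s}_i)\right),\tag{16}$$

where $p_k$ is the $k$-th component in the mixture. The standard MDN constraints are enforced using the activation functions highlighted in [19]. These include:
- $\sum_{k=1}^{K}\alpha_k=1$, such that component weights sum to one, by applying the softmax activation function to the associated network outputs;
- $\sigma_{k,d}>0$ or $b_{k,d}>0$, such that standard deviations or scale parameters are positive, by applying an exponential activation function to the associated network outputs.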
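A sketch of how the loss and the two constraints fit together for the Laplace variant, assuming a hypothetical layout of the raw network outputs (mixture logits first, then means, then log-scales) that the text does not specify. Log-softmax enforces $\sum_k\alpha_k=1$, the exponential keeps scales positive, and a log-sum-exp keeps the mixture log-likelihood numerically stable. NumPy only evaluates the loss here; training would require an automatic differentiation framework.

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def mixture_params(raw, n_components, dim):
    """Split raw outputs (B x K(1 + 2*dim)) into log-weights, means and scales."""
    K = n_components
    logits = raw[:, :K]
    mu = raw[:, K:K + K * dim].reshape(-1, K, dim)
    log_scale = raw[:, K + K * dim:].reshape(-1, K, dim)
    log_alpha = logits - logsumexp(logits, axis=1)[:, None]  # log-softmax: weights sum to one
    return log_alpha, mu, np.exp(log_scale)                   # exponential: positive scales

def laplace_mdn_nll(raw, w, n_components):
    """Negative log-likelihood of weight vectors w (B x dim) under the Laplace mixture."""
    log_alpha, mu, b = mixture_params(raw, n_components, w.shape[1])
    log_comp = np.sum(-np.abs(w[:, None, :] - mu) / b - np.log(2.0 * b), axis=-1)  # B x K
    return -np.sum(logsumexp(log_alpha + log_comp, axis=1))
```

Replacing the Laplace log-density in `log_comp` with the normal log-density gives the normal variant. A quick sanity check is that two identical, equally weighted components yield the same loss as a single component.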
| Layer |
|---|
| Dense (500 units), ReLU activation |
| Dense (500 units), ReLU activation |
| Dropout (0.25 rate) |
| Dense (500 units), ReLU activation |
| Dropout (0.25 rate) |
| Dense (500 units), ReLU activation |
| Dropout (0.25 rate) |
| Dense (500 units), ReLU activation |
| Dense ($K(4M+1)$ units) |
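The layer stack in table I can be sketched as a plain NumPy forward pass. The initialisation scheme, the inverted-dropout convention and the sizes used in the example ($N=96$ training maps, i.e. 80% of 120, with hypothetical $K=3$ and $M=10$) are assumptions for illustration; the output width $K(4M+1)$ follows from one weight, $2M$ means and $2M$ scales per component.

```python
import numpy as np

# Hidden layers in the order listed in table I; "dense" layers use ReLU.
LAYERS = [("dense", 500), ("dense", 500), ("dropout", 0.25),
          ("dense", 500), ("dropout", 0.25),
          ("dense", 500), ("dropout", 0.25), ("dense", 500)]

def init_params(in_dim, out_dim, rng):
    """He-style random initialisation for each dense layer and the output layer."""
    params, d = [], in_dim
    for kind, units in LAYERS:
        if kind == "dense":
            params.append((rng.normal(0.0, np.sqrt(2.0 / d), (d, units)), np.zeros(units)))
            d = units
    params.append((rng.normal(0.0, np.sqrt(1.0 / d), (d, out_dim)), np.zeros(out_dim)))
    return params

def forward(params, s, train=False, rng=None):
    """Map similarity features s (B x N) to raw mixture outputs (B x out_dim)."""
    h, i = np.asarray(s, float), 0
    for kind, value in LAYERS:
        if kind == "dense":
            W, b = params[i]
            i += 1
            h = np.maximum(h @ W + b, 0.0)          # dense layer followed by ReLU
        elif train:                                 # inverted dropout, training only
            h = h * (rng.random(h.shape) >= value) / (1.0 - value)
    W, b = params[i]
    return h @ W + b  # linear output; softmax / exp are applied to slices in the loss

# Hypothetical sizes: N = 96 training maps, K = 3 components, M = 10 basis functions.
N, K, M = 96, 3, 10
rng = np.random.default_rng(0)
params = init_params(N, K * (4 * M + 1), rng)
raw = forward(params, rng.random((5, N)))
```

Leaving the final layer linear keeps the slicing into mixture weights, means and scales, together with their activations, in one place alongside the loss.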
Using the neural network with the architecture illustrated in table I and minimising the loss function defined in eq. 16, we can learn a model that maps from the feature vector of similarities $\mathbf{s}$ to the parameters required to construct $p(\mathbf{w}\mid\mathbf{s})$. As a distribution is estimated for each element of the weight vector $\mathbf{w}$, the predicted $p(\mathbf{w}\mid\mathbf{s})$ is a mixture of discrete processes, where each realisation is a vector $\mathbf{w}$. We can also obtain $p(\hat{\boldsymbol{\Phi}}(t)\mathbf{w}\mid\mathbf{s})$, which is a mixture of continuous stochastic processes, with each realisation being a continuous trajectory. Here, $\hat{\boldsymbol{\Phi}}(t)=\mathrm{diag}\left(\boldsymbol{\Phi}(t),\boldsymbol{\Phi}(t)\right)$ denotes the basis functions outlined in section III-E, applied to the x and y coordinates.
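Drawing a realisation from this mixture and turning it into a continuous trajectory can be sketched as below, assuming the Laplace variant, weight vectors laid out as $[\mathbf{w}_x;\mathbf{w}_y]$, and the squared exponential basis with evenly spaced centres on $[0,1]$. Selecting a component and then drawing each element independently is standard ancestral sampling for a mixture; the function names are hypothetical.

```python
import numpy as np

def sample_weights(alpha, mu, b, rng):
    """Ancestral sampling from a Laplace mixture p(w | s): choose component k with
    probability alpha[k], then draw every element of w from Laplace(mu[k], b[k])."""
    k = rng.choice(len(alpha), p=alpha)
    return rng.laplace(loc=mu[k], scale=b[k])

def trajectory_from_weights(w, t, length_scale=0.1):
    """Evaluate Xi(t) = diag(Phi(t), Phi(t)) w, i.e. x(t) = Phi(t) w_x and
    y(t) = Phi(t) w_y, with squared exponential basis functions on [0, 1]."""
    M = len(w) // 2
    centres = np.linspace(0.0, 1.0, M)
    Phi = np.exp(-(np.asarray(t, float)[:, None] - centres) ** 2 / (2.0 * length_scale ** 2))
    return np.stack([Phi @ w[:M], Phi @ w[M:]], axis=1)

# Hypothetical predicted parameters for K = 2 components and M = 10 basis functions.
rng = np.random.default_rng(0)
alpha = np.array([0.7, 0.3])
mu = rng.normal(size=(2, 20))
b = np.full((2, 20), 0.05)
traj = trajectory_from_weights(sample_weights(alpha, mu, b, rng), np.linspace(0.0, 1.0, 200))
```

Each call to `sample_weights` yields one realisation; repeated calls spread across the mixture components in proportion to their weights, which is how distinct groups of trajectories emerge.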
III-G Trajectory Generation
We can generate trajectories in environments with no observed trajectories by generalising past experiences of trajectories observed in other environments. We start by inputting $\mathbf{s}^*$, the feature vector of similarities of the map we wish to query, $\mathcal{M}^*$, into the neural network detailed in section III-F. We obtain the parameters $\{\alpha_k,\boldsymbol{\mu}_k,\boldsymbol{\sigma}_k\}_{k=1}^{K}$ (with scales $\mathbf{b}_k$ in place of $\boldsymbol{\sigma}_k$ for the Laplace variant) that define distributions over each element of $\mathbf{w}$. Realisations $\mathbf{w}^*$ can be sampled randomly from the predicted $p(\mathbf{w}\mid\mathbf{s}^*)$, and a possible continuous trajectory $\Xi^*$ can be found by evaluating $\Xi^*(t)=\hat{\boldsymbol{\Phi}}(t)\mathbf{w}^*$, where $\hat{\boldsymbol{\Phi}}(t)$ gives the basis function evaluations given in section III-E. As there are no explicit constraints in the MDN to prevent the generation of trajectories which overlap with occupied regions, we apply rejection sampling. We query evenly spaced $t$ values and check whether $\Xi^*(t)$ is occupied in the map $\mathcal{M}^*$. If a point on the possible trajectory is occupied, we reject the trajectory and re-sample from $p(\mathbf{w}\mid\mathbf{s}^*)$. Otherwise, we accept the possible trajectory. Figure 4 shows 50 generated trajectories along with plots of $x(t)$ and $y(t)$. We clearly see that the generated trajectories can belong to different groupings: one group of trajectories starts inside the room and exits into the corridor, while the other starts in the corridor and ends in the room. The hidden ground truth trajectories are underlaid in blue.
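The rejection step can be sketched as follows. The sketch assumes a binary grid indexed as `grid[row, col]` with $x$ along columns and $y$ along rows, a hypothetical `resolution` (map units per cell), and that points falling outside the map are treated as invalid; the text does not specify these details. `sample_trajectory` stands for any sampler returning waypoints at the queried times, such as one built from the MDN sampler.

```python
import numpy as np

def is_valid(points, grid, resolution=1.0):
    """True if every (x, y) point lies inside the map and in a free cell."""
    cells = np.floor(np.asarray(points, float) / resolution).astype(int)
    cols, rows = cells[:, 0], cells[:, 1]          # x -> column, y -> row (assumed)
    n_rows, n_cols = grid.shape
    inside = (rows >= 0) & (rows < n_rows) & (cols >= 0) & (cols < n_cols)
    if not inside.all():
        return False                                # leaving the map counts as invalid
    return not grid[rows, cols].any()

def rejection_sample(sample_trajectory, grid, n_query=100, max_tries=1000, resolution=1.0):
    """Draw candidate trajectories until one is collision-free at n_query evenly
    spaced normalised times; returns (points, number_of_draws) or (None, max_tries)."""
    t = np.linspace(0.0, 1.0, n_query)
    for tries in range(1, max_tries + 1):
        points = sample_trajectory(t)               # (n_query, 2) array of (x, y)
        if is_valid(points, grid, resolution):
            return points, tries
    return None, max_tries
```

The `max_tries` cap is a practical safeguard for maps where the model rarely produces valid samples; the text itself simply re-samples until a trajectory is accepted.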
IV Experiments and Results
IV-A Dataset and Metrics
Training an OCTNet requires a dataset containing occupancy maps of multiple environments along with trajectories observed in each environment. To the best of our knowledge, there exists no real-world dataset of sufficient size with occupancy data and trajectories observed in different environments. Therefore, we conduct our experiments on the simulated Occ-Traj 120 dataset [21]. This dataset contains 120 binary occupancy grid maps of indoor environments with rooms and corridors, as well as simulated motion trajectories. Examples of maps and trajectories are illustrated in fig. 5.
We evaluate the generated trajectories against a test set with hidden ground truth trajectories. Output continuous trajectories are discretised for evaluation by querying at uniform intervals. Due to the probabilistic and multi-modal nature of our output, the metric used is the minimum trajectory distance (MTD), defined by:

$$\mathrm{MTD}(\hat{\xi})=\min_{j\in\{1,\ldots,N^*\}}d\left(\hat{\xi},\xi_j\right),$$

where $N^*$ denotes the number of trajectories observed in the environment, with $j$ indexing each trajectory, and $d(\cdot,\cdot)$ is a measure of distance between the generated trajectory $\hat{\xi}$ and a ground truth trajectory $\xi_j$. In our evaluations, the Hausdorff distance, the discrete Fréchet distance, and the Dynamic Time Warping (DTW) Euclidean distance are considered. These trajectory distances are commonly used in distance-based trajectory clustering to quantify the dissimilarity between trajectories, and a review of them can be found in [22]. Intuitively, the MTD measures the distance between the generated trajectory $\hat{\xi}$ and the most similar ground truth trajectory.
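The MTD and the three trajectory distances can be sketched as below. These are textbook formulations and not necessarily the exact implementations behind table II: the Hausdorff distance here is the standard symmetric (max) version rather than the averaged variant used for map similarity in section III-D, and the discrete Fréchet and DTW distances share one dynamic programme over monotone alignments of the two waypoint sequences. Function names are hypothetical.

```python
import numpy as np

def _pairwise(P, Q):
    """Euclidean distances between every waypoint of P and every waypoint of Q."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    return np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)

def hausdorff(P, Q):
    """Standard symmetric (max) Hausdorff distance between two waypoint sets."""
    D = _pairwise(P, Q)
    return max(D.min(axis=1).max(), D.min(axis=0).max())

def _coupling_cost(D, combine):
    """Dynamic programme over monotone alignments of two sequences:
    combine=max gives the discrete Fréchet distance, addition gives DTW."""
    n, m = D.shape
    A = np.full((n + 1, m + 1), np.inf)
    A[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(A[i - 1, j], A[i, j - 1], A[i - 1, j - 1])
            A[i, j] = combine(D[i - 1, j - 1], best)
    return A[n, m]

def discrete_frechet(P, Q):
    return _coupling_cost(_pairwise(P, Q), max)

def dtw(P, Q):
    """DTW with Euclidean point-to-point cost."""
    return _coupling_cost(_pairwise(P, Q), lambda d, best: d + best)

def mtd(generated, ground_truths, distance):
    """Minimum trajectory distance: distance to the closest ground truth trajectory."""
    return min(distance(generated, gt) for gt in ground_truths)
```

For example, `mtd(generated, test_trajectories, dtw)` scores one discretised generated trajectory against all hidden trajectories of a test map, where `test_trajectories` is a hypothetical list of waypoint arrays.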
IV-B Choice of Distribution in Mixture Model
During the training of OCTNet, distributions over every element of the vector $\mathbf{w}$ are approximated. In each component of the mixture, a class of distributions is taken to be the prior probability distribution. We investigate the performance of assuming normal and Laplace distributions as priors.
OCTNets with normal and Laplace distributions over the elements of $\mathbf{w}$ are trained on 80% of the maps, with their associated trajectories, in the Occ-Traj 120 dataset, and the remaining 20% of maps are used for testing. We select the length scale hyper-parameters of the similarity function and of the basis functions, and train the networks for 10 epochs.
Performance is shown in table II. As demonstrated by evaluating MTD with all three trajectory distances, assuming Laplace distributions as priors over the elements of $\mathbf{w}$ results in generated trajectories that are closer to the ground truth trajectories.
The choice of distribution is connected to how trajectories are distributed in the environment. Our results demonstrate that the Laplace distribution assumption leads to stronger results for the Occ-Traj 120 dataset, which contains indoor occupancy maps. However, the most suitable prior may be different in other classes of environment, such as outdoor environments. The OCTNet framework does not limit the choice of prior distributions. Other distributions with closed form probability density functions can be chosen as the prior probability distribution over elements. Our results show the choice of distribution for priors can affect the quality of trajectories, so cross validation methods could be used to guide the choice of prior distributions.
IV-C Quality of Trajectories Compared to Other Generative Models
To the best of our knowledge, there exist no other generative methods specifically developed to generate trajectories conditioned on occupancy maps. Hence, we evaluate the performance of our generative model, OCTNet, against other popular generative models, slightly modified to generate trajectories.
We evaluate the performance of Generative Adversarial Networks (GANs) [23] and Conditional Variational Autoencoders (CVAEs) [24], trained to generate the vector of weights $\mathbf{w}$. This is the same vector of weights that OCTNets use to represent trajectories. The GAN and CVAE models are trained for 300 epochs.
GAN: GANs are generative models that have achieved success in many generative tasks [25, 26]. We train GANs to generate $\mathbf{w}$. Trajectories are obtained from $\mathbf{w}$ using the basis functions of section III-E, as for OCTNets. As we are conditioning on grid maps and outputting weights, the discriminator uses convolutional layers to encode information for prediction, whereas the generator samples a 100-dimensional latent variable, concatenated with the map for conditioning, and uses five dense layers to output the vector $\mathbf{w}$;
CVAE: Similar to the GAN model and OCTNet, we train a CVAE [24] to generate predictions of $\mathbf{w}$. The hyperparameters, such as the dimension of the latent variable, are chosen to be the same as for the GAN model. A CVAE differs from a GAN in that it uses the reparametrisation trick to generate structured output predictions through Gaussian latent variables.
Fig. 6: Examples of 50 trajectories generated from trained models, conditioned on unseen test maps, without discarding any invalid ones. (Top row, in green) OCTNet; (bottom row, in red) GAN model.
In both the GAN and CVAE models, we input binary occupancy maps as images during training. It is often not possible to generate a valid trajectory with the GAN or CVAE models used for comparison in reasonable time, as their generated trajectories overlap with occupied regions of the map. In these cases, we generate 3000 trajectories for each map using the GAN or CVAE, and select the trajectory with the minimum overlap with occupied regions, as a proportion of the entire trajectory. In comparison, OCTNet accepts roughly one out of every three sampled trajectories as valid. Trajectories generated by OCTNet and the GAN, without rejection sampling, are shown in fig. 6. We see that even without rejection sampling, trajectories generated by OCTNet follow the structure of the environment closely.
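Under the idealising assumption that successive OCTNet draws are independent, each accepted with probability $p\approx 1/3$, the number of draws $N_{\text{draws}}$ needed per valid trajectory follows a geometric distribution:

$$\Pr(N_{\text{draws}}=n)=(1-p)^{n-1}p,\qquad \mathbb{E}[N_{\text{draws}}]=\frac{1}{p}\approx 3,\qquad \Pr(N_{\text{draws}}>10)=(1-p)^{10}\approx\left(\tfrac{2}{3}\right)^{10}\approx 0.017.$$

By this rough estimate, OCTNet needs about three draws per accepted trajectory on average, whereas the baselines are given 3000 draws per map and may still not produce a fully valid trajectory.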
The performance results of our experiments are tabulated in table II. OCTNet variants outperform the other generative models compared, demonstrating the high quality of trajectories generated by OCTNet. In particular, encoding each occupancy map as a feature vector $\mathbf{s}$, which captures its similarity to all maps in the dataset, allows for flexible representations even when the number of maps in the dataset is small. The MDN used can capture the multi-modal behaviour of trajectories, whereas the off-the-shelf generative models struggle to do so.
We present a novel generative model, OCTNet, capable of producing likely motion trajectories in new environments where no motion has been observed, by generalising from past motion trajectories observed in other environments. OCTNet encodes maps as feature vectors of similarities, and embeds observed trajectories as fixed-size vectors. A neural network is used to learn conditional distributions over these vectors. Realisations of the vectors can then be sampled from the conditional distribution, and used to reconstruct a generated trajectory from the embedding. We investigate two classes of prior distributions over each element of the embedding vector, and empirically show the strong performance of OCTNet against popular generative methods. Future improvements to OCTNet include incorporating temporal changes in trajectory patterns into the framework. Though challenging, there is also a need to collect real-world datasets of occupancy maps with observed trajectories for future research.
[1] A. Rudenko, L. Palmieri, M. Herman, K. M. Kitani, D. M. Gavrila, and K. O. Arras, “Human motion trajectory prediction: A survey,” CoRR, 2019.
[2] W. Zhi, R. Senanayake, L. Ott, and F. Ramos, “Spatiotemporal learning of directional uncertainty in urban environments with kernel recurrent mixture density networks,” IEEE Robotics and Automation Letters, 2019.
[3] T. P. Kucner, M. Magnusson, E. Schaffernicht, V. H. Bennetts, and A. J. Lilienthal, “Enabling flow awareness for mobile robots in partially observable environments,” IEEE Robotics and Automation Letters, 2017.
[4] R. Senanayake and F. Ramos, “Directional grid maps: modeling multimodal angular uncertainty in dynamic environments,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2018.
[5] W. Zhi, L. Ott, and F. Ramos, “Kernel trajectory maps for multi-modal probabilistic motion prediction,” CoRR, 2019.
[6] X. Rong Li and V. P. Jilkov, “Survey of maneuvering target tracking. Part I. Dynamic models,” IEEE Transactions on Aerospace and Electronic Systems, 2003.
[7] S. Zernetsch, S. Kohnen, M. Goldhammer, K. Doll, and B. Sick, “Trajectory prediction of cyclists using a physical model and an artificial neural network,” in 2016 IEEE Intelligent Vehicles Symposium (IV), 2016.
[8] J. F. Kooij, F. Flohr, E. A. Pool, and D. M. Gavrila, “Context-based path prediction for targets with switching dynamics,” Int. J. Comput. Vision, 2019.
[9] D. Arbuckle, A. Howard, and M. Mataric, “Temporal occupancy grids: a method for classifying the spatio-temporal properties of the environment,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2002.
[10] N. C. Mitsou and C. Tzafestas, “Temporal occupancy grid for mobile robot dynamic environment mapping,” 2007.
[11] T. Kucner, J. Saarinen, M. Magnusson, and A. J. Lilienthal, “Conditional transition maps: Learning motion patterns in dynamic environments,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013.
[12] G. Tanzmeister, J. Thomas, D. Wollherr, and M. Buss, “Grid-based mapping and tracking in dynamic environments using a uniform evidential environment representation,” in IEEE International Conference on Robotics and Automation, 2014.
[13] S. T. O’Callaghan, S. P. N. Singh, A. Alempijevic, and F. T. Ramos, “Learning navigational maps by observing human motion patterns,” in IEEE International Conference on Robotics and Automation, 2011.
[14] L. McCalman, S. O’Callaghan, and F. Ramos, “Multi-modal estimation with kernel embeddings for learning motion models,” in IEEE International Conference on Robotics and Automation, 2013.
[15] S. M. Mellado, G. Cielniak, T. Krajník, and T. Duckett, “Modelling and predicting rhythmic flow patterns in dynamic environments,” in TAROS, 2018.
[16] B. Haasdonk and C. Bahlmann, “Learning with distance substitution kernels,” in DAGM-Symposium, Lecture Notes in Computer Science, 2004.
[17] E. Pekalska, P. Paclik, and R. P. W. Duin, “A generalized kernel approach to dissimilarity-based classification,” Journal of Machine Learning Research, 2002.
[18] D. P. Huttenlocher, G. A. Klanderman, and W. J. Rucklidge, “Comparing images using the Hausdorff distance,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 1993.
[19] C. M. Bishop, “Mixture density networks,” tech. rep., Aston University, 1994.
[20] A. Brando, “Mixture density networks (MDN) for distribution and uncertainty estimation,” tech. rep., 2017. Master thesis.
[21] T. Lai, W. Zhi, and F. Ramos, “Occ-traj120: Occupancy maps with associated trajectories,” CoRR, 2019.
[22] P. C. Besse, B. Guillouet, J. Loubes, and F. Royer, “Review and perspective for distance-based clustering of vehicle trajectories,” IEEE Transactions on Intelligent Transportation Systems, 2016.
[23] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, 2014.
[24] K. Sohn, H. Lee, and X. Yan, “Learning structured output representation using deep conditional generative models,” in Advances in Neural Information Processing Systems, pp. 3483–3491, 2015.
[25] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel, “Infogan: Interpretable representation learning by information maximizing generative adversarial nets,” in Advances in Neural Information Processing Systems, 2016.
[26] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, “Generative adversarial text to image synthesis,” in International Conference on International Conference on Machine Learning, 2016.