Diversity-Aware Vehicle Motion Prediction via Latent Semantic Sampling

11/28/2019, by Xin Huang et al. (MIT)

Vehicle trajectory prediction is crucial for autonomous driving and advanced driver assistance systems. While existing approaches may sample from a predicted distribution of vehicle trajectories, they lack the ability to explore it, a key ability for evaluating safety from a planning and verification perspective. In this work, we devise a novel approach for generating realistic and diverse vehicle trajectories. We extend the generative adversarial network (GAN) framework with a low-dimensional approximate semantic space, and shape that space to capture semantics such as merging and turning. We sample from this space in a way that mimics the predicted distribution, but allows us to control coverage of semantically distinct outcomes. We validate our approach on a publicly available dataset and show results that achieve state-of-the-art prediction performance, while providing improved coverage of the space of predicted trajectory semantics.


I Introduction

Vehicle trajectory prediction is crucial for autonomous driving and advanced driver assistance systems. While the existing literature focuses on improving the accuracy of prediction [13, 33, 15, 7, 1], the diversity of the predicted trajectories [21, 11] must also be explored. High accuracy implies a good approximation of the true distribution according to some performance metric, but emphasizing diversity allows prediction approaches to access low-probability but high-importance parts of the state space. Diverse trajectory sampling provides coverage of the possible actions of surrounding vehicles, facilitating safe motion planning and accurate behavior modeling of nearby vehicles in simulation. For instance, at an intersection, sampling distinct outcomes, such as left or right turns, rather than simply predicting going forward, provides benefits in verification. Different maneuvers can have radically different outcomes, and missing one of them can be catastrophic. Sampling efficiently proves difficult in such scenarios, as neither the distribution of trajectories nor the definition of semantically distinct outcomes has an analytical form. Additionally, expensive roll-outs of a future trajectory are required to define its utility, which depends on the environment of the car and on nearby agents.

Fig. 1: Top to bottom: direct vs. latent semantic sampling. In latent semantic sampling, representative samples are taken in the latent space, with weights derived from the distribution. In this way, a few samples can capture the relevant semantic aspects while ensuring consistency with the true prediction distribution.

In this paper, we propose a model that handles both accuracy and diversity by incorporating a latent semantic layer into the trajectory generation step. This layer should represent approximate high-level vehicle behaviors, matching semantic distinctions when they exist. We expect it to be effectively low-dimensional, since a driver can perform only a few distinct maneuvers at any given moment. Enumerating low-dimensional samples should therefore be feasible; however, we wish to do so without forcing the driver's behaviors into a fixed taxonomy. We illustrate this idea in Figure 1, where the goal is to produce diverse trajectory predictions and cover distinct outcomes. The top row shows traditional sampling, which fails to sample diverse behaviors efficiently. The bottom row demonstrates our latent semantic sampling technique, which captures both maneuvers in the intersection.

We do so by shaping the notion of similarity in the intermediate layer activations via metric learning [32]. We train the latent semantic layer activations to match annotations of high-level labels where these exist: the distance between two trajectories should be large if they carry different semantic labels, and small otherwise.

In addition to prediction, our model can produce behavior samples for simulation and verification. Verification of safety properties for a given driving strategy is challenging, since it requires numerous simulations using predictive models instantiated over a large sampling space of initial agent conditions, road configurations, etc. A semantically-meaningful, low-dimensional latent space provides the advantage of efficient sampling of all possible behaviors, requiring fewer simulations to find rare events that affect safety (e.g. collisions between cars).

Finally, our proposed latent state affords some interpretation of the network, which is crucial in safety-critical tasks such as autonomous driving. By tuning the high-level latent state, our samples better cover the human intuition about diverse outcomes.

Our work has three main contributions. i) We extend a generative adversarial network to produce diverse and realistic future vehicle trajectories. We process the noise samples into two independent latent vectors, utilizing loss functions to disentangle them: the high-level vector captures semantic properties of trajectories, while the low-level vector maintains spatial and social context. ii) We describe an efficient sampling method to cover the possible future actions, which is important for safe motion planning and realistic behavior modeling in simulation. iii) We validate our approach on a publicly available dataset of vehicle trajectories collected in urban driving. Quantitative results show our method outperforming state-of-the-art approaches, and qualitative examples show that it efficiently generates diversified trajectories.

The remainder of the paper is organized as follows. We introduce relevant work in Section I-A, and our problem formulation and proposed method in Section II. We demonstrate results in vehicle motion prediction in Section III, followed by a summary and a discussion of future work in Section IV.

I-A Related Work

Our work relates to several topics in probabilistic trajectory prediction. Unlike deterministic alternatives [13], it allows us to reason about the uncertainty of drivers' behaviors. Several representations underlie reasoning about trajectories: [33, 15, 16, 28, 4] predict future vehicle trajectories as Gaussian mixture models, whereas [18] utilizes a grid-based map. In our work, we focus on generating trajectory samples directly from an approximated distribution space, using a sequential network, similar to [21, 11].

For longer prediction horizons, additional context cues are needed from the driving environment. Spatial context, such as mapped lanes, not only indicates the possible options a vehicle may take (especially at intersections), but also improves prediction accuracy, as vehicles usually follow lane centers closely [7, 5]. Another important cue is social context based on nearby agents, which affords reasoning about interaction among agents [1, 11, 23, 16]. Our method takes advantage of both cues by feeding map data and nearby agent positions into our model, improving the accuracy of predictions over a few seconds.

Recently proposed generative adversarial networks (GANs) can sample trajectories by utilizing a generator of vehicle trajectories and a discriminator that distinguishes real trajectories from trajectories produced by the generator [10, 11, 23, 22]. Despite their success, efficiently producing unlikely events, such as lane changes and turns, remains a challenge. These events are important to consider, as they can pose significant risk and affect driving decisions.

Fig. 2: Architecture diagram of the prediction model. We shape the space of the intermediate vector to resemble a human's concept of distances, and then use it to modify the samples that are fed to the decoder.

Hybrid maneuver-based models [8] are effective in producing distinct vehicle behaviors. They first classify maneuvers based on vehicle trajectories, and then predict future positions conditioned on a maneuver. As such, they are restricted to cases with a well-defined set of maneuvers. Similar to [28], our method handles more general cases with undefined semantics, including multi-vehicle interactions.

Beyond prediction, recent learning models use an intermediate representation in probabilistic network models to improve sample efficiency and coverage. [28] utilizes a set of discrete latent variables to represent different driver intentions and behaviors. [27] has shown that semantics exist in the latent space of generative adversarial networks, and [6] successfully decomposes the latent factor in a GAN into structured semantic parts. Beyond GANs, [14] has learned disentangled latent representations in a variational autoencoder (VAE) framework to ground spatial relations between objects. Unlike the information-bottleneck motivation of [6], we use metric learning [32] to capture information such as maneuvers and interactions. The low dimensionality of the semantic space allows us to obtain distinct vehicle behaviors efficiently. In related work, [30] proposes to generate samples in a potential field learned by the discriminator, both to approximate the real probability distribution of the data accurately and to ensure sample diversity.

Finally, our work has applications to sampling and estimation of rare events for verification, which is its own active field; see [26, 3, 19, 20, 24] and references therein. The works closest to ours are [24, 19], which also propose sample-based estimation of probabilities. As opposed to probability estimation under standard driving, our work focuses explicitly on sampling from diverse modes of behavior.

II Model

Here, we present the problem formulation and describe the model underlying our work, including loss functions and our proposed sampling procedure.

II-A Problem Formulation

The input to the trajectory prediction problem includes a sequence of observed vehicle trajectories X = \{x_{t-T_{obs}+1}, \ldots, x_t\}, as well as the surrounding lanes, given as their centerline coordinates and denoted M. The goal is to predict a set of possible future trajectories \hat{Y} = \{\hat{y}_{t+1}, \ldots, \hat{y}_{t+T_{pred}}\}, where the acausal (ground-truth) future trajectories are denoted Y.

In the probabilistic setting, since multiple future trajectory sets are possible, the goal is to estimate the predicted probability distribution P(Y | X, M). In the absence of a closed-form expression for this distribution, many modern approaches sample from it, requiring some form of sample generation: traditional approaches such as MCMC and particle filters [17], planning-based approaches such as RRTs [2], and GANs and other probabilistic generative networks [21, 1].

II-B Model Overview

We now describe the network structure and sampling approach, as illustrated in Figure 2. The trajectory generator takes the past trajectory of the target vehicle, a map of lane centerlines, and a noise sample, and produces samples of future trajectories. The discriminator identifies whether a generated trajectory is realistic.

In addition to the generator and discriminator networks, we require a source of semantic labels for trajectories. These labels can include maneuvers such as merging, turning, or slowing down, or interaction patterns such as giving right of way or turning at a four-way-stop junction. For simplicity, these labels take boolean or unknown values, arranged into a vector s with elements s_i \in \{0, 1, \varnothing\}, where \varnothing denotes that the label is unknown or undefined. We stress that for some labels, in some instances no definite value makes sense; for example, the labels "the vehicle is next at a stop-sign intersection" and "the vehicle is waiting at a red light" cannot co-exist. This motivates a representation that avoids a single taxonomy of all road situations with definite semantic values.

II-C Trajectory Generator

The trajectory generator predicts realistic future vehicle trajectories given the past trajectories and the map information as inputs. It embeds the two inputs before sending them into a long short-term memory (LSTM) encoder that captures both the spatial and the temporal aspects of the inputs. The encoder output is combined with a noise vector drawn from a standard normal distribution and fed into a latent network that separates the information into a high-level vector and a low-level vector. The decoder, taking these two vectors, produces the trajectory samples.

II-C1 Trajectory Network

A series of fully connected layers embeds the spatial coordinates into a trajectory embedding vector [1].

II-C2 Map Network

In order to simplify the task of learning to interact with the map, we use the following representation for the lanes. First, we find the nearest point to the vehicle on each lane at prediction time. Second, we traverse each lane starting at its nearest point to generate an arclength-parameterized curve, and compute polynomial coefficients up to second order. Third, we form monomials from the target vehicle's velocity and one and two sampling time steps, i.e., powers of the distance the vehicle would travel along the lane. Last, we feed the products of the lane coefficients and these monomials to the encoder and discriminator, allowing them to learn lane-following behavior.
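As an illustration, the following minimal NumPy sketch computes such lane features; the exact number of time steps, polynomial handling, and feature products in our implementation may differ, and all names here are our own.

import numpy as np

def lane_features(lane_xy, vehicle_xy, v, dts=(0.1, 0.2)):
    # Sketch of the lane representation above; assumes the lane extends
    # past the vehicle so that at least three centerline points remain.
    # 1) nearest centerline point to the vehicle at prediction time
    i0 = int(np.argmin(np.linalg.norm(lane_xy - vehicle_xy, axis=1)))
    seg = lane_xy[i0:]
    # 2) arclength-parameterized curve, fit with polynomials up to second order
    s = np.concatenate([[0.0], np.cumsum(np.linalg.norm(np.diff(seg, axis=0), axis=1))])
    cx = np.polyfit(s, seg[:, 0], deg=2)
    cy = np.polyfit(s, seg[:, 1], deg=2)
    # 3) monomials of the distance traveled at velocity v over the sample steps
    mono = np.array([(v * dt) ** k for dt in dts for k in (1, 2)])
    # 4) products of lane coefficients and monomials, fed to encoder/discriminator
    return np.outer(np.concatenate([cx, cy]), mono).ravel()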

II-C3 Encoder

A series of LSTM units processes the spatial and map embedding vectors from time step t - T_{obs} + 1 to t. The output is a hidden vector that stores the relevant information up to the current time step.

II-C4 Latent Network

A series of fully connected layers takes the encoder's hidden vector and a noise sample from a standard normal distribution. The outputs are two activation vectors: a high-level vector h that represents information such as maneuvers, and a low-level vector l that represents information such as vehicle dynamics. To sample efficiently from h at test time, its dimensionality is designed to be much smaller than that of l. We train the two vectors to be uncorrelated, and train h to match semantic labels in terms of distances between samples. This representation disentangles semantic concepts from low-level trajectory information, in a fashion resembling information bottlenecks [6], but driven by human notions of semantic similarity as learned from the labels.

II-C5 RNN-based decoder

A series of LSTM units takes h, l, and a map embedding vector, and outputs a sequence of future vehicle positions.
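The following PyTorch sketch shows one way the generator components above could be wired together. It is a simplified, non-autoregressive rendering under our own assumptions (2-D positions, a 6-dimensional lane feature vector, and a repeated decoder input over the horizon); the layer dimensions follow Section III-A.

import torch
import torch.nn as nn

class TrajectoryGenerator(nn.Module):
    def __init__(self, traj_dim=32, map_dim=32, hidden_dim=64,
                 noise_dim=10, h_dim=3, l_dim=71):
        super().__init__()
        self.traj_embed = nn.Sequential(nn.Linear(2, 32), nn.ReLU(),
                                        nn.Linear(32, traj_dim))    # II-C1
        self.map_embed = nn.Sequential(nn.Linear(6, 64), nn.ReLU(),
                                       nn.Linear(64, map_dim))      # II-C2
        self.encoder = nn.LSTM(traj_dim + map_dim, hidden_dim,
                               batch_first=True)                    # II-C3
        self.to_high = nn.Linear(hidden_dim + noise_dim, h_dim)     # II-C4: h
        self.to_low = nn.Linear(hidden_dim + noise_dim, l_dim)      # II-C4: l
        self.decoder = nn.LSTM(h_dim + l_dim + map_dim, hidden_dim,
                               batch_first=True)                    # II-C5
        self.to_xy = nn.Linear(hidden_dim, 2)
        self.noise_dim = noise_dim

    def forward(self, past_xy, map_feats, horizon):
        # past_xy: (B, T_obs, 2); map_feats: (B, T_obs, 6)
        emb = torch.cat([self.traj_embed(past_xy),
                         self.map_embed(map_feats)], dim=-1)
        _, (hid, _) = self.encoder(emb)
        noise = torch.randn(past_xy.size(0), self.noise_dim,
                            device=past_xy.device)
        ctx = torch.cat([hid[-1], noise], dim=-1)
        h, l = self.to_high(ctx), self.to_low(ctx)  # disentangled latents
        step = torch.cat([h, l, self.map_embed(map_feats[:, -1])], dim=-1)
        out, _ = self.decoder(step.unsqueeze(1).repeat(1, horizon, 1))
        return self.to_xy(out), h, l                # future positions, h, l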

II-D Trajectory Discriminator

An LSTM-based encoder converts the past trajectory and future predictions into a label in {fake, real}, where fake means the trajectory was generated by our predictor and real means the trajectory comes from data. The structure of the discriminator mirrors that of the trajectory encoder, except in its output dimensionality.
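A minimal sketch of such a discriminator, assuming 2-D trajectory inputs and the layer dimensions reported in Section III-A:

import torch.nn as nn

class TrajectoryDiscriminator(nn.Module):
    def __init__(self, hidden_dim=64):
        super().__init__()
        # mirrors the generator's encoder structure
        self.encoder = nn.LSTM(2, hidden_dim, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden_dim, 64), nn.LeakyReLU(),
                                  nn.Linear(64, 16), nn.LeakyReLU(),
                                  nn.Linear(16, 1), nn.Sigmoid())

    def forward(self, traj):
        # traj: (B, T_obs + T_pred, 2), past concatenated with (predicted) future
        _, (hid, _) = self.encoder(traj)
        return self.head(hid[-1])  # probability that the trajectory is real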

II-E Losses

Similar to [11], we measure the performance of our model using the average displacement error (ADE) of Equation 1 and the final displacement error (FDE) of Equation 2:

\mathrm{ADE}(\hat{Y}, Y) = \frac{1}{T_{pred}} \sum_{\tau=t+1}^{t+T_{pred}} \left\| \hat{y}_\tau - y_\tau \right\|_2 \quad (1)

\mathrm{FDE}(\hat{Y}, Y) = \left\| \hat{y}_{t+T_{pred}} - y_{t+T_{pred}} \right\|_2 \quad (2)
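In code, these metrics can be computed as follows (a sketch assuming batched tensors of 2-D positions):

import torch

def ade(pred, gt):
    # pred, gt: (B, T_pred, 2); mean L2 displacement over the horizon, per example
    return torch.norm(pred - gt, dim=-1).mean(dim=-1)

def fde(pred, gt):
    # L2 displacement at the final predicted time step, per example
    return torch.norm(pred[:, -1] - gt[:, -1], dim=-1)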

II-E1 Best prediction displacement loss

Also as in [11], we compute the Minimum-over-N (MoN) loss to encourage the model to cover the ground-truth options while maintaining diversity in its predictions:

\mathcal{L}_{MoN} = \frac{1}{B} \sum_{b=1}^{B} \min_{n=1,\ldots,N} \mathrm{ADE}\left(\hat{Y}_b^{(n)}, Y_b\right) \quad (3)

where \hat{Y}^{(1)}, \ldots, \hat{Y}^{(N)} are samples generated by our model. The loss, over N samples from the generator, is computed as the average distance between the best predicted trajectories and the acausal trajectories. Although minimizing the MoN loss leads to a diluted probability density function compared to the ground truth [29], we use it to show that our method can estimate an approximate distribution efficiently. We defer a different, more accurate supervisory cue to future work.
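A sketch of the MoN loss over a batch, using the ADE of Equation 1 as the per-sample distance:

import torch

def mon_loss(preds, gt):
    # preds: (N, B, T_pred, 2), N generator samples; gt: (B, T_pred, 2)
    errs = torch.norm(preds - gt.unsqueeze(0), dim=-1).mean(dim=-1)  # (N, B) ADEs
    return errs.min(dim=0).values.mean()  # best-of-N per example, batch average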

II-E2 Adversarial loss

We use a standard binary cross-entropy loss, \mathcal{L}_{adv}, computed between the discriminator outputs and the real/fake labels. This loss is used to encourage diversity in predictions and is assigned a higher weight once the best prediction displacement loss has been reduced to a reasonable scale.

II-E3 Independence loss

The independence loss enforces that the cross-covariance between the two latent vectors h and l remains small, encouraging l to hold only low-level information. While this does not guarantee independence of the two, we found it to suffice as regularization:

\mathcal{L}_{ind} = \left\| \widehat{\mathrm{Cov}}(h, l) \right\|_F^2 \quad (4)

where \widehat{\mathrm{Cov}}(h, l) is the empirical cross-covariance over a training batch.
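A sketch of this regularizer as the squared Frobenius norm of the empirical batch cross-covariance (the normalization is our assumption):

import torch

def independence_loss(h, l):
    # h: (B, d_h), l: (B, d_l)
    hc = h - h.mean(dim=0, keepdim=True)
    lc = l - l.mean(dim=0, keepdim=True)
    cov = hc.t() @ lc / (h.size(0) - 1)  # empirical cross-covariance
    return (cov ** 2).sum()              # squared Frobenius norm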

II-E4 Latent space regularization loss

The latent loss regularizes h and l in terms of their means and variances and helps to avoid degenerate solutions; a standard form penalizes deviations from zero mean and identity covariance:

\mathcal{L}_{lat} = \sum_{z \in \{h, l\}} \left( \left\| \hat{\mu}_z \right\|_2^2 + \left\| \widehat{\mathrm{Cov}}(z, z) - I \right\|_F^2 \right) \quad (5)

where \| \cdot \|_F denotes the Frobenius norm.
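A sketch under this interpretation, pulling each latent toward zero mean and identity covariance; the exact form used in training may differ:

import torch

def latent_reg_loss(h, l):
    def reg(z):
        zc = z - z.mean(dim=0, keepdim=True)
        cov = zc.t() @ zc / (z.size(0) - 1)
        eye = torch.eye(z.size(1), device=z.device)
        # penalize nonzero mean and deviation of covariance from identity
        return z.mean(dim=0).pow(2).sum() + ((cov - eye) ** 2).sum()
    return reg(h) + reg(l)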

II-E5 Embedding loss

Having enforced that h and l are independent vectors, we introduce an embedding loss to enforce the correlation between the high-level latent vector h and the prediction coding s. Similar to [25], if two data samples have the same answer for a label i, we expect the difference in their high-level latent vectors to be small; if two predictions have different codings, we encourage the difference to be large. This can be written as a contrastive loss of the form

\mathcal{L}_{emb} = \frac{1}{B^2} \sum_{a,b=1}^{B} \sum_{i} \left[ \mathbb{1}\left(s_i^a = s_i^b\right) \left\| h^a - h^b \right\|_2^2 + \mathbb{1}\left(s_i^a \neq s_i^b\right) \max\left(0, m - \left\| h^a - h^b \right\|_2\right)^2 \right] \quad (6)

where B is the batch size, s_i^a and s_i^b denote the answers for label i on examples a and b respectively, m is a margin, and both indicators are taken to be 0 if either argument is \varnothing.
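A contrastive-style sketch of Equation 6; the margin value and the encoding of unknown answers as -1 are our own conventions:

import torch

def embedding_loss(h, labels, margin=1.0):
    # h: (B, d_h); labels: (B, K) with entries in {0, 1} or -1 for unknown
    B = h.size(0)
    dist = torch.cdist(h, h)  # (B, B) pairwise latent distances
    loss = h.new_zeros(())
    for i in range(labels.size(1)):
        s = labels[:, i]
        known = (s.unsqueeze(0) >= 0) & (s.unsqueeze(1) >= 0)  # drop unknown pairs
        same = (s.unsqueeze(0) == s.unsqueeze(1)) & known
        diff = (s.unsqueeze(0) != s.unsqueeze(1)) & known
        loss = loss + (dist[same] ** 2).sum() \
                    + (torch.clamp(margin - dist[diff], min=0) ** 2).sum()
    return loss / (B * B)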

II-E6 Total loss

In total, we combine the losses listed above with appropriate coefficients that are adjusted dynamically during training: the generator minimizes

\mathcal{L}_G = \lambda_{MoN} \mathcal{L}_{MoN} + \lambda_{adv} \mathcal{L}_{adv} + \lambda_{ind} \mathcal{L}_{ind} + \lambda_{lat} \mathcal{L}_{lat} + \lambda_{emb} \mathcal{L}_{emb} \quad (7)

while the discriminator minimizes the standard binary cross-entropy over real and generated trajectories,

\mathcal{L}_D = \mathcal{L}_{BCE}\left(D(X, Y), \text{real}\right) + \mathcal{L}_{BCE}\left(D(X, \hat{Y}), \text{fake}\right) \quad (8)

II-F Sampling Approach

We now describe how we sample from the space of h in Algorithm 1. We generate a set of N latent samples and select from them a subset of K representatives using the Farthest Point Sampling (FPS) algorithm [9, 12]. We store the nearest-representative identity as we compute the distances, augmenting each FPS representative with a weight proportional to its Voronoi cell. This gives us a weighted set of samples that converges to the original distribution but favors samples from distinct regions of the space. FPS allows us to emphasize samples that represent distinct high-level maneuvers encoded in h.

1: for all i = 1, \ldots, N do
2:     Sample z_i from \mathcal{N}(0, I).
3:     Generate latent sample (h_i, l_i).
4: end for
5: Perform Farthest Point Sampling on \{h_i\} to obtain K representative samples h_{i_1}, \ldots, h_{i_K}, where i_k denotes the index of a selected sample.
6: Compute Voronoi weights w_k for each representative based on the N samples.
7: Decode from each (h_{i_k}, l_{i_k}) a full prediction \hat{Y}_k, stored along with weight w_k.
8: Return \{(\hat{Y}_k, w_k)\}_{k=1}^{K}.
Algorithm 1 Semantic Sampling
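A NumPy sketch of the selection step in Algorithm 1, combining farthest point sampling with Voronoi weights (function and variable names are ours):

import numpy as np

def semantic_sampling(H, K):
    # H: (N, d_h) high-level latent samples; returns K representative
    # indices and weights proportional to their Voronoi cells
    N = H.shape[0]
    reps = [0]                                     # seed FPS arbitrarily
    d = np.linalg.norm(H - H[0], axis=1)           # distance to nearest rep
    owner = np.zeros(N, dtype=int)                 # index of nearest rep
    for _ in range(K - 1):
        nxt = int(np.argmax(d))                    # farthest point so far
        reps.append(nxt)
        d_new = np.linalg.norm(H - H[nxt], axis=1)
        owner[d_new < d] = len(reps) - 1
        d = np.minimum(d, d_new)
    weights = np.bincount(owner, minlength=K) / N  # Voronoi cell mass
    return np.array(reps), weights

Each selected index i_k is then decoded, together with its low-level vector l_{i_k}, into a full trajectory carrying weight w_k.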

The samples cover (in the sense of an ε-covering) the space of possible high-level choices. The high-level latent space is shaped according to human labels of similarity. With this similarity-metric shaping, FPS can leverage its 2-optimal distance-coverage property to capture the majority of semantically different roll-outs in just a few samples.¹

¹We note that a modified FPS [31] can trade off mode-seeking with coverage-seeking when generating samples.

III Results

In this section, we describe the details of our model and dataset, followed by a set of quantitative results against state-of-the-art baselines and qualitative results on diverse prediction.

III-A Model Details

The Trajectory Network utilizes two stacked linear layers with dimensions of (32, 32). The Map Network uses four stacked linear layers with dimensions of (64, 32, 16, 32). An LSTM with one layer and a hidden dimension of 64 forms both the Encoder and the Decoder in the Trajectory Generator. The Latent Network takes inputs from the Encoder and a noise vector with a dimension of 10; it is composed of two individual linear layers with output dimensions of 3 and 71 for the high-level and low-level layers, respectively. The Discriminator is an LSTM with the same structure as the Generator's Encoder, followed by a series of stacked linear layers with dimensions of (64, 16, 1) and a sigmoid activation at the end. All linear layers in the Generator are followed by batch norm, ReLU, and dropout layers; the linear layers in the Discriminator utilize a leaky ReLU activation instead. The number of samples N we use for the MoN loss is 5.

The model is implemented in PyTorch and trained on a single NVIDIA Tesla V100 GPU. We use the Argoverse forecasting dataset [5] for training and validation, and select the trained model with the smallest MoN ADE loss on the validation set.

III-B Semantic Annotations

In order to test our embedding over a large-scale dataset, we devised a set of classifiers for the data as surrogates for human annotations. Each checks for a specific high-level trajectory feature and outputs a ternary bit representing whether the feature exists, does not exist, or is unknown; together these form an 8-dimensional vector that includes the outputs from all filters. The list of feature filters used in this paper includes: accelerate, decelerate, turn left, turn right, lane follow, lane change, move to left latitudinally, and move to right latitudinally.
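For illustration, a sketch of one such surrogate filter; the thresholds and the encoding of the unknown value as -1 are our own assumptions, not the settings used to produce the results below.

import numpy as np

UNKNOWN = -1  # our encoding for the "unknown" ternary value

def turn_left_filter(traj, heading_thresh=np.pi / 6):
    # traj: (T, 2) positions; a heuristic surrogate for a "turn left" annotation
    v = np.diff(traj, axis=0)
    if np.linalg.norm(v, axis=1).min() < 1e-3:
        return UNKNOWN                  # nearly stationary: heading ill-defined
    yaw = np.unwrap(np.arctan2(v[:, 1], v[:, 0]))
    return int(yaw[-1] - yaw[0] > heading_thresh)  # 1 = turning left, 0 = not

def annotate(traj, filters):
    # apply all feature filters to form the 8-dimensional ternary label vector
    return np.array([f(traj) for f in filters])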

III-C Quantitative Results

III-C1 Prediction

Over 1 and 3 second prediction horizons, with N = 5 samples, we compute the MoN ADE (1) and FDE (2) losses, respectively. In addition to our method, we introduce a few baseline models to demonstrate the prediction accuracy of our method. The first two baselines are a linear Kalman filter with a constant velocity (CV) model and one with a constant acceleration (CA) model; we sample multiple trajectories given the smoothing uncertainties. The third baseline is an LSTM-based encoder-decoder model [5], which produces deterministic predictions. In addition, we introduce a few variants of a vanilla GAN-based model taking different input features, where social contains the positions of nearby agents and map contains the nearby lane information as described in Section II-C2. The results are summarized in Table I. The first two rows indicate that physics-based models can produce predictions with reasonable accuracy; using only five samples, the CV Kalman filter outperforms a deterministic deep model (third row). The rest of the table shows that a generative adversarial network improves upon the accuracy of the physics-based models by a large margin using five samples. We observe that the map features contribute more to long-horizon predictions. Additionally, after regularizing the latent space, our method remains competitive with the standard models while adding sample diversification.

Model Name | 1s ADE | 1s FDE | 3s ADE | 3s FDE
Kalman Filter (CV) | 0.51 | 0.79 | 1.63 | 3.62
Kalman Filter (CA) | 0.69 | 1.22 | 2.87 | 7.08
LSTM Encoder Decoder | 0.57 | 0.94 | 1.81 | 4.13
GAN | 0.42 | 0.62 | 1.55 | 3.09
GAN+social | 0.44 | 0.66 | 1.68 | 3.04
GAN+social+map | 0.44 | 0.63 | 1.34 | 2.75
DiversityGAN+social+map | 0.41 | 0.65 | 1.35 | 2.74
DiversityGAN(FPS)+social+map | 0.44 | 0.62 | 1.33 | 2.72
TABLE I: MoN average displacement errors (ADE) and final displacement errors (FDE) of our method and baseline models with N = 5 samples.

To show the effectiveness of our latent sampling approach, we measure the MoN loss with and without the FPS method. We test on a challenging subset of the validation dataset that filters out straight, constant-velocity driving scenarios, resulting in a trajectory distribution that emphasizes rare events in the data. As indicated in Figure 3, as the number of samples increases, the prediction loss using FPS drops faster than with direct sampling. We note the improvement is larger in the regime of 2 to 6 samples, where reasoning about full roll-outs of multiple hypotheses is still practical in real-time systems. Beyond the gain in average accuracy, however, the importance of the method is that it obtains samples from the additional modes of the distribution of trajectories. We demonstrate the advantage of our method with a small number of samples in Section III-D.

Fig. 3: MoN ADE loss of FPS sampling (blue) and direct sampling (orange) over 3 seconds, with N from 1 to 8. The gap between the two curves indicates the improvement from using FPS, especially when N is between 2 and 6. Error bars represent one standard deviation over five runs with different random seeds.

III-D Qualitative Results

Fig. 4: Illustration of how our approach captures rare events by selecting samples that are farther away. The left column highlights the samples selected by FPS and their associated predictions. The right column highlights the samples selected by direct sampling and their associated predictions, which cover only highly likely events. Blue: observed and acausal trajectories. Red: predicted trajectory samples. Black: lane centers.

We first show how FPS can be used to improve both prediction accuracy and diversity coverage by illustrating two examples in Figure 4.

Fig. 5: Predictions of rare events in complicated driving scenarios help improve both accuracy (a) and diversity (b). Top to bottom: FPS and direct sampling with trajectory samples. (a) Predicting diversified events helps reduce prediction error in challenging scenarios. (b) Predicting merging and turning events enables robust and safe decision making for the ego car.

In the first example, illustrated in Figure 4(a), our method, as described in Algorithm 1, first generates samples (grey), then selects samples using FPS (highlighted in the left column) or direct sampling (highlighted in the right column) to produce predictions. By selecting samples that are farther apart, FPS is able to produce rare events, such as the right turn labelled 2, that match the acausal trajectory and thus improve prediction accuracy. Direct sampling, on the other hand, tends to draw points from denser regions, which leads to high-likelihood events. We show two additional challenging examples in Figure 5(a), where FPS reduces the prediction error by covering turning events as the vehicle approaches an off-ramp and a full intersection, respectively.

In the second example, illustrated in Figure 4(b), our method predicts rare events that do not improve the displacement losses compared to direct sampling, yet remain important for decision making and risk estimation. Although the target vehicle is most likely to go forward, it is useful for our predictor to cover the lane change behavior labelled 1, even with a low likelihood, since such a prediction could help avoid a possible collision if our ego car is driving in the right lane. Similarly, in the other two examples shown in Figure 5(b), our method produces events such as merging and turning that are unlikely to happen but are important to consider for robust and safe decision making for the ego car.

IV Conclusion

We propose a vehicle motion prediction method that caters to both prediction accuracy and diversity. We achieve this by dividing a latent variable into a learned semantic-level part encoding discrete options that the target vehicle can possibly take, and a low-level part encoding other information. The method is demonstrated to achieve state-of-the-art prediction accuracy, while efficiently obtaining trajectory coverage by near-optimal sampling of the high-level latent vector. Future work includes adding more complicated semantic labels such as vehicle interactions, and exploring other sampling methods beyond FPS.

References

  • [1] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and S. Savarese (2016) Social LSTM: human trajectory prediction in crowded spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 961–971.
  • [2] G. Aoude, J. Joseph, N. Roy, and J. How (2011) Mobile agent trajectory prediction using Bayesian nonparametric reachability trees. In Infotech@Aerospace 2011, pp. 1512.
  • [3] J. Bucklew (2013) Introduction to Rare Event Simulation. Springer Science & Business Media.
  • [4] Y. Chai, B. Sapp, M. Bansal, and D. Anguelov (2019) MultiPath: multiple probabilistic anchor trajectory hypotheses for behavior prediction. arXiv preprint arXiv:1910.05449.
  • [5] M. Chang, J. Lambert, P. Sangkloy, J. Singh, S. Bak, A. Hartnett, D. Wang, P. Carr, S. Lucey, D. Ramanan, et al. (2019) Argoverse: 3D tracking and forecasting with rich maps. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8748–8757.
  • [6] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel (2016) InfoGAN: interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2172–2180.
  • [7] H. Cui, V. Radosavljevic, F. Chou, T. Lin, T. Nguyen, T. Huang, J. Schneider, and N. Djuric (2019) Multimodal trajectory predictions for autonomous driving using deep convolutional networks. In 2019 IEEE International Conference on Robotics and Automation (ICRA), pp. 2090–2096.
  • [8] N. Deo and M. M. Trivedi (2018) Multi-modal trajectory prediction of surrounding vehicles with maneuver based LSTMs. In 2018 IEEE Intelligent Vehicles Symposium (IV), pp. 1179–1184.
  • [9] T. F. Gonzalez (1985) Clustering to minimize the maximum intercluster distance. Theoretical Computer Science 38, pp. 293–306.
  • [10] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680.
  • [11] A. Gupta, J. Johnson, L. Fei-Fei, S. Savarese, and A. Alahi (2018) Social GAN: socially acceptable trajectories with generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2255–2264.
  • [12] D. S. Hochbaum and D. B. Shmoys (1985) A best possible heuristic for the k-center problem. Mathematics of Operations Research 10 (2), pp. 180–184.
  • [13] A. Houenou, P. Bonnifait, V. Cherfaoui, and W. Yao (2013) Vehicle trajectory prediction based on motion model and maneuver recognition. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4363–4369.
  • [14] Y. Hristov, D. Angelov, M. Burke, A. Lascarides, and S. Ramamoorthy (2019) Disentangled relational representations for explaining and learning from demonstration. arXiv preprint arXiv:1907.13627.
  • [15] X. Huang, S. McGill, B. C. Williams, L. Fletcher, and G. Rosman (2019) Uncertainty-aware driver trajectory prediction at urban intersections. In 2019 IEEE International Conference on Robotics and Automation (ICRA), pp. 9718–9724.
  • [16] B. Ivanovic and M. Pavone (2019) The Trajectron: probabilistic multi-agent trajectory modeling with dynamic spatiotemporal graphs. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
  • [17] V. Karasev, A. Ayvaci, B. Heisele, and S. Soatto (2016) Intent-aware long-term prediction of pedestrian motion. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 2543–2549.
  • [18] B. Kim, C. M. Kang, J. Kim, S. H. Lee, C. C. Chung, and J. W. Choi (2017) Probabilistic vehicle trajectory prediction over occupancy grid map via recurrent neural network. In 2017 IEEE International Conference on Intelligent Transportation Systems (ITSC), pp. 399–404.
  • [19] M. Koren and M. Kochenderfer (2019) Efficient autonomy validation in simulation with adaptive stress testing. arXiv preprint arXiv:1907.06795.
  • [20] M. Koschi, C. Pek, S. Maierhofer, and M. Althoff (2019) Computationally efficient safety falsification of adaptive cruise control systems. In 2019 IEEE International Conference on Intelligent Transportation Systems (ITSC).
  • [21] N. Lee, W. Choi, P. Vernaza, C. B. Choy, P. H. Torr, and M. Chandraker (2017) DESIRE: distant future prediction in dynamic scenes with interacting agents. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 336–345.
  • [22] J. Li, H. Ma, and M. Tomizuka (2019) Conditional generative neural system for probabilistic trajectory prediction. arXiv preprint arXiv:1905.01631.
  • [23] J. Li, H. Ma, and M. Tomizuka (2019) Interaction-aware multi-agent tracking and probabilistic behavior prediction via adversarial learning. In 2019 IEEE International Conference on Robotics and Automation (ICRA), pp. 6658–6664.
  • [24] M. O'Kelly, A. Sinha, H. Namkoong, J. Duchi, and R. Tedrake (2019) A scalable risk-based framework for rigorous autonomous vehicle evaluation.
  • [25] G. Rosman, L. Paull, and D. Rus (2017) Hybrid control and learning with coresets for autonomous vehicles. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6894–6901.
  • [26] R. Y. Rubinstein (2001) Combinatorial optimization, cross-entropy, ants and rare events. In Stochastic Optimization: Algorithms and Applications, S. Uryasev and P. M. Pardalos (Eds.), pp. 303–363.
  • [27] Y. Shen, J. Gu, X. Tang, and B. Zhou (2019) Interpreting the latent space of GANs for semantic face editing. arXiv preprint arXiv:1907.10786.
  • [28] Y. C. Tang and R. Salakhutdinov (2019) Multiple futures prediction. In Advances in Neural Information Processing Systems (NeurIPS).
  • [29] L. A. Thiede and P. P. Brahma (2019) Analyzing the variety loss in the context of probabilistic trajectory prediction. arXiv preprint arXiv:1907.10178.
  • [30] T. Unterthiner, B. Nessler, C. Seward, G. Klambauer, M. Heusel, H. Ramsauer, and S. Hochreiter (2018) Coulomb GANs: provably optimal Nash equilibria via potential fields. In 2018 International Conference on Learning Representations (ICLR).
  • [31] M. Volkov, G. Rosman, D. Feldman, J. W. Fisher, and D. Rus (2015) Coresets for visual summarization with applications to loop closure. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 3638–3645.
  • [32] K. Q. Weinberger and L. K. Saul (2009) Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research 10 (Feb), pp. 207–244.
  • [33] J. Wiest, M. Höffken, U. Kreßel, and K. Dietmayer (2012) Probabilistic trajectory prediction with Gaussian mixture models. In 2012 IEEE Intelligent Vehicles Symposium (IV), pp. 141–146.