1 Introduction
Understanding and predicting pedestrian movement behaviors is crucial for autonomous systems to safely navigate interactive environments. By correctly forecasting pedestrian trajectories, a robot can plan safe and socially-aware paths in traffic [1, 22, 32, 21] and produce alarms about anomalous motions (e.g., crashes or near collisions) [24, 40, 38, 36, 37]. Early work in pedestrian trajectory prediction often assumed a deterministic future, where only one trajectory is predicted for each person given past observations [16, 12, 35]. However, pedestrians move with a high degree of stochasticity, so multiple plausible and distinct future behaviors can exist [11, 10]. Recent studies [15, 20, 2, 13, 31] have shown that predicting a distribution of multiple potential future trajectories (i.e., multimodal prediction), rather than a single best trajectory, can more accurately model the future motions of pedestrians.
Recurrent neural networks (RNNs), notably long short-term memory networks (LSTMs) and gated recurrent units (GRUs), have demonstrated success in trajectory prediction [22, 9, 39, 26]. However, existing models recurrently predict future trajectories based on previous outputs, so their performance tends to deteriorate rapidly over time (≥560 ms) [10, 5]. We propose to address this problem with a novel goal-conditioned bidirectional trajectory predictor, named BiTraP. BiTraP first estimates the future goals (endpoints of the future trajectories) of pedestrians and then predicts trajectories by combining a forward pass from the current position and a backward pass from the estimated goals. We believe that predicting goals can improve long-term trajectory prediction, as pedestrians in the real world often have desired goals and plan paths to reach them [23]. Compared to existing goal-conditioned methods [23, 27, 29], where goals were used as an input to a forward decoder, BiTraP takes goals as the starting position of a backward decoder and predicts future trajectories from two directions, thus mitigating the accumulated error over longer prediction horizons.
Recently, generative models such as the generative adversarial network (GAN)
[11] and the conditional variational autoencoder (CVAE) [33, 20], were developed to predict multimodal distributions of future trajectories. Our BiTraP model predicts multimodal trajectories based on a CVAE, which learns the target future trajectory distribution conditioned on the observed past trajectories through a stochastic latent variable. The two most common forms of the latent variable follow either a Gaussian distribution or a categorical distribution, resulting in either a nonparametric target distribution [20, 23] or a parametric target distribution model such as a Gaussian Mixture Model (GMM) [13, 31]. There has been limited research on how latent variable distributions impact predicted multimodal trajectories. To fill this gap, we conducted extensive comparison studies using two variations of our BiTraP method: a nonparametric model using Gaussian latent variables (BiTraP-NP) and a GMM model using categorical latent variables (BiTraP-GMM). We implemented two types of loss functions, best-of-many (BoM) L2 loss [4] and negative log-likelihood (NLL) loss [31], to evaluate different predicted trajectory behaviors (e.g., spread and diversity). We show that latent variable distribution choices are closely related to the diversity of predicted distributions, which provides guidance for selecting trajectory predictors for robot navigation and collision avoidance systems.
The contributions of this work are summarized as follows. First, we developed a novel bidirectional trajectory predictor (BiTraP) based on multimodal goal estimation and show that it offers significant improvements in trajectory prediction performance, especially for longer prediction horizons. Second, we studied parametric versus nonparametric target modeling methods by presenting two variations of our model, BiTraP-NP and BiTraP-GMM, and compared their influence on the diversity of the predicted distribution. Extensive experiments with both first-person and bird's-eye view datasets show the effectiveness of BiTraP models in different domains.
2 Related Work
Our BiTraP model consists of two parts: a multimodal goal estimator and a goal-conditioned bidirectional trajectory predictor. This section describes related work in multimodal trajectory prediction and goal-conditioned prediction.
CVAE-based Approaches for Multimodal Trajectory Prediction. Probabilistic approaches, particularly conditional variational autoencoder (CVAE) based models, have been developed for multimodal trajectory prediction. Different from GANs [11, 17], CVAEs can explicitly learn the form of a target distribution conditioned on past observations by learning the latent distribution from which it samples. Some CVAE methods assume the target trajectory follows a nonparametric (NP) distribution and produce multimodal predictions by sampling from a Gaussian latent space. Lee et al. [20] first used a CVAE for multimodal trajectory prediction by incorporating Gaussian latent space sampling into a long short-term memory encoder-decoder (LSTM-ED) model. The CVAE with LSTM components has since been used in many applications [7, 14, 6]. Other CVAE-based methods assume parametric trajectory distributions. Ivanovic et al. [13] assumed the target trajectory follows a Gaussian Mixture Model (GMM) and designed the Trajectron network to predict GMM parameters using a spatiotemporal graph. Trajectron++ [31] extended the Trajectron to account for dynamics and heterogeneous input data. Our work extends existing CVAE models to include goal estimation and shows improved multimodal prediction results. Our work also provides novel insights through comparisons between CVAE target distributions (NP and GMM).
Trajectory Conditioned on Goals. Incorporating goals has been shown to improve trajectory prediction. Rehder et al. [27] proposed a particle-filter based method to estimate the goal distribution as a prior for trajectory prediction. We drew inspiration from [28], which computed forward and backward rewards based on the current position and the goal, with the path planned using Inverse Reinforcement Learning (IRL). Our work is distinct due to its bidirectional temporal propagation and integration, combined with a CVAE to achieve multimodal prediction. Rhinehart et al. [29] estimated multimodal semantic actions as goals and planned conditioned trajectories using imitative models. Deo et al. [8] used IRL to estimate goal states and fused the results with past trajectory encodings to generate predictions. Most recently, Mangalam et al. [23] designed PECNet, which showed state-of-the-art results on BEV trajectory prediction datasets. However, PECNet only concatenated past trajectory encodings and endpoint encodings, which we believe did not fully take advantage of the goal information. We have designed a bidirectional trajectory decoder in which current trajectory information is passed forward to the endpoints (goals) and goals are recurrently propagated back to the current position. Experimental results show that our goal estimation can help generate more accurate trajectories.
3 BiTraP: Bidirectional Trajectory Prediction with Goal Estimation
Our BiTraP model performs goal-conditioned multimodal bidirectional trajectory prediction in either first-person view (FPV) or bird's eye view (BEV). Let X = {x_{t−τ+1}, ..., x_t} denote the observed past trajectory at time t, where x_t is the bounding box location and size in pixels for FPV [39, 26] and the position in meters for BEV [31]. Given X, we first estimate the goal G of the person and then predict the future trajectory Y = {y_{t+1}, ..., y_{t+δ}}, where τ and δ are the observation and prediction horizons, respectively. We define the goal G = y_{t+δ} as the future trajectory endpoint, which is given in training and unknown in testing. We adopt a CVAE model to realize multimodal goal and trajectory prediction. BiTraP contains four sub-modules: a conditional prior network p(Z|X) to model the latent variable Z from observations, a recognition network q(Z|X, Y) to capture dependencies between Z and Y, a goal generation network p(G|X, Z), and a trajectory generation network p(Y|X, G, Z), each with its own network parameters. Either parametric or nonparametric models can be used to design the generation networks of the CVAE. Nonparametric models do not assume a distribution format for the target Y but learn it implicitly by learning the distribution of Z. Parametric models assume a known distribution format for Y and predict the distribution parameters. We design nonparametric and parametric models in Sections 3.1 and 3.2, respectively, and explain the different loss functions used to train these models in Sections 3.3 and 3.4.
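The split between the recognition network (used in training) and the prior network (used at test time) is the core of the CVAE sampling logic. A minimal numpy sketch of that logic follows; the single-layer "MLPs", all dimensions, and variable names are illustrative placeholders, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(in_dim, out_dim):
    # Stand-in for a learned 3-layer MLP: a single random linear map with tanh.
    W = rng.normal(scale=0.1, size=(out_dim, in_dim))
    return lambda x: np.tanh(W @ x)

enc_dim, lat_dim, goal_dim = 8, 4, 2         # illustrative sizes
prior_net = mlp(enc_dim, 2 * lat_dim)        # p(Z|X): outputs mu_p and log sigma_p
recog_net = mlp(2 * enc_dim, 2 * lat_dim)    # q(Z|X,Y): outputs mu_q and log sigma_q
goal_net = mlp(enc_dim + lat_dim, goal_dim)  # p(G|X,Z)

h_x = rng.normal(size=enc_dim)  # encoded past trajectory X
h_y = rng.normal(size=enc_dim)  # encoded ground-truth future Y (training only)

# Training: Z is sampled from the recognition (posterior) distribution.
mu_q, log_sig_q = np.split(recog_net(np.concatenate([h_x, h_y])), 2)
z = mu_q + np.exp(log_sig_q) * rng.normal(size=lat_dim)
goal_train = goal_net(np.concatenate([h_x, z]))

# Testing: draw K samples of Z from the prior for K multimodal goal hypotheses.
mu_p, log_sig_p = np.split(prior_net(h_x), 2)
goals = [goal_net(np.concatenate([h_x, mu_p + np.exp(log_sig_p) * rng.normal(size=lat_dim)]))
         for _ in range(20)]
```

The KL-divergence loss between the two Gaussians is what ties the test-time prior to the training-time posterior.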
3.1 BiTraP with Nonparametric (NP) Distribution
BiTraP-NP is built on a standard recurrent neural network encoder-decoder (RNN-ED) based CVAE trajectory predictor, as in [20, 23, 4, 14], except that it predicts the goal first and then predicts trajectories leveraging the goal. Following previous work, we assume a Gaussian latent variable Z and a nonparametric target distribution format. Fig. 1 shows the network architecture of BiTraP-NP.
Encoder and goal estimation. First, the observed trajectory X is processed by a gated recurrent unit (GRU) encoder network to obtain an encoded feature vector h_X. In training, the ground-truth target Y is encoded by another GRU, yielding h_Y. The recognition network takes h_X and h_Y and predicts the distribution mean μ_q and covariance Σ_q, which capture dependencies between the observation and the ground-truth target. The prior network assumes no knowledge about the target and predicts μ_p and Σ_p using h_X only. A Kullback-Leibler divergence (KLD) loss between N(μ_q, Σ_q) and N(μ_p, Σ_p) is optimized so that the dependency between X and Y is implicitly learned by the prior network. The latent variable Z is sampled from N(μ_q, Σ_q) and concatenated with h_X to predict multimodal goals with the goal generation network. In testing, we directly draw multiple samples of Z from N(μ_p, Σ_p) and concatenate them with h_X to predict estimated goals. We use 3-layer multi-layer perceptrons (MLPs) for the prior, recognition and goal generation networks.
Trajectory Decoder. Predicted goals are used as inputs to a bidirectional trajectory generation network, the trajectory decoder, to predict multimodal trajectories. BiTraP's decoder contains forward and backward RNNs. The forward RNN is similar to a regular RNN decoder (Eq. (1)) except that its output is not transformed to trajectory space. The backward RNN is initialized from the encoder hidden state h_X. It takes the estimated goal Ĝ as its initial input (Eq. (2)) and propagates from time t+δ to t+1, so the backward hidden state is updated from the goal to the current location. Forward and backward hidden states for the same time step are concatenated to predict the final trajectory waypoint at that time (Eq. (3)). These steps can be formulated as
h^f_{τ+1} = GRU_f(h^f_τ, W^f_i h^f_τ + b^f_i),   (1)
h^b_τ = GRU_b(h^b_{τ+1}, W^b_i Ŷ_{τ+1} + b^b_i),  with initial input Ŷ_{t+δ} = Ĝ,   (2)
Ŷ_τ = W^{fb}_o (h^f_τ ⊕ h^b_τ) + b^{fb}_o,   (3)
where f, b, i and o indicate "forward", "backward", "input" and "output" respectively, ⊕ denotes concatenation, and h^f and h^b are initialized by passing h_X through two different fully-connected networks.
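The two passes of the decoder can be sketched as follows; the toy tanh cells stand in for learned GRUs, and all dimensions, initializations and names are illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
H, D, T = 16, 2, 12  # hidden size, waypoint dim, prediction horizon (illustrative)

def cell(W):
    # Toy recurrent cell standing in for a learned GRU.
    return lambda h, x: np.tanh(W @ np.concatenate([h, x]))

fwd = cell(rng.normal(scale=0.1, size=(H, H + H)))  # forward input: its own hidden state
bwd = cell(rng.normal(scale=0.1, size=(H, H + D)))  # backward input: a waypoint/goal
proj_b = rng.normal(scale=0.1, size=(D, H))         # backward hidden -> waypoint
W_out = rng.normal(scale=0.1, size=(D, 2 * H))      # concatenated hidden -> waypoint

h_x = rng.normal(size=H)     # encoder hidden state (both passes start from it)
goal = np.array([5.0, 3.0])  # estimated goal, i.e., the trajectory endpoint

# Forward pass: hidden states propagate from the current time toward the future;
# the forward output is never transformed to trajectory space.
h_f = [h_x]
for _ in range(T):
    h_f.append(fwd(h_f[-1], h_f[-1]))

# Backward pass: start from the goal and propagate back toward the current time.
h_b, y = [h_x], goal
for _ in range(T):
    h_b.append(bwd(h_b[-1], y))
    y = proj_b @ h_b[-1]

# Pair forward step tau with backward step (T - tau) to decode each waypoint.
traj = [W_out @ np.concatenate([hf, hb]) for hf, hb in zip(h_f[1:], reversed(h_b[1:]))]
```

Because every waypoint sees both a forward state (from the observation) and a backward state (from the goal), errors no longer accumulate from one end only.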
3.2 BiTraP with GMM Distribution
Parametric models predict trajectory distribution parameters instead of trajectory coordinates. BiTraP-GMM is our parametric variation of BiTraP, assuming a GMM for the trajectory goal and for each waypoint [13, 31]. Let Σ_{i=1}^{K} π_i N(μ_i, Σ_i) denote a K-component GMM at a given time step. We assume each waypoint follows such a GMM, where each Gaussian component can be considered the distribution of one trajectory modality. The mixture component weights π_i sum to one and thus form a categorical distribution. Each π_i indicates the probability (confidence) that a person's motion belongs to that modality. We design the latent vector Z as a categorical (K-way) variable parameterized by the GMM component weights π_i, rather than by separately-computed parameters. Similar to BiTraP-NP, we use three 3-layer MLPs for the prior, recognition and goal generation networks, and a bidirectional RNN decoder for the trajectory generation network. Instead of directly predicting trajectory coordinates, the generation networks of BiTraP-GMM estimate the mean μ_i and covariance Σ_i of the i-th Gaussian component at each time step. In training, we sample one Z from each category to ensure all trajectory modalities are trained. In testing, we sample Z from the categorical distribution, so it is more probable to sample from high-confidence trajectory modalities.
3.3 Residual Prediction and BoM Loss for BiTraP-NP
Instead of directly predicting future locations [26] or integrating from predicted future velocities [31], BiTraP-NP predicts the change with respect to the current location, i.e., residuals. There are two advantages of residual prediction. First, it ensures the model predicts a trajectory starting from the current location, providing a smaller initial loss than predicting locations from scratch. Second, the residual target can be less noisy than the velocity target because trajectory annotations are not always accurate. The standard CVAE loss includes an NLL loss of the predicted distribution, which is not applicable to NP methods due to their unknown distribution format. The L2 loss between predictions and targets can be used as a substitute [20]. To further encourage diversity in multimodal prediction, we use a best-of-many (BoM) L2 loss as in [4]. The final loss function for BiTraP-NP is a combination of the goal L2 loss, the trajectory L2 loss and the KL-divergence loss between the prior and recognition networks, written as
L_NP = min_i ‖Ĝ^i − G‖ + min_i Σ_{τ=t+1}^{t+δ} ‖Ŷ^i_τ − Y_τ‖ + KL(q(Z|X, Y) ‖ p(Z|X)),   (4)
where Ĝ^i and Ŷ^i_τ are the i-th sampled predicted goal and trajectory waypoints, expressed with respect to the current position.
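A minimal sketch of the best-of-many objective follows; the KL term of Eq. (4) and the residual encoding are omitted, and all names and shapes are illustrative:

```python
import numpy as np

def bom_l2_loss(pred_trajs, pred_goals, gt_traj, gt_goal):
    # Best-of-many L2: only the closest of the K samples is penalized for the
    # goal and the trajectory, leaving the other samples free to stay diverse.
    traj_err = np.linalg.norm(pred_trajs - gt_traj[None], axis=-1).sum(axis=-1)  # (K,)
    goal_err = np.linalg.norm(pred_goals - gt_goal[None], axis=-1)               # (K,)
    return goal_err.min() + traj_err.min()

rng = np.random.default_rng(2)
K, T = 20, 12
gt = np.cumsum(rng.normal(size=(T, 2)), axis=0)           # ground-truth waypoints
preds = gt[None] + rng.normal(scale=0.1, size=(K, T, 2))  # K sampled predictions
preds[0] = gt                                             # one sample happens to be exact
loss = bom_l2_loss(preds, preds[:, -1], gt, gt[-1])       # -> 0.0 for the exact sample
```

Penalizing only the best sample is what separates this objective from an average L2 loss, which would pull all samples toward the ground truth and collapse diversity.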
3.4 Bidirectional NLL Loss for BiTraP-GMM
Similar to [31], our BiTraP-GMM models the pedestrian velocity distribution as a GMM at each time step. The velocity GMM is then integrated forward to obtain the GMM distribution of trajectory waypoints, as shown by the blue blocks in Fig. 2. We assume linear dynamics for pedestrians and use a single integrator as in Eq. (5). The loss function is then the summation of the negative log-likelihood (NLL) of the ground-truth future waypoints over the prediction horizon, formulated as
Y_{τ+1} = Y_τ + ∫_τ^{τ+1} v dt ≈ Y_τ + v_τ Δt,   (5)
NLL_f = −(1/δ) Σ_{τ=t+1}^{t+δ} log p_f(Y_τ | π_τ, μ_τ^→, Σ_τ^→),   (6)
where π_τ, μ_τ and Σ_τ are the velocity GMM parameters at time τ, the superscript → indicates location GMM parameters obtained from forward integration, and p_f is the probability density function. Such an NLL_f emphasizes earlier waypoints along the prediction horizon, because a waypoint at time τ is used in the integration results of all later steps, while those later waypoints are not used when computing the likelihoods of earlier ones. This goes against our proposed idea, which is to leverage a bidirectional temporal model. Therefore, we also compute a backward NLL loss with reverse integration from the goal, formulated as
Y_τ = G − ∫_τ^{t+δ} v dt ≈ G − Σ_{s=τ}^{t+δ−1} v_s Δt,   (7)
NLL_b = −(1/δ) Σ_{τ=t+1}^{t+δ} log p_b(Y_τ | π_τ, μ_τ^←, Σ_τ^←),   (8)
where p_b is the backward probability density function and the superscript ← indicates backward location GMM parameters. The final loss function for BiTraP-GMM can be written as
L_GMM = NLL_G + NLL_f + NLL_b + KL(q(Z|X, Y) ‖ p(Z|X)),   (9)
where the first term NLL_G is the NLL loss of the goal estimation, NLL_f and NLL_b are computed from forward and backward integration, and the KL term is the KL-divergence, similar to Eq. (4).
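The forward and backward integrations can be made concrete with a numpy sketch under simplifying assumptions (isotropic unit-variance components, time-invariant mode weights; all names are illustrative). With exact velocity means, reverse integration from the goal recovers the same waypoint means as forward integration:

```python
import numpy as np

def gmm_nll(point, pis, mus, sigma=1.0):
    # NLL of a 2-D point under a GMM with isotropic components (simplified).
    d2 = ((point - mus) ** 2).sum(axis=-1)
    dens = (pis * np.exp(-d2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)).sum()
    return -np.log(dens)

rng = np.random.default_rng(3)
dt, T, K = 0.4, 12, 3            # step size, horizon, mixture components
pis = np.array([0.5, 0.3, 0.2])  # categorical mode weights (sum to 1)
vel_mu = rng.normal(size=(T, K, 2))  # per-step velocity GMM means

# Forward single integrator: location means are running sums of velocity means.
loc_fwd = np.cumsum(vel_mu * dt, axis=0)

# Backward integration: start from the goal (the endpoint of each mode) and
# subtract the remaining velocity increments.
goal = loc_fwd[-1]
tail = np.concatenate([np.cumsum((vel_mu[1:] * dt)[::-1], axis=0)[::-1],
                       np.zeros((1, K, 2))])
loc_bwd = goal - tail

gt = np.cumsum(rng.normal(size=(T, 2)) * dt, axis=0)
nll_f = np.mean([gmm_nll(gt[t], pis, loc_fwd[t]) for t in range(T)])
nll_b = np.mean([gmm_nll(gt[t], pis, loc_bwd[t]) for t in range(T)])
```

In the model itself the two integrations differ because the decoder's forward and backward branches predict different parameters; the point of the backward term is that its error is smallest near the goal, complementing the forward term.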
4 Experiments and Results
In this section, we empirically evaluate the BiTraP-NP and BiTraP-GMM models on both first-person view (FPV) and bird's eye view (BEV) trajectory prediction datasets. We also provide a comparative study and discussion of the effects of model and loss selection.
Datasets. Two FPV datasets, Joint Attention for Autonomous Driving (JAAD) [18] and Pedestrian Intention Estimation (PIE) [26], and two benchmark BEV datasets, ETH [25] and UCY [19], were used in our experiments. JAAD contains 2,800 pedestrian trajectories captured from dash cameras and annotated at 30 Hz. PIE contains 1,800 pedestrian trajectories, also annotated at 30 Hz, with longer trajectories and more comprehensive annotations such as semantic intention, ego-motion and neighboring objects. The ETH-UCY datasets contain five sub-datasets captured from down-facing surveillance cameras in four different scenes, with 1,536 pedestrian trajectories annotated at 2.5 Hz.
Implementation Details. We used the standard training/testing splits of JAAD and PIE as in [26]. A 0.5-second (15-frame) observation length and a 1.5-second (45-frame) prediction horizon were used for evaluation. For ETH-UCY, a standard leave-one-out approach based on scene was used, per [11, 31]. We observed trajectories for 3.2 seconds (8 frames) and predicted paths for the next 4.8 seconds (12 frames). We used a hidden unit size of 256 for all encoders and decoders in BiTraP across all datasets. All models were trained with batch size 128, learning rate (LR) 0.001, and an exponential LR scheduler [31] on a single NVIDIA Titan XP GPU.
4.1 Experiments on JAAD and PIE Datasets
Baselines.
We compare our results against the following baseline models: 1) a linear Kalman filter; 2) a vanilla LSTM model; 3) a Bayesian LSTM model (B-LSTM) [3]; 4) PIE_traj, an attentive RNN encoder-decoder model; 5) PIE_full, a multi-stream attentive RNN model that injects ego-motion and semantic intention streams into PIE_traj; and 6) FOL-X [39], a multi-stream RNN encoder-decoder model using residual prediction. We also conducted an ablation study with a deterministic variation of our model (BiTraP-D), where the multimodal CVAE module was removed.
Evaluation Metrics. Following [39, 26, 3], our BiTraP model was evaluated using: 1) bounding box Average Displacement Error (ADE), 2) box center ADE (C_ADE) and 3) box center Final Displacement Error (C_FDE), in squared pixels. For our multimodal BiTraP-NP and BiTraP-GMM, we compute the best-of-20 results (the minimum ADE and FDE from 20 randomly-sampled trajectories), following [11, 31, 30]. We also report the Kernel Density Estimation-based Negative Log-Likelihood (KDE-NLL) metric for BiTraP-NP and BiTraP-GMM, which evaluates the NLL of the ground truth under a distribution fitted by a KDE on trajectory samples from each prediction model [31, 34]. For all metrics, lower values are better.
Results. Table 1 presents trajectory prediction results on the JAAD and PIE datasets. Our deterministic BiTraP-D model shows consistently lower displacement errors across various prediction horizons than baseline methods such as PIE_traj and FOL-X, indicating that our goal estimation and bidirectional prediction modules are effective. Our BiTraP-D model, based only on past trajectory information, also outperforms the state-of-the-art PIE_full, which requires additional ego-motion and semantic intention annotations. Table 1 also shows that the nonparametric multimodal method BiTraP-NP performs better on displacement metrics, while the parametric method BiTraP-GMM performs better on the NLL metric. This difference illustrates the objectives of these methods: BiTraP-NP generates diverse trajectories, one of which is optimized to have minimum displacement error, while BiTraP-GMM generates trajectory distributions with more similarity to the ground-truth trajectory.
Methods  JAAD  PIE  
ADE (0.5/1.0/1.5s)  C_ADE (1.5s)  C_FDE (1.5s)  NLL  ADE (0.5/1.0/1.5s)  C_ADE (1.5s)  C_FDE (1.5s)  NLL
Linear [26]  233/857/2303  1565  6111    123/477/1365  950  3983   
LSTM [26]  289/569/1558  1473  5766    172/330/911  837  3352   
B-LSTM [3]  159/539/1535  1447  5615    101/296/855  811  3259
FOL-X [39]  147/484/1374  1290  4924    47/183/584  546  2303
PIE_traj [26]  110/399/1280  1183  4780    58/200/636  596  2477
PIE_full [26]  –/–/556  520  2162
BiTraP-D  93/378/1206  1105  4565    41/161/511  481  1949
BiTraP-NP (20)  38/94/222  177  565  18.9  23/48/102  81  261  16.5
BiTraP-GMM (20)  153/250/585  501  998  16.0  38/90/209  171  368  13.8
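The best-of-20 evaluation used above can be sketched as follows; box centers are reduced to 2-D points for brevity, and all names are illustrative:

```python
import numpy as np

def best_of_n(samples, gt):
    # Best-of-N ADE/FDE: minimum over the N sampled trajectories of the mean
    # (ADE) and final (FDE) displacement to the ground truth.
    disp = np.linalg.norm(samples - gt[None], axis=-1)  # (N, T)
    return disp.mean(axis=1).min(), disp[:, -1].min()

rng = np.random.default_rng(4)
gt = np.cumsum(rng.normal(size=(45, 2)), axis=0)              # 45-frame ground truth
samples = gt[None] + rng.normal(scale=5.0, size=(20, 45, 2))  # 20 sampled predictions
ade, fde = best_of_n(samples, gt)
```

Because the minimum over samples is taken, a diverse predictor is not penalized for samples far from the ground truth, as long as one sample is close.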
Fig. 3 shows trajectory prediction results on sample frames from the PIE dataset. We observed that when a pedestrian intends to cross the street or change direction, the multimodal BiTraP methods yield higher accuracy and more reasonable predictions than the deterministic variation. For example, as shown in Fig. 3(b), the deterministic BiTraP-D model (top row) can fail to predict the trajectory and the end-goal of a pedestrian who intends to cross the street in the future, while the multimodal BiTraP-NP model (bottom row) can successfully predict multiple possible future trajectories, including one in which the pedestrian crosses the street, matching the ground-truth intention. Similar observations can be made in other frames. This result indicates that multimodal BiTraP-NP can predict multiple possible futures, which could help a mobile robot or a self-driving car safely yield to pedestrians. Although BiTraP-NP samples diverse trajectories, it still predicts a distribution with high likelihood around the ground-truth targets and low likelihood in other locations, per Figs. 3(b)–3(d).
4.2 Experiments on ETH-UCY Datasets
Baselines. We compare our methods with five multimodal baseline methods: S-GAN [11], SoPhie [30], S-BiGAT [17], PECNet [23] and Trajectron++ [31]. PECNet and Trajectron++ are the most recent. PECNet is a goal-conditioned method using a nonparametric distribution (thus directly comparable to our BiTraP-NP), while Trajectron++ uses a GMM trajectory distribution directly comparable to our BiTraP-GMM. Note that all baselines incorporate social information, while our methods focus fully on investigating trajectory modeling and do not require social information as input.
Evaluation Metrics. Following [11, 23, 30], we used best-of-20 trajectory ADE and FDE in meters as evaluation metrics. We also report Average and Final KDE-NLL (ANLL and FNLL) metrics as a supplement [34, 31] to evaluate the predicted trajectory and goal distributions.
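The KDE-NLL metric can be sketched in a few lines; benchmarks typically estimate the kernel bandwidth from the samples, so the fixed bandwidth and all names here are simplifying assumptions:

```python
import numpy as np

def kde_nll(samples, gt, bw=0.3):
    # Fit a fixed-bandwidth Gaussian KDE to the N sampled 2-D points at one
    # time step and return the NLL of the ground truth under that density.
    d2 = ((gt - samples) ** 2).sum(axis=-1)                 # (N,)
    dens = np.exp(-d2 / (2 * bw**2)) / (2 * np.pi * bw**2)  # per-sample kernels
    return -np.log(dens.mean())

rng = np.random.default_rng(5)
gt_point = np.array([1.0, 2.0])
tight = gt_point + rng.normal(scale=0.1, size=(2000, 2))  # samples hugging the GT
loose = gt_point + rng.normal(scale=2.0, size=(2000, 2))  # widely spread samples
# A sample distribution concentrated near the ground truth scores a lower
# (better) NLL than a widely spread one.
```

Unlike best-of-N displacement, this metric rewards the whole sample distribution for matching the ground truth, which is why the compact BiTraP-GMM fares better on it.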
Results. Table 2 shows the best-of-20 ADE/FDE results across all methods. We observed that BiTraP-NP outperforms the state-of-the-art goal-based method (PECNet) by a large margin, demonstrating the effectiveness of our bidirectional decoder module. BiTraP-NP also obtains lower ADE/FDE on most scenes compared with Trajectron++. Our BiTraP-GMM model was trained using the NLL loss, so it shows higher ADE/FDE results compared with BiTraP-NP. This is consistent with our FPV dataset observations in Section 4.1. Nevertheless, BiTraP-GMM still achieves similar or better results than PECNet and Trajectron++.
Datasets  S-GAN [11]  SoPhie [30]  S-BiGAT [17]  PECNet [23]  Trajectron++ [31]  BiTraP-NP  BiTraP-GMM
ETH  0.81/1.52  0.70/1.43  0.69/1.29  0.54/0.87  0.43/0.86  0.37/0.69  0.40/0.74 
Hotel  0.72/1.61  0.76/1.67  0.49/1.01  0.18/0.24  0.12/0.19  0.12/0.21  0.13/0.22 
Univ  0.60/1.26  0.54/1.24  0.55/1.32  0.35/0.60  0.22/0.43  0.17/0.37  0.19/0.40 
Zara1  0.34/0.69  0.30/0.63  0.30/0.62  0.22/0.39  0.17/0.32  0.13/0.29  0.14/0.28 
Zara2  0.42/0.84  0.38/0.78  0.36/0.75  0.17/0.30  0.12/0.25  0.10/0.21  0.11/0.22 
Average  0.58/1.18  0.54/1.15  0.48/1.00  0.29/0.48  0.21/0.39  0.18/0.35  0.19/0.37 
To further evaluate the predicted trajectory distributions, we report KDE-NLL results in Table 3. As shown, BiTraP-GMM outperforms Trajectron++ with lower ANLL and FNLL on the ETH, Univ, Zara1 and Zara2 datasets. On Hotel, Trajectron++ achieves lower NLL values, which may be due to higher levels of interpersonal interaction than in other scenes. We observed improved ANLL/FNLL on Hotel (1.88/0.27) when combining the BiTraP-GMM decoder with the interaction encoder in [31], consistent with our hypothesis.
Datasets  S-GAN [11]  Trajectron++ [31]  BiTraP-NP  BiTraP-GMM
ETH  15.70/–  1.31/4.28  3.80/3.79  0.96/3.55
Hotel  8.10/–  1.94/0.25  0.41/1.26  1.60/0.51
Univ  2.88/–  1.13/2.13  0.84/2.15  1.19/2.03
Zara1  1.36/–  1.41/1.83  0.81/1.85  1.51/1.56
Zara2  0.96/–  2.53/0.50  1.89/1.31  2.54/0.38
We also computed KDE-NLL results for both the Trajectron++ and BiTraP-GMM methods at each time step to analyze how BiTraP affects both short-term and longer-term (up to 4.8 seconds) prediction results. Per Fig. 4, BiTraP-GMM outperforms Trajectron++ over longer prediction horizons (after 1.2 seconds on ETH, Univ, Zara1 and Zara2). This shows that the backward pass from the goal helps reduce error over longer prediction horizons.
Fig. 5 shows qualitative examples of trajectories predicted by the BiTraP-NP and BiTraP-GMM models. As shown, BiTraP-NP (top row) generates possible future trajectories with a wider spread (more diverse), while BiTraP-GMM generates more compact distributions. This is consistent with our quantitative evaluations in Table 3, where the lower NLL results of BiTraP-GMM correspond to more compact trajectory distributions. To intuitively present model performance in collision avoidance and robot navigation, we conducted a robot path simulation experiment on the ETH-UCY datasets and report collision-related metrics in the supplementary material.
5 Conclusion
We presented BiTraP, a bidirectional multimodal trajectory prediction method conditioned on goal estimation. We demonstrated that our proposed model achieves state-of-the-art results for pedestrian trajectory prediction on both first-person view and bird's eye view datasets. The current BiTraP models, with only observed trajectories as inputs, already surpass previous methods that required additional ego-motion, semantic intention, and/or social information. By conducting a comparative study between nonparametric (BiTraP-NP) and parametric (BiTraP-GMM) models, we observed that the choice of latent variable distribution affects the diversity of the predicted target distributions of future trajectories. We hypothesized that this difference in predicted distribution directly influences the collision rate in robot path planning, and showed that collision metrics can be used to guide predictor selection in real-world applications. For future work, we plan to incorporate scene semantics and social components to further boost the performance of each module. We are also interested in using estimated goals and predicted trajectories to infer and interpret pedestrian intentions and actions.
Acknowledgments
This work was supported by a grant from Ford Motor Company via the Ford-UM Alliance under award N028603. This material is based upon work supported by the Federal Highway Administration under contract number 693JJ319000009. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the Federal Highway Administration.
References
[1] (2016) Social LSTM: human trajectory prediction in crowded spaces. In CVPR.
[2] (2019) Stochastic sampling simulation for pedestrian trajectory prediction. arXiv preprint arXiv:1903.01860.
[3] (2018) Long-term on-board prediction of people in traffic scenes under uncertainty. In CVPR.
[4] (2018) Accurate and diverse sampling of sequences based on a "best of many" sample objective. In CVPR.
[5] (2018) Anticipating many futures: online human motion prediction and generation for human-robot interaction. In ICRA.
[6] (2019) DROGON: a causal reasoning framework for future trajectory forecast. arXiv preprint arXiv:1908.00024.
[7] (2018) Multi-modal trajectory prediction of surrounding vehicles with maneuver based LSTMs. In IV.
[8] (2020) Trajectory forecasts in unknown environments conditioned on grid-based plans. arXiv preprint arXiv:2001.00735.
[9] (2019) Bio-LSTM: a biomechanically inspired recurrent neural network for 3D pedestrian pose and gait prediction. IEEE Robotics and Automation Letters.
[10] (2015) Recurrent network models for human dynamics. In ICCV.
[11] (2018) Social GAN: socially acceptable trajectories with generative adversarial networks. In CVPR.
[12] (1995) Social force model for pedestrian dynamics. Physical Review E 51 (5), pp. 4282.
[13] (2019) The Trajectron: probabilistic multi-agent trajectory modeling with dynamic spatiotemporal graphs. In ICCV.
[14] (2018) Generative modeling of multimodal multi-human behavior. In IROS.
[15] (2017) Real-time certified probabilistic pedestrian forecasting. IEEE Robotics and Automation Letters 2 (4), pp. 2064–2071.
[16] (1960) A new approach to linear filtering and prediction problems. Journal of Basic Engineering 82 (1), pp. 35–45.
[17] (2019) Social-BiGAT: multimodal trajectory forecasting using Bicycle-GAN and graph attention networks. In NIPS.
[18] (2016) Joint attention in autonomous driving (JAAD). arXiv preprint arXiv:1609.04741.
[19] (2014) Learning an image-based motion context for multiple people tracking. In CVPR.
[20] (2017) DESIRE: distant future prediction in dynamic scenes with interacting agents. In CVPR.
[21] (2019) Game-theoretic modeling of multi-vehicle interactions at uncontrolled intersections. arXiv preprint arXiv:1904.05423.
[22] (2019) Peeking into the future: predicting future person activities and locations in videos. In CVPR.
[23] (2020) It is not the journey but the destination: endpoint conditioned trajectory prediction. arXiv preprint arXiv:2004.02025.
[24] (2019) Learning regularity in skeleton trajectories for anomaly detection in videos. In CVPR.
[25] (2009) You'll never walk alone: modeling social behavior for multi-target tracking. In ICCV.
[26] (2019) PIE: a large-scale dataset and models for pedestrian intention estimation and trajectory prediction. In ICCV.
[27] (2015) Goal-directed pedestrian prediction. In ICCVW.
[28] (2018) Pedestrian prediction by planning using deep neural networks. In ICRA.
[29] (2019) PRECOG: prediction conditioned on goals in visual multi-agent settings. In ICCV.
[30] (2019) SoPhie: an attentive GAN for predicting paths compliant to social and physical constraints. In CVPR.
[31] (2020) Trajectron++: multi-agent generative trajectory forecasting with heterogeneous data for control. arXiv preprint arXiv:2001.03093.
[32] (2014) Dynamic probabilistic drivability maps for lane change and merge driver assistance. IEEE Transactions on Intelligent Transportation Systems 15 (5), pp. 2063–2073.
[33] (2015) Learning structured output representation using deep conditional generative models. In NIPS.
[34] (2019) Analyzing the variety loss in the context of probabilistic trajectory prediction. In ICCV.
[35] (2006) Gaussian processes for machine learning. Vol. 2, MIT Press, Cambridge, MA.
[36] (2018) The smart black box: a value-driven automotive event data recorder. In ITSC, pp. 973–978.
[37] (2020) The smart black box: a value-driven high-bandwidth automotive event data recorder. IEEE Transactions on Intelligent Transportation Systems.
[38] (2020) When, where, and what? A new dataset for anomaly detection in driving videos. arXiv preprint arXiv:2004.03044.
[39] (2019) Egocentric vision-based future vehicle localization for intelligent driving assistance systems. In ICRA.
[40] (2019) Unsupervised traffic accident detection in first-person videos. In IROS.