
Lifelong Vehicle Trajectory Prediction Framework Based on Generative Replay

by   Peng Bao, et al.

Accurate trajectory prediction of vehicles is essential for reliable autonomous driving. To maintain consistent performance as a vehicle drives through different cities, the prediction model must adapt to changing traffic circumstances and achieve lifelong trajectory prediction. To realize this, catastrophic forgetting is the main problem to be addressed. In this paper, a divergence measurement method based on conditional Kullback-Leibler divergence is first proposed to evaluate the difference in spatiotemporal dependency among varied driving circumstances. Then, based on generative replay, a novel lifelong vehicle trajectory prediction framework is developed. The framework consists of a conditional generation model and a vehicle trajectory prediction model. The conditional generation model is a generative adversarial network conditioned on the position configuration of vehicles. After learning and merging the trajectory distributions of vehicles across different cities, the generation model replays trajectories with prior samplings as inputs, which alleviates catastrophic forgetting. The vehicle trajectory prediction model is trained on the replayed trajectories and achieves consistent prediction performance on visited cities. A lifelong experiment setup is established on four open datasets comprising five tasks. Spatiotemporal dependency divergence is calculated for different tasks. Despite these divergences, the proposed framework exhibits lifelong learning ability and achieves consistent performance on all tasks.



I Introduction

In autonomous driving, the ability to accurately predict surrounding vehicles’ future trajectories is key to making appropriate decisions. Therefore, many research works focus on vehicle kinematics and interaction modelling[50]. Recent works also adopt data-driven approaches[25, 39, 40, 11, 18, 7, 75]. Thanks to these works, the precision of vehicle trajectory prediction on various open datasets has improved significantly.

However, in real applications, an intelligent vehicle equipped with an autonomous driving system is expected to visit varied road sections, cities or even countries. To guide the vehicle safely, the system is required to adapt to heterogeneous distributions of surrounding vehicles’ motion and interaction patterns and to predict their future trajectories accurately. For this purpose, the system needs to continuously learn new knowledge about emerging traffic environments without forgetting old knowledge. In addition, with limited storage resources, the system cannot afford to store a large amount of trajectory data. Continuous learning with limited storage resources to achieve good performance on all processed tasks is also called lifelong learning. Unfortunately, most existing vehicle trajectory prediction models are trained and tested specifically for each dataset and fail to accommodate other datasets if applied directly.

A crucial problem impeding lifelong learning is catastrophic forgetting. In different traffic circumstances, the interaction pattern among vehicles varies, which can be interpreted as spatiotemporal dependency divergence. This divergence makes a prediction model trained in old circumstances perform poorly in a new, different circumstance. If trained in the new circumstance, the model fits the spatiotemporal dependency of the new one while forgetting that of the old circumstances, which in turn makes it perform poorly in the old circumstances. In summary, to perform consistently in all circumstances, the prediction model is required to learn new knowledge without forgetting old knowledge.

Therefore, to measure the spatiotemporal dependency divergence that arises from varied traffic circumstances, Mixture Density Networks (MDNs) are introduced to estimate the conditional probability density function (CPDF), and the conditional Kullback-Leibler divergence (CKLD) is computed through Monte-Carlo (MC) sampling. Then, to realize lifelong trajectory prediction, a new framework based on conditional generative replay is proposed. The framework consists of two models, a generative memory and a task solver. The generative memory is designed to address catastrophic forgetting, a key challenge in lifelong learning. Through adversarial learning and distribution merging across different traffic circumstances, the generative memory replays trajectories conditioned on the spatial configuration of vehicles. The replayed trajectories are used as inputs to train a trajectory prediction model. As the generative memory is capable of generating trajectories within the same distribution as all recorded data, the prediction model achieves consistent performance on all processed tasks.

In summary, our contributions include:

  • An innovative spatiotemporal dependency divergence measurement method is developed for two trajectory datasets.

  • A novel research topic is introduced: lifelong vehicle trajectory prediction is proposed for the first time to promote adaptation and performance on heterogeneous tasks for autonomous driving systems.

  • Based on generative replay, an initial lifelong learning framework is introduced for vehicle trajectory prediction.

  • Experimental results on four different datasets have demonstrated that the proposed framework can achieve lifelong learning and realize satisfactory prediction performance on all processed tasks.

II Related work

Vehicle trajectory prediction is a longstanding research topic and is becoming more important with the development of autonomous driving. Kinematics models[5, 45] are established to predict surrounding vehicles’ future trajectories and perform well over short horizons. For a prediction horizon longer than 3 seconds, vehicles’ intentions are important and have to be clarified first. Prototype classification[65] and intention recognition[6] are two research directions. By collecting and clustering a large number of trajectories in advance, prototype classification methods match online history trajectories with those in a database. Multimodal future trajectories can be generated once matched. Intention recognition methods classify drivers’ potential maneuvers into a limited number of classes. Combining heuristic information including road topology, traffic signals and vehicle turn signals, classification models such as Support Vector Machines (SVMs) and Hidden Markov Models (HMMs)[61] are introduced to recognize surrounding drivers’ intentions. However, drivers’ behaviours are affected by various factors and are highly personalized[72], which makes them hard to recognize accurately in real applications. Besides, a vehicle’s trajectory is not determined by its driver’s intention alone but is also affected by interaction with other vehicles.

To make an accurate prediction, it is necessary to consider vehicles’ motion as a dynamic interactive system and model the interactive impact among vehicles. Pairwise interaction modelling is of high complexity. Deo[15] simplified the interactive factor into a cost function that penalizes vehicle collisions in order to filter out colliding future trajectory pairs. Data-driven approaches are another practical family of methods and have been explored extensively. In these approaches, vehicles’ sequential features are learned through Long Short-Term Memory networks (LSTMs). To learn the interactive factor, Convolutional Neural Networks (CNNs) or a max-pooling module are introduced. Deo[16] divided a certain surrounding area into grid cells and used CNNs to model the spatial relationship among vehicles. Li[40] built an adjacency matrix of surrounding vehicles where each element represents pairwise proximity; a CNN is utilized to learn the interactive factor. Messaoud[46] discretized the traffic environment into 3D grid cells and used a Relational Recurrent Neural Network (RRNN) to predict future trajectories. Gupta utilized a pooling module to extract dense interactive features among vehicles. More recent works seek to merge data-driven and knowledge-driven approaches into a unified neural network[18, 7].

However, although prediction precision is improving continually, the generalization and adaptation of proposed models remain an open problem. The majority of research works validate proposed methods on one open dataset only. Some works validate on more datasets but train and test models for each dataset individually, which is not consistent with real applications. To move toward real applications, a prediction model is needed that can cope with various traffic circumstances. Two questions arise naturally. How can the difference among heterogeneous circumstances be measured? And how can consistent vehicle trajectory prediction performance be achieved on both emerging and visited circumstances, which is also called lifelong learning?

Divergence measurement of heterogeneous traffic circumstances is an open problem. To measure trajectory similarity, various methods have been proposed, such as Euclidean distance (ED)[58], dynamic time warping (DTW)[47], longest common subsequence (LCSS)[54], merge distance (MD)[27], and spatiotemporal locality in-between polylines (STLIP) distance[49]. Su[62] surveyed 15 widely used trajectory distance measures in the literature. It can be deduced that ED is suitable for measuring the distance between vehicle trajectories that have the same total length and sampling frequency. However, these similarity measurement methods compare one trajectory with another each time, while we aim to measure differences between traffic circumstances in which a dynamic number of trajectories are present. In different traffic circumstances, vehicles’ interaction patterns change and the way they affect future motion varies. In other words, the spatiotemporal dependency between future and past motion differs, which is essentially a change in the conditional probability density function (CPDF). Therefore, a more reasonable method is to estimate the distance between two unknown CPDFs from empirical samples only.

Estimating the distance between two unknown CPDFs from samples only is challenging. As a commonly used probability divergence measure, KL divergence cannot be computed without an analytic CPDF. The Donsker-Varadhan variational formula[17, 9] can be utilized to estimate CKLD empirically. However, it suffers from convergence problems when the divergence between two CPDFs is large, which is usually the case in real data. The nearest neighbor method[68] is another approach to approximate CKLD. It requires distance calculation between the condition data of two datasets, which is not well suited to traffic circumstances with a dynamic number of vehicles. As a conditional extension of the traditional Maximum Mean Discrepancy (MMD)[22], conditional MMD (CMMD)[60] is proposed to measure the distance between two CPDFs. Similar to MMD, CMMD measures the distance between probability embeddings in a reproducing kernel Hilbert space (RKHS). In MMD, a PDF is mapped to a point in the RKHS, while in CMMD a CPDF is a family of points indexed by different conditions. Therefore, CMMD averages distances over different conditions, which implies that samples from both distributions share the same conditions. For two traffic circumstances in our work, the condition data cannot be guaranteed to be the same. In fact, CMMD is usually used as a training loss function[53, 52] for neural networks, where predicted and real values can be obtained under the same condition. Another method to measure PDF distance empirically is optimal transport. Esteban[64] proposes a data-driven conditional optimal transport (COT) method. COT formulates empirical CPDF distance computation as an optimal transport problem constrained by CPDF alignment. CKLD is utilized to interpret the constraint and is then converted into a KL divergence between joint distributions through the chain rule[13]. By using the Donsker-Varadhan variational formula and Lagrange multipliers, the constrained COT is relaxed into a minimax optimization problem that can be optimized empirically. However, COT is prone to local minima and the minimax game is hard to converge. Moreover, for high-dimensional data as in our work, it is difficult to identify local minima.

Lifelong learning aims to solve a series of tasks incrementally[14]. When addressing a new task, little or no data from old tasks is stored. After the final task is presented, all tasks should be solved by one task model with good performance. The key to lifelong learning is avoiding catastrophic forgetting of old tasks’ knowledge when updating the task model to solve a new task. From the perspective of model training, approaches to mitigate catastrophic forgetting are classified into three categories: architectural, regularization and rehearsal strategies. Architectural strategies train different models or subnetworks for incoming new tasks. A selector is used to choose an appropriate model or subnetwork for a task. Typical research works include Progressive Neural Network (PNN)[57], Incremental Learning through Deep Adaptation (DAN)[55], and Copy Weight with Re-init (CWR)[43]. These methods preserve performance on old tasks but conflict with the storage limitation of lifelong learning. Regularization strategies extend loss functions with additional terms to retain performance on old tasks. Learning without Forgetting (LwF)[41] and Elastic Weight Consolidation (EWC)[31] are two representative methods. LwF uses outputs of old models as soft targets to substitute for data of old tasks, and is reported to suffer a cumulative drop in old tasks’ performance as the task sequence grows longer[4]. EWC evaluates the importance of parameters for old tasks and penalizes changes to them when training on new tasks, which preserves knowledge of old tasks but prevents the model from achieving competitive performance on new tasks[73]. Rehearsal strategies generally use an external memory to store part of the old data[26, 51] or patterns[44]. As storage is limited and Generative Adversarial Networks (GANs) have developed, generative replay[59] has been proposed as a memory of previous data, and its feasibility has been validated in several works[63, 36, 37, 42, 38]. Although the quality of the generation model is a bottleneck, many works[66, 28, 70, 42, 63, 71] have shown that an elaborately designed generation model can in practice outperform mainstream lifelong methods such as EWC, LwF, MAS[3], PathNet[19], and iCaRL[51].

In this paper, the CPDF distance between two traffic circumstances is calculated first to reveal spatiotemporal dependency divergence. Then a generative replay based lifelong trajectory prediction framework is proposed to enhance generalization and adaptation over different traffic environments. As the key to alleviating catastrophic forgetting, a generation model is realized through a novel conditional GAN (CGAN), which is called Recurrent Regression GAN (R2GAN). Through merging generation models trained on different tasks, the generation model finally learns all knowledge involved in the processed tasks. Eventually, a trajectory prediction model trained on generated data performs well on all tasks. A task chain of five tasks drawn from four open datasets is used as the lifelong setup. Experiments on the task chain demonstrate the effectiveness of the proposed framework.

The rest of this paper is organized as follows. In Section III, a mathematical formulation of lifelong trajectory prediction is given. Divergence between two traffic circumstances is measured in Section IV. Then the generative replay based lifelong prediction framework is introduced in detail in Section V. In Section VI, experiments are performed to evaluate the framework quantitatively. Finally, conclusions and future work are presented in Section VII.

III Problem Formulation

Formally, lifelong vehicle trajectory prediction is characterized by a set of tasks to be learned by a parameterized model. In this work, given the unsupervised learning nature of trajectory prediction, each task’s data consist of a number of training samples. The trajectories of the target vehicle and surrounding vehicles over the history horizon are regarded as history information to train the parameterized model to predict the future trajectory of the target vehicle. In a traffic circumstance involving several vehicles, spatiotemporal dependency is formulated as a CPDF of the future trajectory of the target vehicle conditioned on the history trajectories of all vehicles. Samples are drawn i.i.d. from an unknown distribution associated with each task, and this distribution can differ from task to task. In lifelong learning, task data are observed sequentially, and when the next task’s data arrive, the previous data are either abandoned completely or only partly kept in a limited storage. Ultimately, the prediction model should predict accurately on all tasks after observing all task data.

IV Divergence measurement of different traffic circumstances

As an effective divergence measurement method, KL divergence is extended to CKLD to measure the spatiotemporal dependency difference between two traffic circumstances, which is formulated as
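For reference, a standard form of the conditional KL divergence can be written as follows. The notation is an assumption made for illustration ($x$ for the condition, $y$ for the future trajectory, $p_1$ and $p_2$ for the densities of the two traffic circumstances) and may differ from the symbols used by the authors:

```latex
\mathrm{CKLD}(p_1 \,\|\, p_2)
  = \mathbb{E}_{x \sim p_1(x)}\!\left[
      \int p_1(y \mid x)\,
      \log \frac{p_1(y \mid x)}{p_2(y \mid x)}\,\mathrm{d}y
    \right]
```

The inner integral is the ordinary KL divergence at a fixed condition, and the outer expectation averages it over conditions.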


The CKLD cannot be computed without an analytic formulation of the CPDF. Therefore, the parameters of Gaussian Mixture Models (GMMs) are first estimated by an MDN to approximate the CPDF, and then MC sampling can be performed to calculate the CKLD.

IV-A Dimension normalization for dynamic traffic circumstances

In a dynamic traffic circumstance involving vehicles, the condition should be represented as where represents the sequential coordinates of vehicle that last , which has a dynamic dimension. To facilitate model learning, a fixed dimension is preferred. Since the target vehicle’s future motion is affected by a limited number of neighboring vehicles, it is reasonable to consider only the closest vehicles, which is also a common practice in trajectory prediction research[40, 35]. To represent the interactive relationship between the considered vehicles, a Laplacian matrix is calculated. Unlike the usual 3D case[12], a 2D Laplacian matrix is calculated through weighting along the time dimension. Then the eigenvectors corresponding to the largest eigenvalues are concatenated with the target vehicle’s history trajectory, which forms a condition vector with fixed dimension where the superscript represents the target vehicle. The Laplacian matrix is calculated through


where is the ED between vehicle and at time and is a decay parameter.
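The exact weighting in (2) is not reproduced here. As a minimal sketch of the general idea, the code below assumes that pairwise proximity is `exp(-d / sigma)` for Euclidean distance `d`, that time steps are combined with normalized weights favouring recent steps, and that the Laplacian is the usual degree matrix minus adjacency matrix. The function name, `sigma`, and `time_decay` are hypothetical choices for illustration only.

```python
import math

def time_weighted_laplacian(tracks, sigma=10.0, time_decay=0.9):
    """Build a 2D graph Laplacian L = D - A from per-vehicle (x, y) tracks.

    tracks: list of vehicles, each a list of (x, y) tuples of equal length.
    sigma and time_decay are illustrative values, not taken from the paper.
    """
    n = len(tracks)
    steps = len(tracks[0])
    # Assumed temporal weighting: recent steps count more; weights sum to 1.
    raw = [time_decay ** (steps - 1 - t) for t in range(steps)]
    total = sum(raw)
    weights = [w / total for w in raw]

    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a = 0.0
            for t in range(steps):
                d = math.dist(tracks[i][t], tracks[j][t])  # Euclidean distance
                a += weights[t] * math.exp(-d / sigma)     # assumed proximity kernel
            adjacency[i][j] = adjacency[j][i] = a

    # Off-diagonal entries are -A[i][j]; diagonal entries are the row degrees.
    laplacian = [[-adjacency[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        laplacian[i][i] = sum(adjacency[i])
    return laplacian

# Three vehicles: two in adjacent lanes, one 20 m ahead of the first.
tracks = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    [(0.0, 3.5), (1.0, 3.5), (2.0, 3.5)],
    [(20.0, 0.0), (21.0, 0.0), (22.0, 0.0)],
]
L = time_weighted_laplacian(tracks)
```

The eigenvectors used for the fixed-size condition vector could then be obtained with a symmetric eigensolver such as `numpy.linalg.eigh`, which is left out here to keep the sketch dependency-free.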

IV-B Estimation of GMMs based on an MDN

To calculate the CKLD between two CPDFs, GMMs are introduced to approximate the CPDF as , where is the number of Gaussian components and . For , the mixing coefficient , mean , and variance are estimated through an MDN[10, 56]. As shown in Fig.1, a Multi-Layer Perceptron (MLP) is applied to the input to obtain a feature encoding . Then three separate Fully Connected (FC) layers are utilized to calculate the parameters of the GMMs. To enforce , a softmax function is applied , where represents an FC layer and the subscripts and represent vector components. Means are unconstrained. Variances should be positive, hence a softplus function is applied . The training loss function for the MDN is

Fig. 1: MDN architecture.
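As a rough illustration of the parameter heads described above, the sketch below applies a softmax to the mixing-coefficient outputs and a softplus to the variance outputs, then evaluates a negative log-likelihood. It assumes a one-dimensional target and the negative log-likelihood loss that is customary for mixture density networks; the function names and the scalar setting are illustrative assumptions, not the authors' implementation.

```python
import math

def softmax(logits):
    """Map raw outputs to mixing coefficients that are positive and sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def softplus(v):
    """Numerically stable log(1 + exp(v)); used to keep variances positive."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))

def gmm_nll(y, pi_logits, means, var_raw):
    """Negative log-likelihood of a scalar y under a 1-D Gaussian mixture.

    pi_logits, means, var_raw stand in for the outputs of the three FC heads.
    """
    pis = softmax(pi_logits)
    variances = [softplus(v) for v in var_raw]
    log_terms = []
    for pi, mu, var in zip(pis, means, variances):
        log_gauss = -0.5 * math.log(2.0 * math.pi * var) - (y - mu) ** 2 / (2.0 * var)
        log_terms.append(math.log(pi) + log_gauss)
    # log-sum-exp over components for numerical stability
    m = max(log_terms)
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms)))

loss = gmm_nll(0.3, pi_logits=[0.2, -0.1, 0.0], means=[0.0, 1.0, -1.0],
               var_raw=[0.5, 0.5, 0.5])
```

In a real MDN the same computation would be done on tensors and averaged over a batch so that gradients flow back into the MLP encoder.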

IV-C Calculation of CKLD through Monte-Carlo sampling

After the GMMs are estimated for each condition , the CKLD can be computed. As in (1), for every sampled condition , the KLD can be calculated as


Although the KL divergence between two GMMs is not analytically tractable, several techniques have been developed to estimate it effectively. Hershey[24] compared 5 methods and concluded that MC sampling clearly reaches the best accuracy. Suppose samples are drawn from , then can be calculated as


The complete computation flow is summarized in Algorithm 1.

Algorithm 1 CKLD between two traffic circumstances
Sample pairs and .
Calculate the Laplacian matrix according to (2) and normalize the conditions to a uniform dimension.
for  do
     Fit an MDN with and loss function .
end for
for  do
     Fit an MDN with and loss function .
end for
for  do
     Sample ,
     Calculate the KLD according to (4).
end for
return CKLD
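The sampling step of Algorithm 1 can be sketched as follows. The code estimates KL(p || q) between two Gaussian mixtures by drawing samples from p and averaging the log-density ratio, then averages those per-condition estimates to obtain a CKLD value. It assumes one-dimensional mixtures given as `(weights, means, stds)` tuples in place of MDN outputs; all names, the sample counts, and the seed are illustrative.

```python
import math
import random

def gmm_pdf(y, weights, means, stds):
    """Density of a 1-D Gaussian mixture at y."""
    return sum(
        w * math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, stds)
    )

def gmm_sample(rng, weights, means, stds):
    """Pick a component by its weight, then sample from that Gaussian."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], stds[k])

def mc_kl(p, q, n_samples=20000, seed=0):
    """Monte Carlo estimate of KL(p || q): mean of log p(y) - log q(y), y ~ p.

    Note: if q assigns (numerically) zero density to a sample, math.log fails;
    a production version would work with log-densities directly.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        y = gmm_sample(rng, *p)
        total += math.log(gmm_pdf(y, *p)) - math.log(gmm_pdf(y, *q))
    return total / n_samples

def mc_ckld(gmm_pairs, n_samples=5000, seed=0):
    """Average the per-condition KL estimates over all sampled conditions."""
    return sum(mc_kl(p, q, n_samples, seed) for p, q in gmm_pairs) / len(gmm_pairs)

# Two conditions, each with the mixtures predicted for the two circumstances.
pairs = [
    (([0.5, 0.5], [-1.0, 1.0], [0.5, 0.5]), ([1.0], [0.0], [1.0])),
    (([1.0], [0.0], [1.0]), ([1.0], [1.0], [1.0])),
]
ckld = mc_ckld(pairs)
```

For two unit-variance Gaussians whose means differ by 1, the exact KL divergence is 0.5, which gives a simple sanity check for the estimator.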

V Generative Replay based lifelong trajectory prediction

Based on our previous research on trajectory generation[8], a novel trajectory generation model trained with the standard GAN[20] loss is proposed to memorize the data distribution of tasks. Together with a vehicle trajectory prediction model, a lifelong vehicle trajectory prediction framework is realized.

V-A Generator Conditioned on Relative Position Configuration

In a vehicle trajectory prediction task, we need to predict a target vehicle’s future trajectory according to its history trajectory and its neighboring vehicles’ histories. To facilitate the generation process, a full prediction scenario needs to be generated that consists of the target vehicle’s and its neighboring vehicles’ trajectories lasting for horizon.

Unlike traditional GANs that model the generation procedure as a mapping from a prior probability distribution to a target one, we are inspired by Quant GAN[69] and model prediction scenario generation as a mapping between stochastic processes. A Gaussian process with an RBF kernel is selected as the prior stochastic process. For a single vehicle, Gaussian process samplings are obtained, which constitutes for the total interactive vehicles. Conditional GANs are easier to train and make generated samples more controllable. Therefore, multiple vehicles’ spatial configuration condition is utilized as the conditional input of our generator, where represents the condition for vehicle . The spatial configuration condition and the input sampling are first encoded by two MLPs individually. To map the sequential features of the Gaussian process into target data, a bidirectional GRU is utilized, where the initial hidden states and are set by the encoded position condition[29, 67]. In the bidirectional GRU, the forward code of agent and the backward code are averaged to form the sequential code . An MLP is then attached to encode spatial relations. Finally, a fully connected (FC) layer with a tanh activation function outputs the spatiotemporal data of agents. The complete framework is shown in Fig.2.
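To make the prior concrete, the following sketch draws one sample path from a zero-mean Gaussian process with an RBF kernel, using a hand-written Cholesky factorization so that only the standard library is needed. The 41 time points at 0.2 s spacing mirror an 8-second snippet at 5 Hz; the length scale, jitter, and function names are assumptions for illustration and are not taken from the paper.

```python
import math
import random

def rbf_kernel(times, length_scale=1.0, jitter=1e-6):
    """RBF covariance matrix; jitter on the diagonal keeps it positive definite."""
    n = len(times)
    return [
        [math.exp(-0.5 * ((times[i] - times[j]) / length_scale) ** 2)
         + (jitter if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]

def cholesky(a):
    """Lower-triangular factor low such that low * low^T = a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def sample_gp(times, rng, length_scale=1.0):
    """Draw one GP sample path: low @ z with z ~ N(0, I)."""
    low = cholesky(rbf_kernel(times, length_scale))
    z = [rng.gauss(0.0, 1.0) for _ in times]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(times))]

times = [0.2 * i for i in range(41)]        # 8 s at 5 Hz, endpoints included
path = sample_gp(times, random.Random(0))   # one smooth prior sample
```

A generator of the kind described above would receive several such paths per vehicle as its stochastic input, alongside the encoded position condition.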

Let represent a GRU and an MLP unit; the generation procedure can be formulated as:


for agent

Fig. 2: Proposed generator framework.

V-B MLP Based Regression Discriminator

As the generated prediction scenario data are used for the vehicle trajectory prediction task, it is important to maintain sequential dependencies in generated samples. To this end, we propose a regression discriminator to distinguish multiple agents’ real data from generated data. The regression discriminator learns to model the joint distribution of inputs and outputs in a prediction task. Specifically, the target vehicle’s history data, the target vehicle’s future data and the neighboring vehicles’ history data are taken as inputs of the regression discriminator and modelled jointly. The regression discriminator outputs a classification probability that indicates how likely the input is to be real.

Architecture of a regression discriminator is shown in Fig.3. For vehicle , trajectory data are pre-processed through


After centering and normalisation pre-processing, the target vehicle’s data are separated from the neighboring agents’ data. An MLP is applied to the target vehicle to encode its history and future data into and respectively. Relative differences between the target vehicle and neighboring vehicles are calculated and encoded into through an MLP with a mean pooling layer, which is invariant to the ordering of neighboring vehicles. All codes are concatenated and encoded by two FC layers to get a feature vector . Finally, another FC layer is applied to the feature vector to obtain the classification probability .

Computation workflow can be formalized as:

Fig. 3: Proposed regression discriminator framework.

V-C Evolution of Generation Model

In a lifelong task chain , generation models need to be merged into a long-term model when a new task arrives. In general, there are two fusion methods. As illustrated in Fig.4, one method[59, 36] merges the generation model trained on task with the new task . Generated samples from the long-term model and real samples drawn from are combined as real samples to train a new generation model , which we call the Longterm-Data-Merge (LDM) method. The other method[28, 63] trains a temporal generation model for the new task . Generated samples from the long-term model and the temporal model are combined as real samples to train a new long-term model , which we call the Longterm-Temporal-Merge (LTM) method. Although LDM is intuitively expected to perform better than LTM, both are applied to our lifelong task and compared.
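The difference between the two fusion methods lies in how the "real" training set for the next long-term generator is assembled. The sketch below shows only that data-assembly step, with generators modelled as plain callables that return one sample; the function names, the toy generators, and the sample counts are hypothetical, and the mixing ratio between replayed and new samples is an assumption since it is not specified above.

```python
import random

def ldm_training_set(long_term_gen, new_task_data, n_replay, rng):
    """Longterm-Data-Merge: replayed samples plus real samples of the new task."""
    replay = [long_term_gen(rng) for _ in range(n_replay)]
    return replay + list(new_task_data)

def ltm_training_set(long_term_gen, temporal_gen, n_replay, n_new, rng):
    """Longterm-Temporal-Merge: replayed samples plus samples from a generator
    trained only on the new task, so no real data is needed at merge time."""
    replay = [long_term_gen(rng) for _ in range(n_replay)]
    fresh = [temporal_gen(rng) for _ in range(n_new)]
    return replay + fresh

# Toy stand-ins: each "trajectory" is a tagged scalar.
def old_gen(rng):
    return ("old", rng.gauss(0.0, 1.0))

def tmp_gen(rng):
    return ("new", rng.gauss(5.0, 1.0))

rng = random.Random(0)
real_new = [("new", 5.0 + 0.1 * i) for i in range(4)]

ldm_set = ldm_training_set(old_gen, real_new, n_replay=4, rng=rng)
ltm_set = ltm_training_set(old_gen, tmp_gen, n_replay=4, n_new=4, rng=rng)
```

In both cases the resulting list would be shuffled and used as the real-sample side when training the next long-term GAN.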


Fig. 4: Two methods of generation model evolving.

V-D Task model for vehicle trajectory prediction

To perform the vehicle trajectory prediction task with generated prediction scenarios, a prediction model is proposed. As with mainstream trajectory prediction methods, history information and the interactive relationship between the target vehicle and neighboring vehicles are utilized to predict the future trajectory of the target vehicle. The overall architecture is shown in Fig.5. First, the target vehicle’s trajectory is separated from those of neighboring vehicles and encoded by an LSTM layer. Then, the differences between the target vehicle and the others are encoded by an MLP and a mean pooling layer. These two parts are concatenated and further encoded by an MLP. An LSTM layer and an MLP are used to output the predicted trajectories .

Let represent an LSTM unit; then for a prediction workflow, we have

Fig. 5: Proposed vehicle trajectory prediction model.

VI Experiments and Analyses

All experiments are implemented in PyTorch[48]. The running environment is Ubuntu 16.04, an Intel Core i9-9900X CPU, a GeForce GTX 1080 Ti, and 64GB RAM. All code, including CKLD calculation and lifelong learning experiments, and the pre-processed data are available online.

VI-A Dataset and generation model setup

To construct a lifelong task chain, five sub-datasets recorded in different locations are selected from four open datasets. Some traffic circumstances are illustrated in Fig.6.

Fig. 6: Illustration of traffic circumstances in a) , b) , c) , d) , and e) . The blue dashed line indicates the target vehicle. Red lines indicate neighboring vehicles.
  • NGSIM dataset[2]. The NGSIM dataset contains two sub-datasets, the US101 dataset and the I801 dataset, which are recorded on southbound US 101 and eastbound I-80 respectively. As a very large number of vehicle trajectories are recorded, it is time consuming to learn them all. Therefore, the 7:50 a.m. to 8:05 a.m. trajectory records in the US101 dataset are selected and named . To simulate the real-world case where drivers visit the same place at different time periods, the 8:05 a.m. to 8:15 a.m. trajectory records in the US101 dataset are selected and named . Without loss of generality, we keep only full prediction scenarios that contain 4 or 5 vehicles to further ease the learning burden, and over 188k items still remain. The I801 dataset is refined in the same way and over 129k items remain. The selected datasets are split into training, validation and testing sets at a ratio of 7:1:2.

  • HighD dataset[32]. The highD dataset is a new dataset of naturalistic vehicle trajectories recorded at six different locations on German highways, which results in sixty recordings. Considering the learning burden, the 20th recording is selected and pre-processed, which leaves 88k items for training. Note that trajectories are recorded at 10 Hz in NGSIM but at 25 Hz in the highD dataset, so we aim to generate trajectories at 5 Hz. As the number of recorded items is not comparable with the NGSIM dataset in scale, full prediction scenarios that contain 2 to 9 vehicles are kept. For ease of representation, this dataset is called .

  • Interaction dataset[74]. The Interaction dataset contains naturalistic motions of various traffic participants in a variety of highly interactive driving scenarios from different countries. The DR_CHN_Merging_ZS map provides lane-merging trajectories recorded in an urban area in China. To be comparable with the other datasets in scale, five trajectory records in the map are merged into a dataset named , which consists of 126k items.
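Because both source rates (10 Hz and 25 Hz) are integer multiples of the 5 Hz target, bringing the datasets to a common rate can be as simple as keeping every k-th sample. The sketch below assumes plain stride-based downsampling without any filtering or interpolation, which may differ from the pre-processing actually used; the function name and the toy tracks are illustrative.

```python
def downsample(track, source_hz, target_hz=5):
    """Keep every (source_hz // target_hz)-th sample of a trajectory."""
    if source_hz % target_hz != 0:
        raise ValueError("source rate must be an integer multiple of the target rate")
    stride = source_hz // target_hz
    return track[::stride]

# 8-second snippets with both endpoints included (frame indices as stand-ins).
ngsim_track = list(range(81))    # 10 Hz
highd_track = list(range(201))   # 25 Hz

ngsim_5hz = downsample(ngsim_track, source_hz=10)
highd_5hz = downsample(highd_track, source_hz=25)
```

Both results contain 41 steps, so scenarios from the different datasets end up with the same temporal shape.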

Therefore, a lifelong task chain is formed by the above five datasets


VI-B Divergence between datasets

To measure the CKLD between two datasets, we fix the maximum vehicle number to and only the top eigenvectors are extracted. The number of Gaussian components of the GMMs is set to . The MDN is optimized with the Adam[30] optimizer with learning rate and batch size . CKLD results for pairwise datasets are presented in Table I.

dataset 1 | CKLD | dataset 2
0 22.14 503.22 98.43 16.72
29.00 0 756.03 98.57 24.44
88.46 53.87 0 142.26 121.42
75.05 63.54 652.00 0 62.50
19.57 17.09 511.17 93.49 0
TABLE I: CKLD between datasets

In Table I, the CKLD between and is the smallest, which is expected as they are collected on the same highway at different time periods. The divergence between and the other datasets is quite large because only is collected in Germany. The same goes for .

VI-C Validation of trajectory prediction model

We compare the performance of the proposed task model with other benchmark trajectory prediction models.

  • Constant Velocity(CV). A constant velocity Kalman filter reported in


  • GAIL-GRU. A generative adversarial imitation learning model described in


  • LSTM with fully connected social pooling (S-LSTM). This uses the fully connected social pooling described in [1] and generates a unimodal output distribution.

  • LSTM with convolutional social pooling (CS-LSTM). This uses convolutional social pooling and generates a unimodal output distribution[16]. As designing an extraordinary prediction model is not our main goal, we use the results reported in [16].

PH(s) 1 2 3 4 5
CV 0.73 1.78 3.13 4.78 6.68
GAIL-GRU 0.69 1.51 2.55 3.65 4.71
S-LSTM 0.65 1.31 2.16 3.25 4.55
CS-LSTM 0.61 1.27 2.09 3.10 4.37
Ours 0.55 1.28 2.18 3.30 4.64
TABLE II: RMSE(m) comparison of proposed task model and other common methods on complete NGSIM dataset.

To present a fair comparison, the full NGSIM dataset is used to demonstrate the validity of the proposed prediction model, which differs from the lifelong setups. RMSE performance for varied prediction horizons (PH) is given in Table II. Suppose predicted future trajectories at time are calculated for a batch of size and the real future trajectories are available, then the RMSE at time is
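A common way to compute this quantity, assumed here to match the intended definition, is the square root of the mean squared Euclidean position error over the batch at a given time step. The sketch below represents each trajectory as a list of `(x, y)` tuples; the function name and the toy data are illustrative.

```python
import math

def rmse_at_step(pred, truth, t):
    """RMSE over a batch at time index t.

    pred, truth: lists (one entry per batch item) of trajectories,
    each trajectory being a list of (x, y) positions in metres.
    """
    squared_errors = [
        (p[t][0] - g[t][0]) ** 2 + (p[t][1] - g[t][1]) ** 2
        for p, g in zip(pred, truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

pred = [[(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)]]
truth = [[(0.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]]
errors = [rmse_at_step(pred, truth, t) for t in range(2)]
```

Evaluating the function at the time indices that correspond to 1 s through 5 s yields one value per prediction horizon, as reported in Table II.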


From Table II, it can be concluded that the task model possesses prediction ability similar to that of mainstream methods. As we are not aiming to significantly improve prediction accuracy on a single dataset but instead to improve generality and lifelong prediction ability over multiple tasks, the performance of the proposed prediction model may not be comparable with state-of-the-art methods.

VI-D Lifelong trajectory prediction

In R2GAN, the generator takes as inputs several Gaussian process samplings and label conditions of trajectories that indicate the relative position to a target vehicle. For example, -1 indicates that a vehicle is located in the left lane of the target vehicle at the beginning. The generator outputs the corresponding vehicle trajectory snippets lasting for 8 seconds, i.e. 41 steps. The regression discriminator classifies real or generated prediction scenarios as real or fake. R2GAN is trained with the Adam[30] optimizer with learning rate . The non-saturating GAN loss function is applied as in [21]. To demonstrate the lifelong prediction ability of the proposed framework, three other methods are implemented and compared with our approach.

  • Generative replay based trajectory prediction(GRTP). This is the proposed lifelong trajectory prediction model based on generative replay. According to the two fusion methods LDM and LTM, GRTP is further divided into GRTP-D and GRTP-T respectively.

  • Joint training(JT). Joint training violates the essential storage limitation and assumes that all data are available. It is regarded as the best possible performance of any lifelong learning method.

  • Fixed model(FM). A trajectory model trained by task will not be adjusted anymore and will be applied to new tasks directly.

  • Fine tuning(FT). A trajectory model is retrained whenever new task data become available. This is a possible choice but is expected to forget everything about old tasks. From some perspective, research on trajectory prediction model design and optimization can be categorized under this method, although such works do not test a model trained on a new dataset on the old ones.
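
The four training regimes listed above differ mainly in which data the trajectory predictor sees at each step of the task chain. The Python sketch below lists those data sources symbolically. The task labels `T1`, `T2`, ..., the function names, the `fixed_task` default, the assumption that JT is retrained on all tasks seen so far, and the assumption that the first generator is trained on real data of the first task are illustrative choices rather than details confirmed by this paper.

```python
from typing import List

GRTP_VARIANTS = ("GRTP-D", "GRTP-T")


def predictor_sources(strategy: str, step: int, fixed_task: int = 1) -> List[str]:
    """Data the trajectory predictor trains on once task `step` (1-based) arrives."""
    if step < 1:
        raise ValueError("step must be >= 1")
    if strategy == "JT":
        # Joint training: real data of every task seen so far stays in storage.
        return [f"real:T{k}" for k in range(1, step + 1)]
    if strategy == "FM":
        # Fixed model: trained once on a single task (assumed T1), never updated.
        return [f"real:T{fixed_task}"]
    if strategy == "FT":
        # Fine tuning: only the newest task's real data is used.
        return [f"real:T{step}"]
    if strategy in GRTP_VARIANTS:
        # Generative replay: trajectories replayed by the long-term generator.
        return [f"replay:T{k}" for k in range(1, step + 1)]
    raise ValueError(f"unknown strategy: {strategy}")


def generator_sources(strategy: str, step: int) -> List[str]:
    """Data used to update the long-term generator when task `step` arrives."""
    if step < 1:
        raise ValueError("step must be >= 1")
    if strategy not in GRTP_VARIANTS:
        raise ValueError("only GRTP variants keep a generator")
    if step == 1:
        # Assumption: the very first generator is trained on real data.
        return ["real:T1"]
    old = [f"long-term-replay:T{k}" for k in range(1, step)]
    if strategy == "GRTP-D":
        # LDM: old replayed trajectories plus real samples of the new task.
        return old + [f"real:T{step}"]
    # LTM: old replayed trajectories plus samples from a temporal generator
    # trained on the new task.
    return old + [f"temporal-replay:T{step}"]


if __name__ == "__main__":
    for name in ("JT", "FM", "FT", "GRTP-D", "GRTP-T"):
        print(name, predictor_sources(name, 3))
```

Only JT needs real data from earlier tasks at later steps, which is why it violates the storage limitation, while the GRTP variants substitute replayed trajectories for it.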

The RMSE curves over the full lifelong task chain are illustrated in Fig.7, where the local RMSE around is zoomed in by a mini plot. As the lifelong task chain proceeds from to , prediction performance over the future 5-second horizon is validated on more tasks. Exact numeric results after addressing are given in Table III. From the experimental results, we can see that

  • FT clearly forgets old knowledge while acquiring new knowledge. Indeed, FT is common practice when a new task arrives, and it performs well if the divergence between the old and new tasks is small. From the CKLD computation results, the divergence between , , and is relatively small. Therefore, FT performs well on these three tasks after . On the contrary, the CKLD between and , is relatively large, which results in poor performance on and after tuning on task . The same consequence of applying FT to a new task is also evident in Fig.7.

  • FM is trained on only. As a result, good performance on and is attainable after while large RMSE is observed on other tasks.

  • The proposed GRTP-T and GRTP-D perform consistently well over all tasks and achieve RMSE close to that of JT. As JT stores all task data and can be considered the best possible performance in the lifelong task chain, we can conclude that GRTP mitigates catastrophic forgetting and realizes lifelong learning with either the LDM or the LTM fusion method.

  • Intuitively, GRTP-D should outperform GRTP-T, as it merges the long-term generation model with new data directly and avoids training a new generation model on the new data, thereby avoiding the distribution learning bias introduced by the temporal generation model. However, it can be observed in Table III and Fig.7 that no significant performance gap exists between them. This observation indicates that minor or even no distribution bias is introduced by our proposed R2GAN, which implicitly validates the effectiveness of the proposed R2GAN and model merging method.

Fig. 7: RMSE for full lifelong task chain.
PH(s) 1 2 3 4 5
JT 0.93 1.86 3.30 5.14 7.30
FM 0.79 1.82 3.18 4.94 7.06
FT 0.79 1.63 2.86 4.49 6.53
GRTP-D 0.80 1.80 3.19 5.06 7.32
GRTP-T 0.81 1.86 3.29 5.08 7.25
JT 1.03 2.15 3.53 5.09 6.80
FM 1.08 2.56 4.54 6.91 9.57
FT 1.09 2.09 3.48 5.26 7.34
GRTP-D 1.00 2.25 3.79 5.61 7.67
GRTP-T 0.99 2.10 3.49 4.97 6.85
JT 1.46 1.66 2.97 4.94 7.01
FM 1.36 3.30 6.06 9.24 12.25
FT 1.32 3.21 6.49 10.01 13.81
GRTP-D 0.81 1.47 2.68 4.66 7.07
GRTP-T 0.70 1.88 3.41 4.93 6.66
JT 0.36 0.96 1.89 2.95 4.09
FM 0.44 1.28 2.34 3.55 4.86
FT 0.79 1.32 2.30 3.69 5.33
GRTP-D 0.57 1.35 2.33 3.56 4.89
GRTP-T 0.47 1.25 2.25 3.46 4.82
JT 0.71 1.48 2.70 4.21 5.92
FM 0.65 1.52 2.65 4.07 5.67
FT 0.62 1.31 2.32 3.63 5.20
GRTP-D 0.62 1.47 2.64 4.15 5.90
GRTP-T 0.63 1.54 2.75 4.24 5.96
TABLE III: RMSE(m) comparison of different models after finishing lifelong task chain.
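
To make the comparison with JT concrete, the short Python sketch below takes the 5-second column of Table III and computes, for each method, the largest absolute RMSE gap to JT across the five row blocks. The blocks are indexed in their order of appearance because the task labels are not reproduced in the table; the variable and function names are illustrative.

```python
from typing import Dict, List

# 5 s RMSE (m) copied from Table III, one entry per row block,
# in the order the blocks appear in the table.
RMSE_5S: Dict[str, List[float]] = {
    "JT": [7.30, 6.80, 7.01, 4.09, 5.92],
    "FM": [7.06, 9.57, 12.25, 4.86, 5.67],
    "FT": [6.53, 7.34, 13.81, 5.33, 5.20],
    "GRTP-D": [7.32, 7.67, 7.07, 4.89, 5.90],
    "GRTP-T": [7.25, 6.85, 6.66, 4.82, 5.96],
}


def max_gap_to_jt(table: Dict[str, List[float]]) -> Dict[str, float]:
    """Largest absolute 5 s RMSE difference to JT over all blocks, per method."""
    reference = table["JT"]
    return {
        method: round(max(abs(v - r) for v, r in zip(values, reference)), 2)
        for method, values in table.items()
        if method != "JT"
    }


if __name__ == "__main__":
    for method, gap in max_gap_to_jt(RMSE_5S).items():
        print(f"{method}: {gap:.2f} m")
```

With these numbers the worst-case gap at 5 s stays below 1 m for both GRTP variants (0.87 m and 0.73 m), while FM and FT exceed 5 m on one block (5.24 m and 6.80 m), in line with the observations above.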

VII Conclusion

Maintaining consistent performance of vehicle trajectory prediction across different traffic circumstances is of significant importance for safe driving. This work is the first to address lifelong trajectory prediction. The key problem hindering lifelong learning is catastrophic forgetting, which arises from spatiotemporal dependency divergence. To analyze the divergence between different traffic circumstances, CKLD is calculated based on GMM approximation and MC sampling. Then an R2GAN is developed to generate a dynamic number of vehicles in traffic circumstances, which guarantees the inherent spatiotemporal dependency through a novel regression discriminator. Two methods, LTM and LDM, are applied to construct a lifelong R2GAN model. LTM merges generated trajectories from the long-term R2GAN and the temporal R2GAN to train a new long-term generation model. LDM takes trajectories generated from the long-term R2GAN and trajectories sampled from the real dataset as real data to update the long-term generation model. Different spatiotemporal dependencies are remembered by the long-term generation model, which is capable of generating samples from all processed tasks, thus mitigating the catastrophic forgetting problem. Both merging methods are validated through a constructed lifelong task chain and fulfill the lifelong trajectory prediction task with consistent performance.

In this work, four datasets are applied to verify the effectiveness of the proposed lifelong framework. In future work, more diversified traffic circumstances can be introduced to improve lifelong prediction performance.

VIII Acknowledgment

This work is supported by the National Natural Science Foundation of China (Grant No. 91848111, No. 62103393).


  • [1] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and S. Savarese (2016) Social lstm: human trajectory prediction in crowded spaces. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 961–971. Cited by: 3rd item.
  • [2] V. Alexiadis, J. Colyar, J. Halkias, R. Hranac, and G. McHale (2004) The next generation simulation program. Institute of Transportation Engineers. ITE Journal 74 (8), pp. 22. Cited by: 1st item.
  • [3] R. Aljundi, F. Babiloni, M. Elhoseiny, M. Rohrbach, and T. Tuytelaars (2018) Memory aware synapses: learning what (not) to forget. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 139–154. Cited by: §II.
  • [4] R. Aljundi, P. Chakravarty, and T. Tuytelaars (2017) Expert gate: lifelong learning with a network of experts. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3366–3375. Cited by: §II.
  • [5] S. Ammoun and F. Nashashibi (2009) Real time trajectory prediction for collision risk estimation between vehicles. In 2009 IEEE 5th International Conference on Intelligent Computer Communication and Processing, pp. 417–422. Cited by: §II.
  • [6] G. S. Aoude, V. R. Desaraju, L. H. Stephens, and J. P. How (2011) Behavior classification algorithms at intersections and validation using naturalistic data. In 2011 IEEE Intelligent Vehicles Symposium (IV), pp. 601–606. Cited by: §II.
  • [7] M. Bahari, I. Nejjar, and A. Alahi (2021) Injecting knowledge in data-driven vehicle trajectory predictors. arXiv preprint arXiv:2103.04854. Cited by: §I, §II.
  • [8] P. Bao, Z. Chen, J. Wang, and D. Dai (2022) Multiple agents’ spatiotemporal data generation based on recurrent regression dual discriminator gan. Neurocomputing 468, pp. 370–383. External Links: ISSN 0925-2312. Cited by: §V.
  • [9] M. I. Belghazi, A. Baratin, S. Rajeshwar, S. Ozair, Y. Bengio, A. Courville, and D. Hjelm (2018) Mutual information neural estimation. In International Conference on Machine Learning, pp. 531–540. Cited by: §II.
  • [10] C. M. Bishop (1994) Mixture density networks. Cited by: §IV-B.
  • [11] R. Chandra, U. Bhattacharya, A. Bera, and D. Manocha (2019) Traphic: trajectory prediction in dense and heterogeneous traffic using weighted interactions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8483–8492. Cited by: §I.
  • [12] R. Chandra, T. Guan, S. Panuganti, T. Mittal, U. Bhattacharya, A. Bera, and D. Manocha (2020) Forecasting trajectory and behavior of road-agents using spectral clustering in graph-lstms. IEEE Robotics and Automation Letters 5 (3), pp. 4882–4890. Cited by: §IV-A.
  • [13] T. M. Cover (1999) Elements of information theory. John Wiley & Sons. Cited by: §II.
  • [14] M. Delange, R. Aljundi, M. Masana, S. Parisot, X. Jia, A. Leonardis, G. Slabaugh, and T. Tuytelaars (2021) A continual learning survey: defying forgetting in classification tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence. Cited by: §II.
  • [15] N. Deo, A. Rangesh, and M. M. Trivedi (2018) How would surround vehicles move? a unified framework for maneuver classification and motion prediction. IEEE Transactions on Intelligent Vehicles 3 (2), pp. 129–140. Cited by: §II.
  • [16] N. Deo and M. M. Trivedi (2018) Convolutional social pooling for vehicle trajectory prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1468–1476. Cited by: §II, 1st item, 4th item.
  • [17] M. D. Donsker and S. R. S. Varadhan (1983) Asymptotic evaluation of certain markov process expectations for large time. iv. Communications on Pure and Applied Mathematics 36 (2), pp. 183–212. Cited by: §II.
  • [18] A. Dulian and J. C. Murray (2021) Multi-modal anticipation of stochastic trajectories in a dynamic environment with conditional variational autoencoders. arXiv preprint arXiv:2103.03912. Cited by: §I, §II.
  • [19] C. Fernando, D. Banarse, C. Blundell, Y. Zwols, D. Ha, A. A. Rusu, A. Pritzel, and D. Wierstra (2017) Pathnet: evolution channels gradient descent in super neural networks. arXiv preprint arXiv:1701.08734. Cited by: §II.
  • [20] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial networks. arXiv preprint arXiv:1406.2661. Cited by: §V.
  • [21] I. Goodfellow (2016) Nips 2016 tutorial: generative adversarial networks. arXiv preprint arXiv:1701.00160. Cited by: §VI-D.
  • [22] A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola (2006) A kernel method for the two-sample-problem. Advances in neural information processing systems 19, pp. 513–520. Cited by: §II.
  • [23] A. Gupta, J. Johnson, L. Fei-Fei, S. Savarese, and A. Alahi (2018) Social gan: socially acceptable trajectories with generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2255–2264. Cited by: §II.
  • [24] J. R. Hershey and P. A. Olsen (2007) Approximating the kullback leibler divergence between gaussian mixture models. In 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP’07, Vol. 4, pp. IV–317. Cited by: §IV-C.
  • [25] L. Hou, L. Xin, S. E. Li, B. Cheng, and W. Wang (2019) Interactive trajectory prediction of surrounding road users for autonomous driving using structural-lstm network. IEEE Transactions on Intelligent Transportation Systems PP (99), pp. 1–11. Cited by: §I.
  • [26] S. Hou, X. Pan, C. C. Loy, Z. Wang, and D. Lin (2018) Lifelong learning via progressive distillation and retrospection. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 437–452. Cited by: §II.
  • [27] A. Ismail and A. Vigneron (2015) A new trajectory similarity measure for gps data. In Proceedings of the 6th ACM SIGSPATIAL International Workshop on GeoStreaming, pp. 19–22. Cited by: §II.
  • [28] N. Kamra, U. Gupta, and Y. Liu (2017) Deep generative dual memory network for continual learning. arXiv preprint arXiv:1710.10368. Cited by: §II, §V-C.
  • [29] A. Karpathy and L. Fei-Fei (2015) Deep visual-semantic alignments for generating image descriptions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3128–3137. Cited by: §V-A.
  • [30] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §VI-B, §VI-D.
  • [31] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, et al. (2017) Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences 114 (13), pp. 3521–3526. Cited by: §II.
  • [32] R. Krajewski, J. Bock, L. Kloeker, and L. Eckstein (2018) The highd dataset: a drone dataset of naturalistic vehicle trajectories on german highways for validation of highly automated driving systems. In 2018 21st International Conference on Intelligent Transportation Systems (ITSC), pp. 2118–2125. Cited by: 2nd item.
  • [33] A. Kuefler, J. Morton, T. Wheeler, and M. Kochenderfer (2017) Imitating driver behavior with generative adversarial networks. In 2017 IEEE Intelligent Vehicles Symposium (IV), pp. 204–211. Cited by: 2nd item.
  • [34] P. Kumar, M. Perrollaz, S. Lefevre, and C. Laugier (2013) Learning-based approach for online lane change intention prediction. In 2013 IEEE Intelligent Vehicles Symposium (IV), pp. 797–802. Cited by: §II.
  • [35] D. Lee, Y. Gu, J. Hoang, and M. Marchetti-Bowick (2019) Joint interaction and trajectory prediction for autonomous driving using graph neural networks. arXiv preprint arXiv:1912.07882. Cited by: §IV-A.
  • [36] T. Lesort, H. Caselles-Dupré, M. Garcia-Ortiz, A. Stoian, and D. Filliat (2019) Generative models from the perspective of continual learning. In 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. Cited by: §II, §V-C.
  • [37] T. Lesort, A. Gepperth, A. Stoian, and D. Filliat (2019) Marginal replay vs conditional replay for continual learning. In International Conference on Artificial Neural Networks, pp. 466–480. Cited by: §II.
  • [38] H. Li, W. Dong, and B. Hu (2020) Incremental concept learning via online generative memory recall. IEEE Transactions on Neural Networks and Learning Systems. Cited by: §II.
  • [39] J. Li, H. Ma, and M. Tomizuka (2019) Interaction-aware multi-agent tracking and probabilistic behavior prediction via adversarial learning. In 2019 International Conference on Robotics and Automation (ICRA), pp. 6658–6664. Cited by: §I.
  • [40] X. Li, X. Ying, and M. C. Chuah (2019) Grip: graph-based interaction-aware trajectory prediction. In 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pp. 3960–3966. Cited by: §I, §II, §IV-A.
  • [41] Z. Li and D. Hoiem (2017) Learning without forgetting. IEEE transactions on pattern analysis and machine intelligence 40 (12), pp. 2935–2947. Cited by: §II.
  • [42] X. Liu, C. Wu, M. Menta, L. Herranz, B. Raducanu, A. D. Bagdanov, S. Jui, and J. v. de Weijer (2020) Generative feature replay for class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 226–227. Cited by: §II.
  • [43] V. Lomonaco and D. Maltoni (2017) Core50: a new dataset and benchmark for continuous object recognition. In Conference on Robot Learning, pp. 17–26. Cited by: §II.
  • [44] D. Lopez-Paz and M. Ranzato (2017) Gradient episodic memory for continual learning. arXiv preprint arXiv:1706.08840. Cited by: §II.
  • [45] P. Lytrivis, G. Thomaidis, and A. Amditis (2008) Cooperative path prediction in vehicular environments. In 2008 11th International IEEE Conference on Intelligent Transportation Systems, pp. 803–808. Cited by: §II.
  • [46] K. Messaoud, I. Yahiaoui, A. Verroust-Blondet, and F. Nashashibi (2019) Relational recurrent neural networks for vehicle trajectory prediction. In 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pp. 1813–1818. Cited by: §II.
  • [47] M. Müller (2007) Dynamic time warping. Information retrieval for music and motion, pp. 69–84. Cited by: §II.
  • [48] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. (2019) Pytorch: an imperative style, high-performance deep learning library. Advances in neural information processing systems 32, pp. 8026–8037. Cited by: §VI.
  • [49] N. Pelekis, I. Kopanakis, G. Marketos, I. Ntoutsi, G. Andrienko, and Y. Theodoridis (2007) Similarity search in trajectory databases. In 14th International Symposium on Temporal Representation and Reasoning (TIME’07), pp. 129–140. Cited by: §II.
  • [50] S. Qiao, N. Han, J. Wang, R. H. Li, L. A. Gutierrez, and X. Wu (2017) Predicting long-term trajectories of connected vehicles via the prefix-projection technique. IEEE Transactions on Intelligent Transportation Systems, pp. 1–11. Cited by: §I.
  • [51] S. Rebuffi, A. Kolesnikov, G. Sperl, and C. H. Lampert (2017) Icarl: incremental classifier and representation learning. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 2001–2010. Cited by: §II.
  • [52] C. Ren, P. Ge, D. Dai, and H. Yan (2019) Learning kernel for conditional moment-matching discrepancy-based image classification. IEEE transactions on cybernetics. Cited by: §II.
  • [53] Y. Ren, J. Zhu, J. Li, and Y. Luo (2016) Conditional generative moment-matching networks. Advances in Neural Information Processing Systems 29, pp. 2928–2936. Cited by: §II.
  • [54] M. T. Robinson (1990) The temporal development of collision cascades in the binary-collision approximation. Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms 48 (1-4), pp. 408–413. Cited by: §II.
  • [55] A. Rosenfeld and J. K. Tsotsos (2018) Incremental learning through deep adaptation. IEEE transactions on pattern analysis and machine intelligence 42 (3), pp. 651–663. Cited by: §II.
  • [56] J. Rothfuss, F. Ferreira, S. Walther, and M. Ulrich (2019) Conditional density estimation with neural networks: best practices and benchmarks. arXiv preprint arXiv:1903.00954. Cited by: §IV-B.
  • [57] A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell (2016) Progressive neural networks. arXiv preprint arXiv:1606.04671. Cited by: §II.
  • [58] A. C. Sanderson and A. K. Wong (1980) Pattern trajectory analysis of nonstationary multivariate data. IEEE Transactions on Systems, Man, and Cybernetics 10 (7), pp. 384–392. Cited by: §II.
  • [59] H. Shin, J. K. Lee, J. Kim, and J. Kim (2017) Continual learning with deep generative replay. arXiv preprint arXiv:1705.08690. Cited by: §II, §V-C.
  • [60] L. Song, K. Fukumizu, and A. Gretton (2013) Kernel embeddings of conditional distributions: a unified kernel framework for nonparametric inference in graphical models. IEEE Signal Processing Magazine 30 (4), pp. 98–111. Cited by: §II.
  • [61] T. Streubel and K. H. Hoffmann (2014) Prediction of driver intended path at intersections. In 2014 IEEE Intelligent Vehicles Symposium Proceedings, pp. 134–139. Cited by: §II.
  • [62] H. Su, S. Liu, B. Zheng, X. Zhou, and K. Zheng (2020) A survey of trajectory distance measures and performance evaluation. The VLDB Journal 29 (1), pp. 3–32. Cited by: §II.
  • [63] X. Su, S. Guo, T. Tan, and F. Chen (2019) Generative memory for lifelong learning. IEEE transactions on neural networks and learning systems 31 (6), pp. 1884–1898. Cited by: §II, §V-C.
  • [64] E. G. Tabak, G. Trigila, and W. Zhao (2021) Data driven conditional optimal transport. Machine Learning, pp. 1–21. Cited by: §II.
  • [65] Q. Tran and J. Firl (2014) Online maneuver recognition and multimodal trajectory prediction for intersection assistance using non-parametric regression. In 2014 IEEE Intelligent Vehicles Symposium Proceedings, pp. 918–923. Cited by: §II.
  • [66] G. M. Van de Ven and A. S. Tolias (2018) Generative replay with feedback connections as a general strategy for continual learning. arXiv preprint arXiv:1809.10635. Cited by: §II.
  • [67] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan (2015) Show and tell: a neural image caption generator. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3156–3164. Cited by: §V-A.
  • [68] Q. Wang, S. R. Kulkarni, and S. Verdú (2006) A nearest-neighbor approach to estimating divergence between continuous random vectors. In 2006 IEEE International Symposium on Information Theory, pp. 242–246. Cited by: §II.
  • [69] M. Wiese, R. Knobloch, R. Korn, and P. Kretschmer (2020) Quant gans: deep generation of financial time series. Quantitative Finance 20 (9), pp. 1419–1440. Cited by: §V-A.
  • [70] C. Wu, L. Herranz, X. Liu, Y. Wang, J. Van de Weijer, and B. Raducanu (2018) Memory replay gans: learning to generate images from new categories without forgetting. arXiv preprint arXiv:1809.02058. Cited by: §II.
  • [71] Y. Xiang, Y. Fu, P. Ji, and H. Huang (2019) Incremental learning using conditional adversarial networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6619–6628. Cited by: §II.
  • [72] L. Xu, J. Hu, H. Jiang, and W. Meng (2015) Establishing style-oriented driver models by imitating human driving behaviors. IEEE Transactions on Intelligent Transportation Systems 16 (5), pp. 2522–2530. Cited by: §II.
  • [73] X. Yao, T. Huang, C. Wu, R. Zhang, and L. Sun (2019) Adversarial feature alignment: avoid catastrophic forgetting in incremental task lifelong learning. Neural computation 31 (11), pp. 2266–2291. Cited by: §II.
  • [74] W. Zhan, L. Sun, D. Wang, H. Shi, A. Clausse, M. Naumann, J. Kümmerle, H. Königshof, C. Stiller, A. de La Fortelle, and M. Tomizuka INTERACTION Dataset: An INTERnational, Adversarial and Cooperative moTION Dataset in Interactive Driving Scenarios with Semantic Maps. arXiv:1910.03088 [cs, eess]. Cited by: 3rd item.
  • [75] A. Zyner, S. Worrall, and E. Nebot (2019) Naturalistic driver intention and path prediction using recurrent neural networks. IEEE transactions on intelligent transportation systems 21 (4), pp. 1584–1594. Cited by: §I.