In most situations, experienced human drivers are able to properly infer future behaviors of the surrounding vehicles, which is critical when making tactical driving decisions such as overtaking or crossing an unsignalized intersection. This predictive ability is often lacking in current Advanced Driver Assistance Systems (ADAS) such as Adaptive Cruise Control (ACC), which usually act in a purely reactive fashion and leave tactical decision-making to the driver. On the other hand, fully autonomous vehicles lacking predictive capacities generally have to behave very conservatively in the presence of other traffic participants; as demonstrated by the low-speed collision between a self-driving car and a passenger bus [1], reliable motion prediction of surrounding vehicles is a critical feature for safe and efficient autonomous driving.
Many approaches to motion prediction have been proposed in the literature, and a survey can be found in [2]. As in many machine learning applications, existing techniques can be split between classification and regression methods. When applied to motion prediction, classification problems consist in determining a high-level behavior (or intention), for instance lane change left, lane change right or lane keeping for highway driving, or turn left, turn right or go straight at an intersection. Many techniques have already been explored for behavior prediction, such as hidden Markov models [3, 4], stochastic predictive approaches [5], support vector machines [6, 7] or directly using a vehicle model [8]; more recently, artificial neural network approaches have also been proposed [9, 10, 11].
The main advantage of predicting behaviors is that the discrete outputs make it easier to train models and evaluate their performance. However, they only provide rough information on future vehicle states, which is not easy to use when planning a trajectory for the self-driving ego-vehicle. Some authors have proposed using a generative model, for instance Gaussian processes or neural networks, based upon the output of the behavior prediction, but this approach requires multiple trainings and is only as robust as the classifier accuracy. Regression problems, on the other hand, aim to directly obtain a prediction for the future positions of the considered vehicle, which can then be used for motion planning. Many regression algorithms could be used for this problem, for instance regression forests [12]; more recently, artificial neural networks have attracted the most attention in the field of trajectory prediction for cars [13, 14], cyclists [15] or pedestrians [16, 17]. A potential downside of such approaches is that the output of many regression algorithms is a single “point” (e.g., a single predicted trajectory) without a measure of confidence. To counter this issue, well-established approaches such as Monte Carlo sampling or k-fold validation [18] can be used to provide error estimates; more recently, dropout estimation techniques [19] have also been proposed for applications using neural networks.
In this article, we focus on trajectory prediction using long short-term memory (LSTM) neural networks [20], which are a particular implementation of recurrent neural networks. Because they are able to keep a memory of previous inputs, LSTMs are considered particularly efficient for time series prediction and have been widely used in the past few years for pedestrian trajectory prediction [16, 17] or to predict vehicle destinations at an intersection [11, 22]. Our main contribution is the design of an LSTM network to predict car trajectories on highways, which is notably critical for safe autonomous overtaking or lane changes, and for which very little literature exists.
A particular challenge for this problem is that highway driving usually comprises long constant-velocity phases with rare punctual events such as lane changes, which are therefore hard to learn correctly. For this reason, many authors rely on purposely recorded or handpicked [23, 24] trajectory sets which are not representative of actual, average driving; the real-world performance of the trained models can therefore be significantly different. A second contribution of this article is that we train and validate our model using the entire NGSIM US101 dataset [25] without a priori selection, and show that we can predict future trajectories with a satisfying average RMS error, both laterally and longitudinally. To the best of our knowledge, no other published learning technique has been demonstrated with similar results using a dataset representative of real-world conditions.
The rest of this article is structured as follows: in Section II, we define the trajectory prediction problem that we are aiming to solve. In Section III, we detail the preprocessing of the US101 dataset to extract the input features of the model, which is presented in Section IV. In Section V, we present the training procedure and outputs of the trained model. Finally, Section VI concludes the study.
II Problem statement
We consider the problem of predicting future trajectories of vehicles driving on a highway, using previously observed data; these predictions can then be used to plan the motion of an autonomous vehicle.
Formally, we consider a set of observable features $(x^1, \dots, x^m)$ and a set of target outputs $(y^1, \dots, y^p)$ to be predicted. We assume that the features can all be acquired simultaneously at regular intervals, and that successive measurements are always available; we let $h \in \mathbb{N}$ and, for $t \geq 0$ and $0 \leq d \leq h$, we denote by $x^i_{t-d}$ the value of feature $x^i$ observed $d$ time steps earlier. Similarly, we define $f \in \mathbb{N}$ and we denote by $y^j_{t+d}$ the value of output $y^j$, $d \leq f$ time steps in the future. We use uppercase $X$ and $Y$ to respectively denote the tensors of the previously observed features and corresponding predicted outputs. We propose to use a machine learning approach, in which we train a regression function $G$ such that the predicted outputs $\hat{Y} = G(X)$ match the actual values $Y$ as closely as possible.
In this article, our approach is to train a predictor for the future trajectory of a single “target” vehicle; in order to only use data which can realistically be gathered, we limit the amount of available information to the vehicles immediately around the target vehicle, as described in Section III. As with many learning approaches, one difficulty is to design models that are able to generalize well from the training data . A second difficulty, more specific to the problem of highway trajectory prediction, is the imbalance between constant velocity driving phases, which are much more frequent than events such as lane changes.
III Data and features
In this article, we use the Next Generation Simulation (NGSIM) dataset [25], collected in 2005 by the United States Federal Highway Administration, which is one of the largest publicly available sources of naturalistic driving data and, as such, has been widely studied in the literature (see, e.g., [13, 9, 26, 11]). More specifically, we consider the US101 dataset, which contains 45 minutes of trajectories for vehicles on the US101 highway, between 7:50 am and 8:35 am, during the transition from fluid traffic to saturation at rush hour. In total, the dataset contains trajectories for more than 6000 individual vehicles, recorded at 10 Hz.
The NGSIM dataset provides vehicle trajectories in the form of coordinates of the front center of the vehicle in a global frame, and of local coordinates of the same point in a road-aligned frame. In this article, we use the local coordinates (dataset columns 5 and 6), where $x$ is the lateral position of the vehicle relative to the leftmost edge of the road, and $y$ its longitudinal position. Moreover, the dataset contains each vehicle’s lane identifier at every time step, as well as information on vehicle dimensions and type (motorcycle, car or truck). Finally, the data also contains the identifier of the preceding vehicle for every element in the set (when applicable).
III-B Data preparation
One known limitation of the NGSIM set is that vehicle positioning data was obtained from video analysis, and the recorded trajectories contain a significant amount of noise [27]. Velocities, which are obtained from numerical differentiation, suffer even more from this noise. For this reason, we used a first-order Savitzky-Golay filter [28] – which performs well for signal differentiation – with window length 11 (corresponding to a time window of roughly 1 s at the 10 Hz sampling rate) to smooth the longitudinal and lateral positions and compute the corresponding velocities, as illustrated in Figure 6.
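This smoothing and differentiation step can be sketched with SciPy; the function and parameter names below are illustrative, not the authors' code, and the 10 Hz sampling rate of NGSIM is assumed:

```python
import numpy as np
from scipy.signal import savgol_filter

def smooth_and_differentiate(positions, dt=0.1, window=11, polyorder=1):
    """Smooth noisy NGSIM positions and estimate velocities.

    A first-order Savitzky-Golay filter with an 11-sample window
    (about 1 s at the 10 Hz NGSIM sampling rate) smooths the raw
    positions; the same filter with deriv=1 yields the velocity.
    """
    smoothed = savgol_filter(positions, window, polyorder)
    velocity = savgol_filter(positions, window, polyorder, deriv=1, delta=dt)
    return smoothed, velocity
```

The same call is applied independently to the lateral and longitudinal position columns.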
In this article, we hypothesize that the future behavior of a target vehicle can be reliably predicted by using local information on the vehicles immediately around it; a similar hypothesis was successfully tested in [29] to detect lane-change intent. For a target vehicle, we consider 9 vehicles of interest, which we label according to their relative position with respect to the target vehicle, as shown in Figure 7. By convention, we let the left (respectively right) neighbor be the vehicle which is closest to the target vehicle in the adjacent left (respectively right) lane. We then consider the vehicles preceding the target vehicle and its neighbors, as well as the vehicles following them. During the data preprocessing phase, we compute the identifier of each vehicle of interest and perform join requests to append their information to the dataset. When such a vehicle does not exist, the corresponding data columns are set to zero.
Note that the rationale behind including these additional vehicles is that only observing the state of the vehicle directly in front is not always sufficient to correctly determine future traffic evolution. For instance, in a jam, knowing that the second vehicle ahead is accelerating can help infer that the vehicle directly in front, although currently stopped, will likely accelerate in the future instead of remaining stopped. The obvious limit to increasing the number of considered vehicles is the ability to realistically gather sufficient data using on-board sensors; for this reason, we restrict the available information to these 9 vehicles.
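As a sketch of the join requests described above, the following pandas snippet appends the state of the preceding vehicle to each row, filling missing neighbors with zeros; all column names are hypothetical stand-ins for the NGSIM columns:

```python
import pandas as pd

def append_preceding_vehicle(df):
    """Append the preceding vehicle's state to each dataset row.

    Column names are hypothetical: 'Vehicle_ID', 'Frame_ID',
    'Preceding' (identifier of the leader, 0 if none) and 'v_y'
    (longitudinal velocity). As in the text, the columns of a
    missing neighbor are set to zero.
    """
    leader = df[["Vehicle_ID", "Frame_ID", "v_y"]].rename(
        columns={"Vehicle_ID": "Preceding", "v_y": "leader_v_y"})
    out = df.merge(leader, on=["Preceding", "Frame_ID"], how="left")
    return out.fillna({"leader_v_y": 0.0})
```

The same pattern, repeated with the identifiers of the other vehicles of interest, yields the full neighborhood information for every time step.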
In this article, we aim at only using features which can be reasonably easily measured using on-board sensors such as GNSS and LiDAR, barring range or occlusion issues. For this reason, we consider a different set of features for the target vehicle (for which we want to compute the future trajectory) and for its surrounding vehicles as described above.
For the target vehicle, we define the following features:
local lateral position , to account for different behaviors depending on the driving lane,
local longitudinal position , to account for different behaviors when approaching the merging lane,
lateral and longitudinal velocities and ,
type (motorcycle, car or truck), encoded respectively as , or .
For each vehicle , we define the following features:
lateral velocity ,
longitudinal velocity relative to : ,
lateral distance from : ,
longitudinal distance from : ,
signed time-to-collision with : ,
type (motorcycle, car or truck), encoded respectively as , or .
These features are scaled to remain in an acceptable range with respect to the activation functions; in this article, we simply divide longitudinal and lateral distances (expressed in SI units), as well as longitudinal velocities, by a fixed constant, which keeps the resulting values in a small range around zero. Note that in the case of missing data (e.g., when the left vehicle does not exist), the corresponding feature values can become larger (in absolute value).
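The relative features and scaling described above can be sketched as follows. The signed time-to-collision is taken here as the longitudinal gap divided by the closing speed, a common definition that may differ from the paper's exact expression; the dictionary keys and the scaling constant are assumptions:

```python
def relative_features(target, neighbor, scale=10.0):
    """Features of one surrounding vehicle relative to the target.

    `target` and `neighbor` are dicts with hypothetical keys:
    'x' (lateral position), 'y' (longitudinal position),
    'vx' (lateral velocity), 'vy' (longitudinal velocity).
    Distances and longitudinal velocities are divided by the
    constant `scale` (its exact value is an assumption).
    """
    dx = neighbor["x"] - target["x"]     # lateral distance
    dy = neighbor["y"] - target["y"]     # longitudinal distance
    dvy = neighbor["vy"] - target["vy"]  # relative longitudinal velocity
    # Signed time-to-collision: positive when the gap is closing.
    ttc = dy / -dvy if dvy != 0 else float("inf")
    return {"vx": neighbor["vx"], "rel_vy": dvy / scale,
            "dx": dx / scale, "dy": dy / scale, "ttc": ttc}
```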
This choice of features was made to replicate the information a human driver is likely to base their decisions upon: the features from surrounding vehicles are all relative to the target vehicle, as we expect drivers to usually make decisions based on perceived distances and relative speeds rather than their values in an absolute frame. Features regarding the target vehicle’s speed are given in a (road-relative) absolute frame, as drivers are generally aware of speedometer information; similarly, we use road-relative positions since the driver is usually able to visually measure lateral distances from the side of the road, and knows their longitudinal position. The choice of explicitly including time-to-collision as a feature comes from the high importance of this metric in lane-change decisions; furthermore, neuroscience seems to indicate that animal and human brains heavily rely on time-to-collision estimations to perform motor tasks [31].
In this article, our goal is to predict the future trajectory of the target vehicle. Since the region of interest spans a long distance longitudinally, the values of the longitudinal position can become quite large; for this reason, we prefer to predict future longitudinal velocities instead. Since the lateral position is bounded, we directly use it for the output. In order to have different horizons of prediction, we choose a vector of outputs consisting of values taken at several times in the future.
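Building such multi-horizon targets from a single vehicle's trajectory might look like the following sketch; the horizon values and function names are assumptions:

```python
import numpy as np

def build_targets(x_lat, v_lon, horizons=(1.0, 2.0), dt=0.1):
    """Stack future lateral positions and longitudinal velocities.

    For each time step t, the output vector concatenates the lateral
    position and longitudinal velocity taken h seconds ahead for
    every horizon h (the horizon values here are illustrative).
    Trailing samples without a full future window are dropped.
    """
    steps = [int(round(h / dt)) for h in horizons]
    n = len(x_lat) - max(steps)
    cols = []
    for s in steps:
        cols.append(x_lat[s:s + n])  # future lateral position
        cols.append(v_lon[s:s + n])  # future longitudinal velocity
    return np.stack(cols, axis=1)    # shape (n, 2 * len(horizons))
```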
IV Learning model
Contrary to many existing frameworks for intent or behavior prediction, which can be modeled as classification problems, our aim is to predict future positions for the target vehicle, which is intrinsically a regression problem. Due to their success in many applications, we choose to use an artificial neural network for our learning architecture, in the form of a long short-term memory (LSTM) network. LSTMs are a particular implementation of recurrent neural networks (RNN), which are particularly well suited to time series; in this article, we use the Keras framework [32], which implements the extended LSTM described in [21], presented in Figure 8. Compared to simpler vanilla RNN implementations, LSTMs are generally considered more robust for long time series; future work will focus on comparing the performance of different RNN approaches on our particular dataset.
An interesting feature of LSTM cells is the presence of an internal state which serves as the cell’s memory, denoted by $c_t$ in Figure 8. Based on a new input $x_t$, its previous internal state $c_{t-1}$ and previous output $h_{t-1}$, the cell performs different operations using so-called “gates”:
forget: uses the inputs to decide how much to “forget” from the cell’s previous internal state $c_{t-1}$;
input: decides the amount of new information to be stored in memory, based on $x_t$ and $h_{t-1}$;
output: computes the new cell output $h_t$ from a mix of the previous states and the output of the input gate.
This particular feature of LSTMs allows a network to learn long-term relations between features, which makes them very powerful for time series prediction.
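For reference, the gate operations above are commonly written as follows; this is the standard formulation of the extended LSTM with forget gates, not necessarily the exact variant of Figure 8:

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\!\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)}\\
o_t &= \sigma\!\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh\!\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(internal state)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(cell output)}
\end{aligned}
```

where $\sigma$ is the logistic sigmoid, $\odot$ the element-wise product, and the $W$, $U$, $b$ matrices and vectors are learned parameters.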
Due to their recurrent nature, even a single layer of LSTM nodes can be considered as a “deep” neural network. Although such layers may theoretically be stacked in a fashion similar to convolutional neural networks to learn higher-level features, previous studies and our own experiments (see Section V) seem to indicate that stacked LSTM layers do not provide improvements over a single layer in our application. In this article, we use the network presented in Figure 9 as our reference architecture, and we compare a few variations on this design in Section V. The reference architecture uses a first layer of 256 LSTM cells, followed by two dense (fully connected) and time-distributed layers of 256 and 128 neurons, and a final dense output layer containing as many cells as the number of outputs. In this simple architecture, the role of the LSTM layer is to abstract a meaningful representation of the input time series; these higher-level “features” are then combined by the two dense layers in order to produce the output, in this case the predicted future states.
Additionally, the first four input features of the network – corresponding to the absolute state of the target vehicle – are repeated and directly fed to the (dense) output layer, thus bypassing the LSTMs. The motivation behind this bypass is to allow the recurrent layer to focus on variations from the current states, rather than modeling the steady state of driving at constant speed on a given lane. In practice (see Section V), the use of this bypass seems to slightly improve prediction quality.
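A minimal Keras sketch of this architecture, including the input bypass, might look as follows; the layer widths are those stated above, while the window length, feature count and output count are assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(n_timesteps=100, n_features=24, n_outputs=8):
    """Sketch of the reference architecture of Figure 9.

    An LSTM layer of 256 cells is followed by two time-distributed
    dense layers of 256 and 128 neurons; the first four input
    features (the target vehicle's absolute state) bypass the LSTM
    and are concatenated just before the output layer.
    """
    inp = layers.Input(shape=(n_timesteps, n_features))
    h = layers.LSTM(256, return_sequences=True)(inp)
    h = layers.TimeDistributed(layers.Dense(256, activation="relu"))(h)
    h = layers.TimeDistributed(layers.Dense(128, activation="relu"))(h)
    bypass = inp[:, :, :4]  # absolute state of the target vehicle
    h = layers.Concatenate()([h, bypass])
    out = layers.TimeDistributed(layers.Dense(n_outputs))(h)
    return keras.Model(inp, out)
```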
V Results
In this section, we use the previously described deep neural network to predict future trajectories sampled from the US101 dataset. To assess the learning performance of the model and its ability to generalize over different drivers, we first randomly select 80% of vehicles (4892 trajectories) for training, and withhold the remaining 20% of vehicles (1209 trajectories) for testing; these latter 20% are not used during the training phase.
In this article, we aim at designing a network capable of understanding medium-term relations for prediction. To avoid backpropagation-related issues that can arise with long time series, we trained the network using windows of 100 inputs, representing a total of 10 s of past observations. One such window is taken every 10 data points; therefore, two consecutive windows have 90% of overlap. Additionally, vehicles are grouped in batches of 500 (except for the final batch), and data is shuffled within batches. As a result, the data actually fed to the network for a batch of vehicles is a three-dimensional tensor of shape $(n_w, 100, n_f)$, where $n_w$ and $n_f$ are respectively the total number of time windows in the batch and the number of features. The training is performed on GPU using the TensorFlow backend with a batch size of 32; the model is trained for 5 epochs on each set of 500 vehicles, and the whole dataset is processed 20 times, resulting in 100 effective epochs.
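The windowing step described above can be sketched as follows; the names are illustrative, and the window and stride values follow the text:

```python
import numpy as np

def make_windows(features, window=100, stride=10):
    """Cut one vehicle's feature time series into overlapping windows.

    `features` has shape (T, n_features); each window covers 100
    consecutive samples, and consecutive windows are offset by 10
    samples (90% overlap), as described in the text.
    """
    T = features.shape[0]
    starts = range(0, T - window + 1, stride)
    return np.stack([features[s:s + window] for s in starts])
```

Stacking the windows of all vehicles in a batch yields the $(n_w, 100, n_f)$ tensor fed to the network.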
For the test set, we directly feed the input features for the whole trajectory, without processing the data by time windows. For each vehicle, we then compute the root mean square error (RMSE) between the network prediction and the actual expected value. In Figure 3, we present the prediction outputs of the network of Figure 9 for one of the vehicles in the test set. For comparison purposes, we tested the following variations of the reference design:
Reference design of Figure 9,
Using vehicle type information,
Without using information on vehicle ,
Without using a bypass,
Using bypass before the first dense layer (only bypass the LSTMs),
Using a linear activation for the dense layer,
Adding another LSTM layer after the first,
Adding a third dense layer of nodes.
Table I presents the average RMS error across all networks for various prediction horizons. In an effort to further improve accuracy, we used a light bagging technique consisting in averaging the outputs of the four best models (denoted by a * in Table I); this bagged predictor almost always performs best over the testing data. For comparison purposes, we also report results from [9], which chose a related approach using a multi-layer perceptron (without a recurrent layer); its higher prediction errors for longer horizons seem to show that the use of LSTMs provides better results at longer prediction horizons.
The reference design provides the best overall results for lateral position prediction, but is less precise for velocity prediction. Interestingly, providing vehicle type information does not improve predictions of lateral movement but allows more precise forecasting of longitudinal speed, probably due to the difference in acceleration capacities. In what follows, we focus on this reference design to provide more insight on error characterization. Figure 12 presents the distribution of prediction error over the test set for the bagged predictor.
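The light bagging step amounts to a simple element-wise average of the model outputs, as in the following sketch (array shapes are assumptions):

```python
import numpy as np

def bagged_prediction(predictions):
    """Average the outputs of several trained models.

    `predictions` is a list of prediction arrays of identical shape,
    one per model (here, the four best variants of Table I); the
    bagged predictor is simply their element-wise mean.
    """
    return np.mean(np.stack(predictions), axis=0)
```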
Note that the above results mostly use the RMSE and error distributions to evaluate the quality of prediction. However, such aggregated metrics may not be best suited for this particular application, notably due to the over-representation of straight driving at constant speed, which highly outnumbers discrete events such as lane changes or sudden accelerations. An illustration of this limitation is that we sometimes observe the prediction reacting with a delay, as shown in Figure 15; this effect mostly happens for longer prediction horizons, and is not properly accounted for by the RMSE. In the worst cases (such as the one depicted in Figure 15), this delay can become significant at long prediction horizons, thus demonstrating that the model is sometimes unable to interpret observed behaviors.
Experimentally, separately training each network output seems to yield better results, at the cost of an increased overall training time; training one model per vehicle type, or using wider networks, could also be possible ways of improvement, as could using different time window durations for training. Besides improving the model itself, future work will focus on designing better-suited metrics related to the correct detection of meaningful traffic events, for instance lane changes, overtaking events or re-acceleration and braking during stop-start driving, which could help further improve predictions. Moreover, a more careful analysis of cases showing large deviations should be performed to compare model predictions with human-made estimations.
In this article, we proposed a neural network architecture based on LSTMs to predict future vehicle trajectories on highways, using naturalistic driving data from the widely studied NGSIM dataset. This network was shown to achieve better prediction accuracy than the previous state of the art, with low average RMS errors for both the lateral position and the longitudinal velocity at the considered prediction horizon. Contrary to many previous studies which used handpicked trajectories for training and testing, thus adding a selection bias, our results were obtained using the whole US101 dataset, which should make the model more apt to deal with real-world scenarios.
Although this work is highly preliminary and some limitations – notably the observed delayed response – should be addressed, we believe that these results constitute a promising basis to compute probable trajectories for surrounding vehicles. The use of the actual predictions, alongside precise statistics on the error distribution, could in turn be used to significantly improve current motion planning algorithms. Provided the discussed limitations can be overcome, our results open many perspectives for future research, first by studying their generalizability to other highways, then to other driving scenarios such as intersections and roundabouts. To this end, a proper study of the selected features should be performed, both to determine a good input scaling technique and to identify which features are the most relevant. Moreover, a more in-depth investigation of the error distribution using application-specific metrics should be performed to properly validate the proposed models. Finally, our current approach does not consider stochasticity or confidence levels, although this information is essential to correctly use the predictions; future work will investigate possible techniques to output probability distributions instead of single values.
-  Google, “Google self-driving car project monthly report,” https://www.google.com/selfdrivingcar/files/reports/report-0216.pdf, Tech. Rep., Feb. 2016.
-  S. Lefèvre, D. Vasquez, and C. Laugier, “A survey on motion prediction and risk assessment for intelligent vehicles,” ROBOMECH Journal, vol. 1, no. 1, p. 1, dec 2014.
-  C. Tay, K. Mekhnacha, and C. Laugier, “Probabilistic Vehicle Motion Modeling and Risk Estimation,” in Handbook of Intelligent Vehicles. Springer London, 2012, pp. 1479–1516.
-  T. Streubel and K. H. Hoffmann, “Prediction of driver intended path at intersections,” IEEE Intelligent Vehicles Symposium, Proceedings, pp. 134–139, 2014.
-  A. Carvalho, Y. Gao, S. Lefevre, and F. Borrelli, “Stochastic predictive control of autonomous vehicles in uncertain environments,” in 12th International Symposium on Advanced Vehicle Control, 2014.
-  H. M. Mandalia and M. D. D. Salvucci, “Using Support Vector Machines for Lane-Change Detection,” Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 49, no. 22, pp. 1965–1969, sep 2005.
-  P. Kumar, M. Perrollaz, S. Lefevre, and C. Laugier, “Learning-based approach for online lane change intention prediction,” in 2013 IEEE Intelligent Vehicles Symposium (IV). IEEE, jun 2013, pp. 797–802.
-  A. Houenou, P. Bonnifait, V. Cherfaoui, and Wen Yao, “Vehicle trajectory prediction based on motion model and maneuver recognition,” in 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, nov 2013, pp. 4363–4369.
-  S. Yoon and D. Kum, “The multilayer perceptron approach to lateral motion prediction of surrounding vehicles for autonomous vehicles,” in 2016 IEEE Intelligent Vehicles Symposium (IV), vol. 2016-August. IEEE, jun 2016, pp. 1307–1312.
-  A. Khosroshahi, E. Ohn-Bar, and M. M. Trivedi, “Surround vehicles trajectory analysis with recurrent neural networks,” in 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC). IEEE, nov 2016, pp. 2267–2272.
-  D. J. Phillips, T. A. Wheeler, and M. J. Kochenderfer, “Generalizable Intention Prediction of Human Drivers at Intersections,” 2017 IEEE Intelligent Vehicles Symposium (IV), pp. 1665–1670, 2017.
-  B. Volz, H. Mielenz, R. Siegwart, and J. Nieto, “Predicting pedestrian crossing using Quantile Regression forests,” in 2016 IEEE Intelligent Vehicles Symposium. IEEE, jun 2016, pp. 426–432.
-  R. S. Tomar and S. Verma, “Safety of Lane Change Maneuver Through A Priori Prediction of Trajectory Using Neural Networks,” Network Protocols and Algorithms, vol. 4, no. 1, pp. 4–21, 2012.
-  Qiang Liu, B. Lathrop, and V. Butakov, “Vehicle lateral position prediction: A small step towards a comprehensive risk assessment system,” in 17th International IEEE Conference on Intelligent Transportation Systems (ITSC). IEEE, oct 2014, pp. 667–672.
-  S. Zernetsch, S. Kohnen, M. Goldhammer, K. Doll, and B. Sick, “Trajectory prediction of cyclists using a physical model and an artificial neural network,” in 2016 IEEE Intelligent Vehicles Symposium (IV). IEEE, jun 2016, pp. 833–838.
-  Yanjie Duan, Yisheng Lv, and Fei-Yue Wang, “Travel time prediction with LSTM neural network,” in 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC). IEEE, nov 2016, pp. 1053–1058.
-  A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and S. Savarese, “Social LSTM: Human Trajectory Prediction in Crowded Spaces,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, jun 2016.
-  T. Fushiki, “Estimation of prediction error by using k-fold cross-validation,” Statistics and Computing, vol. 21, no. 2, pp. 137–146, 2011.
-  Y. Gal and Z. Ghahramani, “Dropout as a bayesian approximation: Representing model uncertainty in deep learning,” in Proceedings of The 33rd International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, vol. 48. PMLR, 20–22 Jun 2016, pp. 1050–1059.
-  S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
-  F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to Forget: Continual Prediction with LSTM,” Neural Computation, vol. 12, no. 10, pp. 2451–2471, oct 2000.
-  A. Zyner, S. Worrall, J. Ward, and E. Nebot, “Long Short Term Memory for Driver Intent Prediction,” 2017 IEEE Intelligent Vehicles Symposium (IV), pp. 1484–1489, 2017.
-  C. Ding, W. Wang, X. Wang, and M. Baumann, “A neural network model for driver’s lane-changing trajectory prediction in urban traffic flow,” Mathematical Problems in Engineering, vol. 2013, 2013.
-  J. Zheng, K. Suzuki, and M. Fujita, “Predicting driver’s lane-changing decisions using a neural network model,” Simulation Modelling Practice and Theory, vol. 42, pp. 73–83, 2014.
-  U.S. Federal Highway Administration. (2005) US Highway 101 dataset.
-  J. Morton and T. A. Wheeler, “Project Report Deep Learning of Spatial and Temporal Features for Automotive Prediction,” pp. 1–9, 2016.
-  M. Montanino and V. Punzo, “Making ngsim data usable for studies on traffic flow theory: Multistep method for vehicle trajectory reconstruction,” Transportation Research Record: Journal of the Transportation Research Board, no. 2390, pp. 99–111, 2013.
-  A. Savitzky and M. J. Golay, “Smoothing and differentiation of data by simplified least squares procedures.” Analytical chemistry, vol. 36, no. 8, pp. 1627–1639, 1964.
-  J. Schlechtriemen, A. Wedel, J. Hillenbrand, G. Breuel, and K.-d. Kuhnert, “A lane change detection approach using feature ranking with maximized predictive power,” in 2014 IEEE Intelligent Vehicles Symposium Proceedings. IEEE, jun 2014, pp. 108–114.
-  J. Schlechtriemen, F. Wirthmueller, A. Wedel, G. Breuel, and K. D. Kuhnert, “When will it change the lane? A probabilistic regression approach for rarely occurring events,” IEEE Intelligent Vehicles Symposium, Proceedings, vol. 2015-August, pp. 1373–1379, 2015.
-  D. T. Field and J. P. Wann, “Perceiving time to collision activates the sensorimotor cortex,” Current Biology, vol. 15, no. 5, pp. 453–458, 2005.
-  F. Chollet et al., “Keras,” https://github.com/fchollet/keras, 2015.