Introduction
Forecasting the motion of surrounding vehicles is critical for the navigation of robots and autonomous vehicles. If a surrounding vehicle intends to cut into your lane, its trajectory over the following few seconds must be estimated as precisely as possible, as a prior for planning a non-intersecting path. To improve the safety and efficiency of autonomous vehicles, a well-designed autonomous driving system should be able to predict future traffic scenes over a relatively long-term horizon.
Generally, forecasting the future trajectory of a vehicle is very challenging, because it is typically a complex-system problem affected by variable weather, complex traffic scenes, the different driving styles of human drivers, and even the massive interaction between neighboring vehicles. Recall the previous example: suppose your feedback strategy is to speed up in order to prevent your neighbor from cutting into your lane. This will suddenly and dramatically change his original intention, and he will soon recast his driving path. This is a simple traffic interaction scene, but modeling it mathematically is not an easy task.
To address the challenges above, we exploit the following two considerations in our model:

Socially Aware Interactive Effects: The socially aware interactive effect is caused by the social force, a measure of the internal motivation of an individual. It was proposed by D. Helbing et al. [Helbing and Molnar1995] to study the motion trajectories of a group of pedestrians. In our model, we assume the dynamic motion of vehicles in complex traffic is affected by the socially aware interaction between neighboring vehicles.

Dynamic Uncertainties: Forecasting the trajectories of robots or autonomous vehicles is deeply entangled with linear or nonlinear dynamic models because of uncertainties. In our model, the neural network is formulated in a dynamic-model-based approach into which a filter layer is embedded.
Inspired by recent literature on artificial intelligence applied to complex systems and nonlinear dynamic models
[Langford, Salakhutdinov, and Zhang2009, Krishnan, Shalit, and Sontag2015], we propose a neural network architecture called Socially Aware Kalman Neural Networks (SAKNN). SAKNN is a deep neural network with an embedded filter layer, which has the following three advantages:
Robustness: The robustness is attributed to the powerful ability of deep neural networks to extract complex abstractions [Deng, Yu, and others2014]. In particular, the multiple convolution layers at the front of SAKNN resolve a hierarchy of concepts from massive surrounding sensor data and extract distinctive features of traffic and vehicle dynamics, something human experts can hardly do.

Effectiveness: Over a relatively long-term horizon, SAKNN outperforms all baselines in the mean-square-error (MSE) sense, which is attributed to its embedded filter structure. Specifically, we embed a filter layer into SAKNN as an approximation of the Kalman filter (KF). During training, the filter layer recursively bends the weights of the neural network toward the optimal estimate of the underlying states in the MSE sense.

Smoothness: In light of the embedded filter structure of SAKNN, we add a smoothing regularization that bends the weights of the neural network, in which we directly run the Kalman smoother backward over the predicted trajectory.
In the experimental evaluation, we apply SAKNN to the Next Generation Simulation (NGSIM) data set [Colyar and Halkias2007]. The empirical results demonstrate that SAKNN predicts more precisely over a relatively long-term horizon and significantly improves the signal-to-noise ratio of the predicted signal.
Related Work
We adopt the categorization of trajectory prediction methods proposed by S. Lefèvre et al. [Lefèvre, Vasquez, and Laugier2014], in which the level of abstraction gradually increases: physics-based motion models, maneuver-based motion models, and interaction-aware motion models. SAKNN is a combination of a physics-based and an interaction-aware motion model.
Dynamic Uncertainties
Uncertainties need to be taken into account to model trajectory prediction more precisely. Typically, the KF [Kalman1960] is applied to model this uncertainty and to recursively estimate the states of robots or autonomous vehicles from noisy sensor measurements. For a more complete review of state estimation, we refer the reader to standard references [Bishop, Welch, and others2001, Thrun, Burgard, and Fox2005, Simon2006]. In the recent literature, neural network approaches to state estimation have been explored extensively [Bobrowski et al.2008, Wilson and Finkel2009]. H. Coskun et al. [Coskun et al.2017] train three long short-term memory (LSTM) networks [Hochreiter and Schmidhuber1997] to learn the motion model, the process noise, and the measurement noise respectively, in order to estimate human pose in camera images. T. Haarnoja et al. [Haarnoja et al.2016] propose a discriminative approach to state estimation in which they train a neural network to learn features from highly complex observations and then filter with the learned features to output the underlying states.
Socially Aware Model
The interaction layer takes into account the influence of neighboring pedestrians or vehicles. Compared with the traditional dynamic-model-based approach of modeling these dynamic features, we mainly refer to recent work on neural network approaches. A. Alahi et al. [Alahi et al.2016] propose a deep neural network model to predict the motion dynamics of pedestrians in crowded scenes, in which they build a fully connected layer called social pooling to learn the social tensor and the interaction between pedestrians. However, this approach is designed for pedestrian motion and is not effective in complex traffic configurations. Another work on motion prediction is [Deo and Trivedi2018], in which a social tensor is extracted by a convolutional social pooling layer and then fed to a maneuver-based trajectory prediction model. A special class of prediction models applies imitation learning to learn human driver behavior in the multi-agent setting [Bhattacharyya et al.2018, Kuefler et al.2017]. The learned policies generate future driving trajectories that better match those of human drivers and can also interact with neighbors in a more stable manner over long horizons in the multi-agent setting.
Background
Notation
In the following sections, we use the letters $s$, $t$, and $T$ to represent the Kalman filter starting time, the Kalman filter terminal time, and the Kalman prediction terminal time respectively; the subscript $s{:}t$ represents a history of data or states from time $s$ to time $t$. We also use $\theta$ to represent the weights of the neural networks Interaction, LSTM$_Q$, and LSTM$_R$ in our architecture.
Formulation of Our Problem
We consider the basic kinematic motion formula. Let $p_t$ be the position at time $t$ in the trajectory of a vehicle. The motion formula is as follows,
$$p_{t+1} = p_t + v_t\,\Delta t + \tfrac{1}{2}\,a_t\,\Delta t^2 + O(\Delta t^3), \qquad (1)$$
where $O(\Delta t^3)$ denotes the higher-order terms of the Taylor expansion of the position. Our approach predicts vehicle trajectories in two sequential steps. The first step is to predict the acceleration $a_t$ with our neural network SAKNN, whose input is sensor information from the perception space, mainly a history of positions, velocities, accelerations, and social data. In the second step, we integrate the predicted acceleration according to the kinematic motion formula (1).
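As a sketch of the second integration step, formula (1) can be rolled out numerically. This is a minimal example, not the paper's implementation; the step Δt = 0.1 s matches the 10 Hz NGSIM sampling rate, and scalar positions stand in for 2-D coordinates.

```python
def rollout_positions(p0, v0, accels, dt=0.1):
    """Integrate predicted accelerations into positions with the
    second-order kinematic formula (1):
    p_{t+1} = p_t + v_t*dt + 0.5*a_t*dt^2,  v_{t+1} = v_t + a_t*dt."""
    positions, p, v = [], p0, v0
    for a in accels:
        p = p + v * dt + 0.5 * a * dt * dt  # advance position
        v = v + a * dt                      # advance velocity
        positions.append(p)
    return positions
```

For a constant acceleration of 1 m/s² over 10 steps from rest, the rollout reproduces the closed-form result ½·a·t² = 0.5 m.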
In the next section, we build a new model for this setting. In the rest of this section, we give a brief overview of the Kalman filter and smoother, which play an important role in our model. The KF is an optimal state estimator in the MSE sense under linear-dynamics and Gaussian-noise assumptions. Denoting the state as $x_t$, the control as $u_t$, and the observation as $z_t$, the KF is written as the process model and the measurement model
$$x_{t+1} = A x_t + B u_t + w_t, \qquad z_t = C x_t + v_t,$$
where the matrices $A$, $B$, and $C$ are constant, and the process noise $w_t \sim \mathcal{N}(0, Q)$ and the measurement noise $v_t \sim \mathcal{N}(0, R)$ are zero-mean Gaussian white noise with covariance matrices $Q$ and $R$.
The KF estimates the underlying states of the above equations recursively in two steps; the prediction and update phases alternate as long as observations are available. In the prediction step, the current state and the error covariance matrix are estimated by the process model, independently of the current measurement,
$$\hat{x}^-_t = A\hat{x}_{t-1} + Bu_{t-1}, \qquad P^-_t = AP_{t-1}A^\top + Q.$$
In the update step, once the current observation is received, the Kalman gain $K_t$ is computed, and the prior estimate and the error covariance matrix are corrected by the measurement model,
$$K_t = P^-_t C^\top (C P^-_t C^\top + R)^{-1}, \qquad \hat{x}_t = \hat{x}^-_t + K_t (z_t - C\hat{x}^-_t), \qquad P_t = (I - K_t C) P^-_t.$$
After the KF, we carry out the recursive steps of the Kalman smoother backward,
$$G_t = P_t A^\top (P^-_{t+1})^{-1}, \qquad \hat{x}^s_t = \hat{x}_t + G_t (\hat{x}^s_{t+1} - \hat{x}^-_{t+1}), \qquad P^s_t = P_t + G_t (P^s_{t+1} - P^-_{t+1}) G_t^\top,$$
where $\hat{x}^s_t$ and $P^s_t$ are the smoothed states and the smoothed error covariance matrices.
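The filter and backward-smoothing recursions described above can be sketched in scalar form, where the matrices $A$, $C$, $Q$, $R$ reduce to numbers; the constants here are illustrative, not tuned values from the paper.

```python
def kalman_filter(zs, a=1.0, c=1.0, q=1e-2, r=1e-1, x0=0.0, p0=1.0):
    """Scalar KF: process x' = a*x + w (w ~ N(0, q)),
    measurement z = c*x + v (v ~ N(0, r)).
    Returns filtered means/variances plus the priors the smoother needs."""
    xs, ps, xs_prior, ps_prior = [], [], [], []
    x, p = x0, p0
    for z in zs:
        # prediction step (independent of the current measurement)
        xp = a * x
        pp = a * p * a + q
        # update step: Kalman gain blends prior and observation
        k = pp * c / (c * pp * c + r)
        x = xp + k * (z - c * xp)
        p = (1.0 - k * c) * pp
        xs_prior.append(xp); ps_prior.append(pp)
        xs.append(x); ps.append(p)
    return xs, ps, xs_prior, ps_prior

def rts_smoother(xs, ps, xs_prior, ps_prior, a=1.0):
    """Rauch-Tung-Striebel backward pass producing the smoothed states."""
    n = len(xs)
    xs_s, ps_s = xs[:], ps[:]
    for t in range(n - 2, -1, -1):
        g = ps[t] * a / ps_prior[t + 1]  # smoothing gain G_t
        xs_s[t] = xs[t] + g * (xs_s[t + 1] - xs_prior[t + 1])
        ps_s[t] = ps[t] + g * (ps_s[t + 1] - ps_prior[t + 1]) * g
    return xs_s, ps_s
```

Feeding a constant measurement of 1.0 for 20 steps, the filtered estimate converges to the measurement, and the backward pass leaves the final state untouched while refining earlier ones.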
Model
In this section, we present our model, the Socially Aware Kalman Neural Network (SAKNN). See Figure 1 for the detailed architecture.
Architecture of Interaction Layer
SAKNN takes into account the subtle influence of social force between vehicles and simulates such dynamic features. In contrast to the Helbing-Molnar-Farkas-Vicsek social force model [Helbing, Farkas, and Vicsek2000], the social force between vehicles here is the intention of drivers not to collide with other vehicles or static obstacles.
According to Newton's second law, acceleration is directly proportional to force, and thus we pick acceleration as the output of SAKNN. The interaction operator therefore maps the perception space to the control space.
The architecture of the interaction layer is shown in Figure 2. Specifically, the input of the interaction layer is a history of the recorded velocities, positions, accelerations, widths, and lengths of the vehicles, and of the distances between vehicle pairs; the interaction layer maps this history to the predicted acceleration.
Inspired by the work of Helbing et al. [Helbing and Molnar1995], we also feed the hand-crafted repulsive interaction force to our neural network, where the subscripts $i$ and $j$ denote two adjacent vehicles, vehicle $i$ and vehicle $j$.
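As a hedged illustration of such a hand-crafted feature, a Helbing-style exponential repulsion between two vehicles can be computed as follows. The constants A, B and the effective radii are placeholder values for illustration, not parameters from the paper.

```python
import math

def repulsive_force(pos_i, pos_j, r_i=2.0, r_j=2.0, A=2.0, B=0.5):
    """Helbing-style repulsive force on vehicle i from vehicle j:
    f_ij = A * exp((r_ij - d_ij) / B) * n_ij,
    where d_ij is the center distance, r_ij = r_i + r_j, and n_ij is the
    unit vector pointing from j toward i (so i is pushed away from j)."""
    dx = pos_i[0] - pos_j[0]
    dy = pos_i[1] - pos_j[1]
    d = math.hypot(dx, dy)
    mag = A * math.exp((r_i + r_j - d) / B)  # decays exponentially with gap
    return (mag * dx / d, mag * dy / d)
```

The force points away from the neighbor and shrinks rapidly as the gap between the two vehicles grows, which matches the intention-not-to-collide interpretation above.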
Multi-Agent Kalman Filter and Smoother
We denote the interaction layer as Interaction, the process-noise neural network as LSTM$_Q$, and the measurement-noise neural network as LSTM$_R$.
The diagram of the multi-agent KF for N-vehicle motion is shown in Figure 3, and we write it recursively as follows. The state vector $x_t$ and the measurement vector $z_t$ consist of the positions and velocities of the N vehicles respectively,
$$x_t = \big[p^{(1)}_t, v^{(1)}_t, \dots, p^{(N)}_t, v^{(N)}_t\big]^\top, \qquad z_t = \big[\tilde{p}^{(1)}_t, \tilde{v}^{(1)}_t, \dots, \tilde{p}^{(N)}_t, \tilde{v}^{(N)}_t\big]^\top,$$
while the input to the Interaction module is the surrounding sensor data. The motion matrix $A$ and the control matrix $B$ are determined by the motion formula (1) and, per vehicle, written as
$$A_i = \begin{pmatrix}1 & \Delta t\\ 0 & 1\end{pmatrix}, \qquad B_i = \begin{pmatrix}\tfrac{1}{2}\Delta t^2\\ \Delta t\end{pmatrix}.$$
And the process noise obeys a Gaussian $w_t \sim \mathcal{N}(0, Q_t)$, while the measurement noise obeys a Gaussian $v_t \sim \mathcal{N}(0, R_t)$.
The prediction step in this N-agent Kalman filter is then written as
$$\hat{x}^-_t = A\hat{x}_{t-1} + Bu_{t-1}, \qquad P^-_t = AP_{t-1}A^\top + Q_t,$$
and the update step of the Kalman filter is written as
$$K_t = P^-_t C^\top (C P^-_t C^\top + R_t)^{-1}, \qquad \hat{x}_t = \hat{x}^-_t + K_t (z_t - C\hat{x}^-_t), \qquad P_t = (I - K_t C) P^-_t,$$
where the control $u_{t-1}$ is the output of Interaction, and $Q_t$ and $R_t$ are diagonal matrices built from the outputs of the LSTM$_Q$ and LSTM$_R$ modules respectively.
Finally, the recursive step of the Kalman smoother is written as
$$G_t = P_t A^\top (P^-_{t+1})^{-1}, \qquad \hat{x}^s_t = \hat{x}_t + G_t (\hat{x}^s_{t+1} - \hat{x}^-_{t+1}), \qquad P^s_t = P_t + G_t (P^s_{t+1} - P^-_{t+1}) G_t^\top,$$
where $G_t$ is the smoothing matrix, and $\hat{x}^s_t$ and $P^s_t$ are the smoothed state and the smoothed error covariance matrix.
Architecture for Noise Model
To model sequential data, LSTM [Hochreiter and Schmidhuber1997] and GRU [Cho et al.2014] are currently the most popular deep neural network architectures. These two gated models achieve the best recognition accuracy in quite a few fields, e.g. natural language text compression, unsegmented connected handwriting recognition, and speech recognition [Graves et al.2009, Graves, Mohamed, and Hinton2013].
In SAKNN, we train two gated neural networks to learn the time-varying sensor noise in the filter layer, i.e.
Process model noise covariance: LSTM$_Q$;
Measurement model noise covariance: LSTM$_R$.
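A minimal sketch of how such a recurrent noise head can be constrained to output a valid (positive) covariance diagonal: the raw numbers below are placeholders standing in for the unconstrained outputs of LSTM$_Q$ or LSTM$_R$.

```python
import math

def softplus(x):
    """Smooth, strictly positive squashing: log(1 + exp(x))."""
    return math.log1p(math.exp(x))

def diag_covariance(raw_outputs):
    """Map unconstrained recurrent-network outputs to the diagonal of a
    positive-definite noise covariance (softplus keeps every entry > 0),
    suitable for building the diagonal Q_t or R_t of the filter layer."""
    return [softplus(v) for v in raw_outputs]
```

Because every diagonal entry is strictly positive, the resulting $Q_t$ and $R_t$ are always valid covariances regardless of what the network emits, which keeps the filter-layer recursions numerically well-posed.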
Loss Function
The loss in SAKNN serves three goals: to smooth the predicted states recorded in the KF, to fit the predicted acceleration of the network, and to constrain the future predicted states of the KF.

Smoothing of predicted states: At each time step of the KF, the filter layer outputs a predicted state $\hat{x}^-_t$. We collect the predicted states and smooth them with the backward step of the Kalman smoother, then compare the smoothed trajectory with the raw sensor data in order to bend the weights of the neural network. The smoothing loss is defined as
$$L_{\mathrm{smooth}} = \big\|\,\mathcal{S}(\hat{x}^-_{s:t}) - z_{s:t}\,\big\|^2_2,$$
where $\mathcal{S}$ is the Kalman smoothing operator.

Fitting of accelerations: At each time step of the KF, the interaction layer produces the current and future accelerations. We fit the sign of the predicted acceleration to the corresponding sign of the ground truth in the L2 norm; in practice, the ground-truth acceleration comes directly from the raw sensor data,
$$L_{\mathrm{fit}} = \mathbb{E}_t\Big[\big\|\operatorname{sgn}(\hat{a}_t) - \operatorname{sgn}(a_t)\big\|^2_2\Big],$$
where we take an expectation of the fitting loss to normalize over the large number of Kalman time steps.

The penalty of prediction: After integrating the acceleration according to the kinematic motion formula (1), starting from the observed state and iterating with the predicted acceleration, we penalize the predicted trajectory in the L2 norm to constrain the final displacement; in practice, the ground-truth positions come directly from the raw sensor data,
$$L_{\mathrm{pen}} = \mathbb{E}_t\Big[\big\|\hat{p}_t - p_t\big\|^2_2\Big],$$
where we take an expectation of the penalty loss to normalize over the large number of Kalman time steps.
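Putting the three goals together, a schematic weighted loss might look as follows. This is a simplified sketch: each term is written as a plain MSE (the paper fits the sign of the acceleration, and the smoothing term runs through the Kalman smoother), and the 0.3/0.6/0.1 weights echo one row of Table 2 but are purely illustrative.

```python
def saknn_loss(smoothed, observed, a_pred, a_true, p_pred, p_true,
               w_smooth=0.3, w_fit=0.6, w_pen=0.1):
    """Weighted sum of the three SAKNN loss terms: smoothing (smoothed
    states vs. raw observations), fitting (predicted vs. ground-truth
    acceleration), and penalty (rolled-out vs. ground-truth positions)."""
    mse = lambda xs, ys: sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return (w_smooth * mse(smoothed, observed)
            + w_fit * mse(a_pred, a_true)
            + w_pen * mse(p_pred, p_true))
```

Keeping the fitting weight largest matches the preference stated in the discussion section, where the fitting term is given the most importance.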
Experiment
Dataset
We evaluate our approach on the public data sets US Highway 101 (US-101) [Colyar and Halkias2007] and Interstate 80 (I-80) [Lu and Skabardonis2007] from the NGSIM program, which collected detailed vehicle trajectory data on southbound US-101 and eastbound I-80, using a software application called NG-VIDEO to transcribe vehicle trajectories from video. Statistics of the raw US-101 and I-80 data are shown in Table 1.
Dataset  Study area  Lanes  Time span  Sampling rate
US-101  2,100 feet  6  3 x 15 min  10 Hz
I-80  1,640 feet  6  3 x 15 min  10 Hz
The training and testing sets consist of 100,000 frames of raw data extracted from the NGSIM data set: 70% for training and the rest for testing. In particular, we train on the raw data but filter it for evaluation. We extract raw data from NGSIM as follows. We align raw data by timestamp and group vehicles into one host and five surrounding vehicles on the adjacent traffic lanes. We then use a 12-second window per experiment, in which the first 5 seconds of trajectory serve as track history and the remaining 7 seconds as the prediction horizon.
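The windowing described above can be sketched as follows; the frame counts assume the 10 Hz NGSIM sampling rate, and sliding one frame at a time is an assumption, since the paper does not state the stride.

```python
def make_windows(track, hz=10, history_s=5, horizon_s=7):
    """Slice a per-vehicle trajectory (one sample per frame) into 12 s
    windows: the first 5 s as track history, the remaining 7 s as the
    prediction horizon, sliding one frame at a time."""
    h, f = history_s * hz, horizon_s * hz  # 50 history + 70 future frames
    windows = []
    for start in range(0, len(track) - h - f + 1):
        hist = track[start:start + h]
        fut = track[start + h:start + h + f]
        windows.append((hist, fut))
    return windows
```

Each returned pair is one (history, future) training example, so a track of 130 frames yields 11 overlapping 120-frame windows.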
Evaluation Metrics
In this subsection, we propose multiple metrics to verify the effectiveness, robustness, and smoothness of SAKNN. In particular, we filter the raw NGSIM data to serve as ground truth.

Effectiveness: The two standard metrics in trajectory prediction are average displacement error (ADE) and final displacement error (FDE). ADE accumulates the displacement error between the predicted position and the ground truth at each time step, while FDE considers only the final displacement error. A small ADE or FDE means that the model predicts well.

Robustness: We test SAKNN on the general data set and the lane-changing data set. On each data set, we compute ADE and FDE to evaluate robustness. A robust model corresponds to small values of both ADE and FDE.

Smoothness: We use the signal-to-noise ratio (SNR) as a metric to quantify the smoothness of the predicted signal. SNR compares the level of the underlying signal with the level of background noise.
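The ADE and FDE metrics above can be computed from a predicted trajectory and its ground truth as follows (a minimal sketch over lists of 2-D points):

```python
import math

def ade_fde(pred, truth):
    """Average and final displacement error between a predicted
    trajectory and the ground truth (lists of (x, y) points)."""
    errs = [math.hypot(px - tx, py - ty)
            for (px, py), (tx, ty) in zip(pred, truth)]
    ade = sum(errs) / len(errs)  # mean displacement over all steps
    fde = errs[-1]               # displacement at the final step
    return ade, fde
```

Both metrics are in the same length units as the trajectories, so lower is better for each.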
Baselines
We compare the following baselines with SAKNN.

Constant Velocity (CV): The basic kinematic motion formula with a constant velocity.

Vanilla-LSTM (V-LSTM): Referring to [Park et al.2018], V-LSTM feeds in the track history and generates the future trajectory sequence.

Acceleration-LSTM (ACC-LSTM): We propose this baseline for comparison with SAKNN: an LSTM takes the track history and predicts future acceleration, and the predicted trajectory is accumulated from the predicted acceleration via the basic kinematic motion formula.

Convolutional Social Pooling-LSTM (CS-LSTM): CS-LSTM is a maneuver-based model generating a multimodal predictive distribution [Deo and Trivedi2018].
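For reference, the constant velocity (CV) baseline amounts to a finite-difference velocity estimate extrapolated over the horizon; using only the last two observed positions is an illustrative choice, not necessarily how the baseline was implemented.

```python
def constant_velocity_baseline(history, horizon, dt=0.1):
    """Constant-velocity baseline: estimate velocity from the last two
    observed (x, y) positions and extrapolate over `horizon` steps."""
    (x1, y1), (x2, y2) = history[-2], history[-1]
    vx, vy = (x2 - x1) / dt, (y2 - y1) / dt  # finite-difference velocity
    return [(x2 + vx * dt * k, y2 + vy * dt * k)
            for k in range(1, horizon + 1)]
```

This is the weakest baseline precisely because the velocity is frozen: any acceleration, braking, or lane change immediately produces displacement error that grows with the horizon.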
Results
Figure 4 shows the effectiveness and robustness of SAKNN on both the general and lane-changing data sets. We train SAKNN in almost 500 independent trials, with over 300 training episodes in each case, until it converges. Since SAKNN predicts the future trajectories of a group of six vehicles together, we average the ADEs and FDEs of all vehicles for comparison with the baselines; the averaged performance of the six vehicles is called Average SAKNN.
Hyperparameters of the baselines are fine-tuned to obtain good scores on the validation sets. Two other state-of-the-art trajectory prediction algorithms, Social Attention LSTM [Vemula, Muelling, and Oh2017] and Social LSTM [Alahi et al.2016], were also considered, but they are designed for pedestrian motion and thus not effective in complex traffic configurations: their results differ from ours by an order of magnitude, so we do not plot them alongside SAKNN in our figure.
In Figure 4, Average SAKNN is slightly better or slightly worse than the ACC-LSTM and CS-LSTM algorithms in short-term prediction (1 to 4 seconds). However, Average SAKNN is superior to all others after 4 seconds, and its advantage grows substantially as time propagates. We also observe that Average SAKNN performs well on both the general and lane-changing data sets. These observations verify that Average SAKNN indeed surpasses the baselines in effectiveness and robustness. In particular, Average SAKNN reduces the displacement errors of the constant velocity model by 30% to 60% in all situations.
We also pick the best-performing of the six vehicles in SAKNN and call it Best SAKNN. Which vehicle is Best SAKNN varies across data sets, but it outperforms the baselines substantially.
Qualitative Analysis and Discussion
In this section, we qualitatively analyze the deep relationship between the performance and the structure of SAKNN.

Sensitivity of Hyperparameters in Architecture: SAKNN has tens of hyperparameters, from the interaction layer to the Kalman layer. The experimental results show that the coefficients of the three loss terms influence our results. In practice, we take into account the orders of magnitude of the three loss terms and prefer to give the fitting term the most weight.

Sensitivity of Hyperparameters in Filtering: In the evaluation step, we smooth the trajectories from NGSIM with the Savitzky-Golay filter, whose window size is a hyperparameter. In fact, ADE and FDE are quite sensitive to this hyperparameter in our experiments. However, for any fixed window size used to filter the trajectories, SAKNN remains superior to the other baselines.
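A small illustration of the Savitzky-Golay smoothing used in evaluation, assuming SciPy's `savgol_filter`; the quadratic coefficient, noise level, and window size below are arbitrary, not the values used in the paper.

```python
import numpy as np
from scipy.signal import savgol_filter

# A noisy quadratic "trajectory" stands in for a raw NGSIM track.
t = np.arange(50, dtype=float)
clean = 0.05 * t ** 2
noisy = clean + np.random.default_rng(0).normal(0.0, 0.5, t.size)

# polyorder=2 reproduces a quadratic exactly; window_length controls
# how much high-frequency noise is removed (the sensitive hyperparameter).
smoothed = savgol_filter(noisy, window_length=21, polyorder=2)
residual = np.abs(smoothed - clean).mean()
```

Because the filtered trajectory serves as ground truth for ADE/FDE, changing `window_length` changes the reference signal itself, which is why the metrics are sensitive to it.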

Effects of the Interaction Layer and the Kalman Layer in SAKNN: The interaction layer provides an underlying acceleration with low-variance Gaussian white noise, and the Kalman layer is effective for slightly smoothing the predicted acceleration and further improving the signal-to-noise ratio in experiments; see Table 2. In fact, the effect of the Kalman layer in SAKNN is not inferior to that of the Kalman filter and smoother in robotics.
Weights  Acceleration x (dB)  Acceleration y (dB)
Ground Truth  1.0753  1.4390
0.0/0.9/0.1  5.9033  5.4834
0.1/0.8/0.1  2.8446  7.4998
0.2/0.7/0.1  11.0210  15.8976
0.3/0.6/0.1  7.9355  16.7044
0.4/0.5/0.1  12.9703  10.0643
Table 2: Signal-to-Noise Ratio of SAKNN. The weights column lists the smoothing/fitting/penalty weights. We randomly pick experiment samples from the US-101 data set and compute the average SNR of the samples over the future 20 time steps. Raising the smoothing weight in each experiment increases the average SNR quickly but irregularly. The results demonstrate that the Kalman layer and the smoothing term are effective in our model.
Predictive Ability of the Regression Model: A major doubt about our approach is whether acceleration can be learned with any neural network architecture at all. The experimental results show a lack of pattern in the predicted acceleration even after sufficiently long training. We therefore devise a heuristic method that learns the sign of the acceleration, turning the regression problem into a much more tractable classification problem. Furthermore, we divide the range of acceleration into range boxes that cover the whole range of acceleration of all vehicles, and set the loss to compare the predicted range-box number with the box number of the ground truth.
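The range-box discretization can be sketched as follows; the range limits and box count are illustrative, not values from the paper.

```python
def acceleration_box(a, a_min=-4.0, a_max=4.0, n_boxes=16):
    """Map a continuous acceleration value to a discrete range-box index,
    turning the regression target into a classification label.
    Values outside [a_min, a_max] are clipped into the covered range."""
    a = min(max(a, a_min), a_max - 1e-9)      # clip into the covered range
    width = (a_max - a_min) / n_boxes          # uniform box width
    return int((a - a_min) // width)
```

The box indices then serve as class labels, so a standard classification loss (e.g. cross-entropy over the box index) can replace the direct regression of acceleration.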
Conclusions and Future Work
In this work, we propose SAKNN, a new trajectory prediction model that takes into account two intractable challenges in robotics and autonomous driving: socially aware interactive effects and dynamic uncertainties. We embed the interaction layer and the Kalman layer into the SAKNN architecture to address the two challenges respectively. In an extensive set of experiments, SAKNN outperforms the baseline models in effectiveness, robustness, and smoothness, and achieves state-of-the-art performance on the US-101 and I-80 data sets. Future work will extend SAKNN to a probabilistic formulation and combine it with a maneuver-based model in which road topology and more traffic information are taken into account as priors.
Acknowledgement
We would like to thank the Baidu Apollo community, and especially the L3 prediction lead Yizhi Xu, for support and helpful suggestions and discussions on motion prediction techniques.
References

[Alahi et al.2016] Alahi, A.; Goel, K.; Ramanathan, V.; Robicquet, A.; Fei-Fei, L.; and Savarese, S. 2016. Social LSTM: Human trajectory prediction in crowded spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 961–971.
 [Bhattacharyya et al.2018] Bhattacharyya, R. P.; Phillips, D. J.; Wulfe, B.; Morton, J.; Kuefler, A.; and Kochenderfer, M. J. 2018. Multi-agent imitation learning for driving simulation. arXiv preprint arXiv:1803.01044.
 [Bishop, Welch, and others2001] Bishop, G.; Welch, G.; et al. 2001. An introduction to the kalman filter. Proc of SIGGRAPH, Course 8(275993175):59.
 [Bobrowski et al.2008] Bobrowski, O.; Meir, R.; Shoham, S.; and Eldar, Y. 2008. A neural network implementing optimal state estimation based on dynamic spike train decoding. In Advances in Neural Information Processing Systems, 145–152.
 [Cho et al.2014] Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; and Bengio, Y. 2014. Learning phrase representations using rnn encoderdecoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
 [Colyar and Halkias2007] Colyar, J., and Halkias, J. 2007. Us highway 101 dataset. Federal Highway Administration (FHWA), Tech. Rep. FHWAHRT07030.
 [Coskun et al.2017] Coskun, H.; Achilles, F.; DiPietro, R. S.; Navab, N.; and Tombari, F. 2017. Long shortterm memory kalman filters: Recurrent neural estimators for pose regularization. In ICCV, 5525–5533.
 [Deng, Yu, and others2014] Deng, L.; Yu, D.; et al. 2014. Deep learning: methods and applications. Foundations and Trends® in Signal Processing 7(3–4):197–387.
 [Deo and Trivedi2018] Deo, N., and Trivedi, M. M. 2018. Convolutional social pooling for vehicle trajectory prediction. arXiv preprint arXiv:1805.06771.
 [Graves et al.2009] Graves, A.; Liwicki, M.; Fernández, S.; Bertolami, R.; Bunke, H.; and Schmidhuber, J. 2009. A novel connectionist system for unconstrained handwriting recognition. IEEE transactions on pattern analysis and machine intelligence 31(5):855–868.

[Graves, Mohamed, and Hinton2013] Graves, A.; Mohamed, A.-r.; and Hinton, G. 2013. Speech recognition with deep recurrent neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6645–6649. IEEE.
 [Haarnoja et al.2016] Haarnoja, T.; Ajay, A.; Levine, S.; and Abbeel, P. 2016. Backprop KF: Learning discriminative deterministic state estimators. In Advances in Neural Information Processing Systems, 4376–4384.
 [Helbing and Molnar1995] Helbing, D., and Molnar, P. 1995. Social force model for pedestrian dynamics. Physical review E 51(5):4282.
 [Helbing, Farkas, and Vicsek2000] Helbing, D.; Farkas, I.; and Vicsek, T. 2000. Simulating dynamical features of escape panic. Nature 407(6803):487.
 [Hochreiter and Schmidhuber1997] Hochreiter, S., and Schmidhuber, J. 1997. Long shortterm memory. Neural computation 9(8):1735–1780.
 [Kalman1960] Kalman, R. E. 1960. A new approach to linear filtering and prediction problems. Journal of basic Engineering 82(1):35–45.
 [Krishnan, Shalit, and Sontag2015] Krishnan, R. G.; Shalit, U.; and Sontag, D. 2015. Deep kalman filters. arXiv preprint arXiv:1511.05121.
 [Kuefler et al.2017] Kuefler, A.; Morton, J.; Wheeler, T.; and Kochenderfer, M. 2017. Imitating driver behavior with generative adversarial networks. In Intelligent Vehicles Symposium (IV), 2017 IEEE, 204–211. IEEE.

[Langford, Salakhutdinov, and Zhang2009] Langford, J.; Salakhutdinov, R.; and Zhang, T. 2009. Learning nonlinear dynamic models. In Proceedings of the 26th Annual International Conference on Machine Learning, 593–600. ACM.
 [Lefèvre, Vasquez, and Laugier2014] Lefèvre, S.; Vasquez, D.; and Laugier, C. 2014. A survey on motion prediction and risk assessment for intelligent vehicles. Robomech Journal 1(1):1.
 [Lu and Skabardonis2007] Lu, X.Y., and Skabardonis, A. 2007. Freeway traffic shockwave analysis: exploring the ngsim trajectory data. In 86th Annual Meeting of the Transportation Research Board, Washington, DC.
 [Park et al.2018] Park, S.; Kim, B.; Kang, C. M.; Chung, C. C.; and Choi, J. W. 2018. Sequencetosequence prediction of vehicle trajectory via lstm encoderdecoder architecture. arXiv preprint arXiv:1802.06338.
 [Simon2006] Simon, D. 2006. Optimal state estimation: Kalman, H infinity, and nonlinear approaches. John Wiley & Sons.
 [Thrun, Burgard, and Fox2005] Thrun, S.; Burgard, W.; and Fox, D. 2005. Probabilistic robotics. MIT press.

[Vemula, Muelling, and Oh2017] Vemula, A.; Muelling, K.; and Oh, J. 2017. Social Attention: Modeling attention in human crowds. arXiv e-prints.
 [Wilson and Finkel2009] Wilson, R., and Finkel, L. 2009. A neural implementation of the Kalman filter. In Advances in Neural Information Processing Systems, 2062–2070.