Teaching Vehicles to Anticipate: A Systematic Study on Probabilistic Behavior Prediction using Large Data Sets

by Florian Wirthmüller, et al.

Observations of traffic participants and their environment enable humans to drive road vehicles safely. However, as a passenger, one notices a clear difference between an inexperienced and an experienced driver. One may get the feeling that the latter anticipates what may happen in the next few moments and takes these foresights into account in their driving behavior. To make the driving style of automated vehicles comparable to that of a human driver in terms of comfort and perceived safety, these anticipation skills need to become a built-in feature of self-driving vehicles. This article provides a systematic comparison of methods and strategies to generate this intention for self-driving cars using machine learning techniques. To implement and test these algorithms, we use a large data set collected over more than 30000 km of highway driving and containing approximately 40000 real-world driving situations. Moreover, we show that it is possible to certainly detect more than 47 % […] false positive rate of less than 1 % […] position with a prediction horizon of 5 s with a median error of less than 0.21 m.



I Introduction

Automated driving has the potential to radically change the way individual mobility works and how goods are transported. To solve the problem of driving automation, multiple processing steps have to be executed. Fig. 1 illustrates this thought: In a first step, the current traffic scene has to be sensed and a representation of the environment needs to be generated. Using this information, the situation needs to be interpreted and the behavior of others has to be anticipated. Subsequently, a plan, namely a trajectory, can be derived based on this knowledge, which is executed in the last step of this process. How long this trajectory stays viable before it has to be replanned is essentially determined by the capability of the prediction component.

In contrast to other researchers working on techniques to interconnect vehicles through so-called car-to-car communication, we aim to solve this anticipation task locally. On the one hand, it is not foreseeable when an adequate market penetration of vehicles with such techniques will be reached. On the other hand, a local prediction component will always be necessary, as there will always remain traffic participants without communication capabilities, e. g. bicyclists. In addition, local predictions can be necessary to bypass transmission times in some cases, as e. g. [24] points out. Moreover, it is reasonable to approach the topic from the perspective of highway driving, as this use case is easier to realize than others due to its clear constraints (e. g. structured setting, no pedestrians). For the prediction task, however, this implies the challenge of creating precise long-term predictions (2 to 5 s) rather than short forecasts (up to 2 s), as higher velocities can be expected in highway scenarios than in urban or rural areas.

Fig. 1: Long-term driving behavior predictions in the context of trajectory planning for automated driving (equal symbols denote simultaneity).

I-A Problem Statement

We are tackling the problem of anticipating the behavior of other traffic participants in highway scenarios. Thereby, we aim to generate information that can be processed by trajectory planning algorithms to implement an anticipatory driving style. To do so, our objective is to predict a time-dependent distribution of future vehicle positions rather than single shot trajectory predictions, as these distributions model spatial probability densities, which are more useful for downstream criticality assessments. Despite the highway focus, our methods should be general enough to suit other environments as well.

I-B Problem Resolution Strategy

To solve the introduced task, this article presents a systematic workflow for the design and evaluation of a lightweight maneuver-based model (cf. [15]), which uses standard sensor inputs to perform long-term driving behavior predictions. Methodically, we build upon [21] and use a two-step Mixture of Experts (MOE) approach. This includes a maneuver classification and a downstream behavior prediction. The lane change probabilities produced by the classifier are used in the Mixture of Experts approach as gating nodes. Specifically, the probabilities control the portions with which the respective experts contribute to the predicted distribution of future vehicle positions. The experts themselves are modeled as Gaussian Mixture Models (GMMs) in the combined input and output space, and are used in a Gaussian Mixture Regression manner. In addition, we introduce an alternative methodology to the Mixture of Experts approach, integrating the outputs of the gating nodes into one single model, and show its benefits.
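The gating idea can be illustrated with a minimal sketch. It assumes, for brevity, that each expert has already been conditioned on the current situation and prediction time and is summarized by a single Gaussian over the lateral position; in the article the experts are full GMMs. All names and numbers below are hypothetical.

```python
import math

def gaussian_pdf(y, mean, std):
    """Density of a one-dimensional normal distribution at y."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def mixture_of_experts_pdf(y, gate_probs, experts):
    """Weight each expert density with the classifier's maneuver probability.

    gate_probs: maneuver name -> probability (expected to sum to 1)
    experts:    maneuver name -> (mean, std) of that expert's prediction
    """
    return sum(gate_probs[m] * gaussian_pdf(y, *experts[m]) for m in gate_probs)

# Hypothetical values for one situation at one prediction time (metres).
gate_probs = {"left": 0.70, "follow": 0.25, "right": 0.05}
experts = {"left": (3.5, 0.6), "follow": (0.0, 0.3), "right": (-3.5, 0.6)}

density_left_lane = mixture_of_experts_pdf(3.5, gate_probs, experts)
density_right_lane = mixture_of_experts_pdf(-3.5, gate_probs, experts)
```

Because the gate probabilities sum to one and every expert is a proper density, the combined result is again a proper density, which is what a downstream risk assessment needs.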

Altogether, we contribute a systematic procedure for designing and evaluating the prediction models as well as methodical extensions to known approaches. Moreover, we show the performance of the developed modules for the two tasks of predicting driving maneuvers and probability distributions of future positions both separately and in combination. To evaluate the modules, we use a large data set of real world measurements. As will be shown, our prediction models outperform established state of the art approaches.

I-C Article Structure

The remainder of this article is structured as follows: Sec. II discusses related work on object motion prediction, emphasizing the advantages of our approach. Sec. III introduces the data set and describes all preprocessing steps we applied to it. Subsequently, we outline the training of the investigated maneuver classifiers in Sec. IV, whereas Sec. V describes the experimental evaluation as well as the performance of the classifiers. Based on these findings, we develop (cf. Sec. VI) and assess (cf. Sec. VII) different approaches for estimating probability distributions of future vehicle positions. Finally, Sec. VIII summarizes our contribution and gives an outlook on future work.

II Related Work

Concerning the understanding and prediction of the behavior of other traffic participants in highway scenarios, several aspects have been investigated in the literature. Accordingly, this section is subdivided into three corresponding parts: Sec. II-A presents an overview of approaches inferring which kind of maneuver will be executed by a vehicle. However, note that applications like collision checkers or trajectory planning algorithms cannot directly process this kind of information. Instead, probabilities of future vehicle positions or trajectories need to be predicted. Related research on this topic is presented in Sec. II-B. Bringing together the aspects of maneuver classification and position prediction, Sec. II-C gives an overview of hybrid prediction approaches. The section closes (cf. Sec. II-D) with a brief literature summary leading to our contributions (cf. Sec. II-E).

II-A Classification Approaches

Classification approaches for maneuver recognition are described in [24, 30, 20, 2]. In [24], a system is introduced that is capable of detecting lane changes with high accuracies (99 %) approximately 1 s before their occurrence. For this purpose, dynamic Bayesian networks are used. Another approach, which is capable of detecting lane changes approximately 1.5 s before their occurrence, is presented in [30]. To achieve this, the lane change probability is decomposed into a situation-based and a movement-based component, leading to an F1-score better than 98 %. The approach presented in [20] shows that it is possible to detect lane changes up to time horizons of 2 s when using feature selection for scene understanding, with an Area Under the Curve (AUC) better than 0.96. Moreover, [2] combines interaction-aware heuristic models with an interaction-unaware learned model. The interaction-aware component is based on a game-theoretic multi-agent simulation, in which each agent simultaneously tries to minimize different cost functions. These cost functions are designed using expert knowledge and by considering traffic rules. In a second step, the output of the interaction model is used to condition an interaction-unaware classifier based on Bayesian networks. The approach is able to detect lane changes on average 1.8 s in advance, with an AUC better than 93 %.

II-B Trajectory and Position Prediction Approaches

Approaches dealing with the prediction of trajectories and positions are presented in [16, 1, 26, 25, 18]: [16] uses a fully-connected Deep Neural Net to learn the parameters of a two-dimensional GMM. For each situation, an adapted Gaussian Mixture distribution models the probability density in the two output dimensions (cf. Tab. XII). This distribution is then sampled to estimate trajectories. The authors evaluate their approach with the widely used NGSIM data set (see [4]) and show that a root weighted square error (comparable to RMSE) of approximately 0.5 m in lateral direction at a prediction horizon of 5 s can be achieved. Another approach, which is also evaluated using the NGSIM data set, is presented in [1]. The authors propose the use of a Long Short Term Memory network for predicting trajectories. In particular, the approach is able to compute single shot predictions with an RMSE of approximately 0.42 m at 5 s. [26] deals with the prediction of spatial probability density functions, especially at road intersections. More precisely, a conditional probability density function, which models the relationship between past and future motions, is inferred from training data. Finally, standard GMMs and variational approaches are compared. In [25], this approach is extended by using a hierarchical Mixture of Experts that allows incorporating categorical information. The latter includes, for example, the topology of a road intersection. In [18], a Gaussian Mixture Regression approach for predicting future longitudinal positions as well as a procedure for estimating the prediction confidence are introduced.

Fig. 2: Preprocessing steps used in our proposed workflow. The respective sections are referenced in the boxes.

II-C Hybrid Approaches

Approaches that combine strategies both for maneuver detection and trajectory or position prediction, similar to the approach presented in this article, can be found in [32, 31, 28, 29, 6, 7]. In the following, we denote such approaches as hybrid. [32] presents a two-staged approach: In a first step, a Multilayer Perceptron (MLP) is used to estimate the future lane of a vehicle. In a second step, a concrete trajectory realization is estimated with an additional MLP. As a result, the lane estimation module is able to detect lane changes 2 s in advance with an AUC better than 0.9. The evaluation of the trajectory prediction module shows a median lateral error of approximately 0.23 m at a prediction horizon of 5 s. [31] proposes another hybrid approach that uses the prediction of future trajectories to forecast lane change maneuvers. Moreover, the intention of drivers is modeled using a Support Vector Machine. Subsequently, the resulting action is checked for collisions. This enables the approach to model interrupted lane changes. During the evaluation, an F1-score of 98.1 % with a detection time of up to 1.74 s is achieved. In turn, [28] does not follow a hybrid approach as described above, but contains an intermediate step before predicting trajectories. Instead of learning maneuver probabilities, [28] presents a regression technique that relies on Random Forests for estimating the time span to the next lane change. In [29], this approach is extended and combined with findings from [30]. The estimated times to the next lane change to the left and to the right are used as input for a cubic polynomial to predict future trajectories. Finally, the approach is evaluated with the already mentioned NGSIM data set, showing a median lateral error of approximately 0.5 m at a prediction horizon of 3 s for lane changing scenarios, assuming a perfect maneuver classification. [6] proposes the use of a Hidden Markov Model based maneuver recognition, which distinguishes between ten different maneuver classes. On this basis, a position prediction module, which combines several maneuver-specific variational GMMs (according to [26]), and an Interacting Multiple Model, which weights different physical models against each other, are implemented. As the approach uses ten maneuver classes and the error has only been measured in terms of Euclidean distance, the results are difficult to compare with those of other approaches. Additionally, the approach is evaluated on a very small data set only. Finally, in [7] these findings are pursued further by the use of a Long Short Term Memory network. The authors show some improvements compared to their previous work, while using the NGSIM data set for evaluation purposes. [21] presents an approach predicting future lateral vehicle positions based on Gaussian Mixture Regression and a Mixture of Experts with a Random Forest as gating network. The approach is evaluated based on a small data set, leading to noisy results, especially in case of lane changes. The evaluation shows that the approach is able to perform maneuver classifications with an AUC better than 84 % and lateral position predictions with a median error of less than 0.2 m at a prediction horizon of 5 s.

II-D Literature Summary

The findings of our literature survey can be summarized as follows: Many works provide meaningful algorithmic contributions. However, in numerous cases a structured problem resolution strategy is missing. Often it does not become clear how the approaches compare to any baseline (e. g. [6]). Moreover, parameters (e. g. [31]) and feature sets (e. g. [1]) are selected manually and are thus difficult to retrace. In addition, most of the presented approaches focus on short or medium prediction horizons (e. g. [24]), or lack a good prediction performance for larger time horizons (e. g. [29]). When analyzing the approaches that aim to resolve the long-term prediction problem, it becomes clear that the latter is challenging, as the prediction models become significantly more complex, as pointed out e. g. by [2, 20] and [12].

Moreover, many approaches (e. g. [1]) aim to predict single trajectories or single shot predictions rather than probabilistic distributions of future vehicle positions. Therefore, the objective to be optimized is mostly the root mean square error (RMSE). In contrast to these works, we consider the objective of the learning problem to be an estimator that models a probability distribution of positions reflecting the frequencies of all positions that could be observed, e. g. for different drivers in the same situation. Thus, we aim to maximize the likelihood of truly occupied positions given the model. The reasoning behind this decision is that such distributions contain significantly more information than single shot predictions and are thus more useful for applications that have to consider risks, e. g. maneuver planning approaches (cf. [26, 19]).
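The difference between the two objectives can be made concrete with a small sketch. It uses made-up lateral positions for one situation in which some drivers keep their lane and others change it; the mixture parameters are hypothetical and chosen by hand, not fitted.

```python
import math

def gaussian_pdf(y, mean, std):
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def mean_log_likelihood(observations, components):
    """Average log-density under a mixture given as [(weight, mean, std), ...]."""
    total = 0.0
    for y in observations:
        total += math.log(sum(w * gaussian_pdf(y, m, s) for w, m, s in components))
    return total / len(observations)

def rmse(observations, prediction):
    """Root mean square error of one point prediction against all observations."""
    return math.sqrt(sum((y - prediction) ** 2 for y in observations) / len(observations))

# Hypothetical lateral positions (metres) after 5 s in the same situation.
observed = [0.0, 0.1, -0.1, 3.4, 3.5, 3.6]

# The RMSE-optimal single shot prediction is the mean, which lies between the lanes.
single_shot = sum(observed) / len(observed)

unimodal = [(1.0, single_shot, 1.8)]
bimodal = [(0.5, 0.0, 0.3), (0.5, 3.5, 0.3)]

ll_unimodal = mean_log_likelihood(observed, unimodal)
ll_bimodal = mean_log_likelihood(observed, bimodal)
```

In this toy case the point prediction that minimizes the RMSE sits where no vehicle was observed, while the bimodal density assigns a clearly higher likelihood to the positions that were actually occupied.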

II-E Contributions

Therefore, the contribution of this article is threefold:

  1. We apply a heuristic-free machine learning workflow to generate a model that is capable of predicting maneuvers and precise distributions of future vehicle positions for time horizons up to 5 s. The workflow omits any human-tuned (hyper-)parameters. Note that this covers all aspects of feature engineering, labeling, feature selection, and hyperparameter optimization for different classification algorithms. Regarding feature selection, an automated process determines the actually relevant features instead of manually pre-selecting them.

  2. We evaluate the modules for maneuver classification and position prediction, where both parts are not only evaluated separately, as in other works (e. g. [29]), but also as a combined prediction system. This concerns the lateral as well as the longitudinal behavior. In this context, we show that directly feeding the results of the classifier into the regression problem produces results comparable to a Mixture of Experts approach. Additionally, we show that relying on the Markov assumption and not explicitly modeling the interactions between the traffic participants allows producing superior results compared to existing approaches. As opposed to these works, we bring together the different aspects of behavior prediction, which consist of the prediction of driving maneuvers and of positions in both lateral and longitudinal direction. In addition, we introduce new methodologies and conduct a large-scale evaluation.

  3. We demonstrate that the presented methods have the potential to outperform state-of-the-art approaches when fed with a sufficient amount of data. Additionally, we show that our approach is able to provide a meaningful estimate of the prediction uncertainty to the consumer of the information, which is beneficial for collision risk calculation and trajectory planning (e. g. [19]).

III Data Preparation & Experimental Setup

First of all, Sec. III-A introduces the considered data set and the experimental setup. Afterwards, Sec. III-B gives a detailed overview of the features used to train our models. Finally, we delve into the labeling process (cf. Sec. III-C) and the data set split for training, validating and testing the constructed models as well as further preprocessing steps (Sec. III-D). Fig. 2 summarizes the overall preprocessing chain.

III-A Data Collection

For modeling and evaluating our modules, we use measurement data from a fleet of test vehicles (cf. [22]) equipped with common series sensors. The sensor setup includes a front-facing camera detecting lane markings, as well as two radars observing the traffic situation in the back. In addition, the vehicles have a front-facing automotive radar to sense the distances and velocities of the surrounding vehicles. The data has been collected with different vehicles and drivers at varying times of the day during all seasons. The campaign spanned more than a year and was mainly restricted to the area around Stuttgart in Germany. Due to this wide variance, we expect our models to generalize well.

Unlike other contributions (e. g. [21]), we are not using the real object-vehicles as prediction target in this work, but rather the ego- (or measurement-) vehicle itself. However, as our work focuses on the prediction of surrounding vehicles, we only use features that are also observable from an external point of view, as postulated in other works (e. g. [24] or [31]). For example, this constraint excludes features like driver status or steering wheel angle. Thus, the models remain applicable to real object-vehicles, assuming a good sensing of their surroundings. Working with the ego-vehicle data offers several advantages concerning the modeling of the situations: First, each situation can be described in a similar way, as one does not have to care about vehicles neighboring the target-vehicle that are hidden from the measurement-vehicle. In addition, all measurements span longer time periods, as the target-vehicle can never disappear from the field of view. This way of data handling is widespread in the literature (e. g. [30]). In addition, one can expect that future sensor setups will reduce the measurement uncertainty for perceived objects and will get closer to the data quality that is nowadays available for the ego-vehicle.

Fig. 3: Environment model used for our investigations.

Basically, our investigations rely on an environment model similar to the one presented in [20], modeling the surroundings with a fixed grid of eight relation partners. As opposed to [20], we use the ego-vehicle as prediction target. Therefore, we introduce some slight adaptations to the environment model: As the sensors facing the rear traffic are less capable than the ones facing the front, our environment model (cf. Fig. 3) distinguishes between relation partners behind and in front of the prediction target, each denoted by its own index. Thus, the relation vectors of the rear objects are shortened compared to the ones for the front objects. The relation vectors describe the relation between the respective object and the prediction target. Moreover, object vehicles driving behind the prediction target on the same lane are left out, as our current sensor setup is not able to sense them. Consequently, a traffic situation can be described by a feature vector containing the relations between the prediction target and its seven relation partners, its own status and the infrastructure description (cf. Eq. 1).

A detailed listing of the particular elements of the relation vectors for rear and front objects as well as of the status and infrastructure descriptions can be found in Tab. XII.

III-B Feature Engineering

To test and develop our system and fill the described environment model, we use fused data originating from three different sources:

  1. The basis for our investigations are measurement data from the described testing fleet.

  2. As we identified additional features being of interest as inputs, we fused the data with information from a navigation map (e. g. bridges, tunnels, and distances to highway approaches).

  3. Besides that we calculated some higher order features out of the measurements, e. g. a conversion to a curvilinear coordinate-system along the road (cf. [23]).

III-C Labeling

In accordance with previous works (cf. [21]), we divide all samples into three maneuver classes: lane change left, lane following, and lane change right. We apply a labeling process that works as follows: First, for each measurement, we calculate the times to the next lane change to the left and to the right neighboring lane. This is accomplished by looking ahead in time at the distances to the lane markings. As the moment of the lane change, we define the point in time at which the vehicle center has just crossed the lane marking. Subsequently, we determine the maneuver label of each sample with a defined prediction horizon according to Eq. 2.


Initially, we decided to use a horizon of 5 s, as the duration of lane change maneuvers mostly ranges between 3 s and 5 s ([31]). Accordingly, it seems reasonable to label samples as potential lane change samples only up to an upper boundary of 5 s. Additionally, this value is widely used in the literature as the longest prediction time (e. g. [2, 32] or [31]) and therefore ensures comparability. However, note that this style of labeling can result in decreased performance values, as detections that are slightly more than 5 s ahead of a lane change count as false positives.
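A minimal sketch of such a labeling rule is given below. It assumes that a sample counts as a lane change sample whenever the corresponding time to lane change does not exceed the horizon; the class names, the treatment of the boundary value and the tie-breaking when both directions fall within the horizon are illustrative assumptions and not taken from Eq. 2.

```python
def label_maneuver(ttlc_left, ttlc_right, horizon=5.0):
    """Derive a maneuver label from the times to the next lane change (seconds).

    Pass float("inf") if no lane change in that direction follows.
    """
    left = ttlc_left <= horizon
    right = ttlc_right <= horizon
    if left and right:
        # Assumption: the lane change that happens first determines the label.
        return "lane_change_left" if ttlc_left <= ttlc_right else "lane_change_right"
    if left:
        return "lane_change_left"
    if right:
        return "lane_change_right"
    return "lane_following"

no_change = float("inf")
labels = [
    label_maneuver(3.0, no_change),   # within the horizon
    label_maneuver(5.2, no_change),   # slightly beyond the horizon
    label_maneuver(no_change, 1.0),
]
```

The second call illustrates the effect mentioned above: a sample 5.2 s ahead of a lane change is labeled as lane following, so a classifier that already detects the upcoming lane change there is counted as producing a false positive.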

III-D Data Set Split

As shown in Fig. 2, we split our data into several parts after executing the aforementioned preprocessing steps. The first split divides our data into one part for the maneuver classification and another one for the position prediction. This allows us to produce bias-free models. An overview of the splits as well as the respective data set sizes and identifiers is given in Tab. I.

TABLE I: Data Set Identifiers and Sizes
Maneuver Data | Position Data
Training & Validation; Test | Training: 130 623 Trajectories (7 s; variable sampling time); Test: 20 000 Trajectories (5 s; 10 Hz)
Samples per Maneuver Class: 90 759; 87 499; 89 048; 90 458; 92 669; 87 308 | Trajectories: 3 685; 6 037; 5 071

The first part is then used as follows: To prepare the training, parametrization and evaluation of the developed classifiers in a methodically sound way, we split this data set once more into six folds. Of these, we use five folds in Sec. IV for design and parametrization. The remaining fold is only used for the performance examinations presented in Sec. V. The split is performed based on entire situations as described in [21]. This means that the measurements of each situation occur in exactly one of the folds, which prevents unrealistically good results caused by similar samples from the same time series ending up in different folds. To achieve an even proportion of the three maneuver classes, we balance the number of samples within each fold by a random undersampling strategy. Otherwise, as the prediction problem is extremely unbalanced, as also outlined in [1], classifiers would focus on the most frequent maneuver class (lane following). For example, in our case, approximately 94 % of the data points belong to that class.

In addition, we only take situations into account that have been collected continuously up to the prediction horizon of 5 s. This ensures that the folds are also balanced over time, which is a prerequisite for performing fair evaluations: Obviously, the prediction task is much more demanding when predicting a lane change 4 s in advance instead of 1 s in advance. As a consequence of this strategy, the number of samples in the six folds can differ slightly, but we consider this uncritical. Overall, this data set contains approximately 8 hours of highway driving, of which 2/3 were collected during lane changes.
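The two steps described above, splitting by entire situations and balancing each fold by random undersampling, can be sketched as follows. The sample layout (dictionaries with `situation_id` and `label` keys), the round-robin fold assignment and the fixed seeds are assumptions made for illustration.

```python
import random
from collections import defaultdict

def split_by_situation(samples, n_folds=6, seed=0):
    """Assign whole situations to folds so that no situation is shared."""
    situation_ids = sorted({s["situation_id"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(situation_ids)
    fold_of = {sid: i % n_folds for i, sid in enumerate(situation_ids)}
    folds = [[] for _ in range(n_folds)]
    for s in samples:
        folds[fold_of[s["situation_id"]]].append(s)
    return folds

def undersample(fold, seed=0):
    """Randomly drop samples until every class is as frequent as the rarest one."""
    by_label = defaultdict(list)
    for s in fold:
        by_label[s["label"]].append(s)
    n_min = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n_min))
    return balanced

# Toy data: 12 situations dominated by lane following samples.
samples = []
for sid in range(12):
    samples += [{"situation_id": sid, "label": "follow"} for _ in range(10)]
    samples += [{"situation_id": sid, "label": "left"} for _ in range(2)]
    samples += [{"situation_id": sid, "label": "right"}]

folds = split_by_situation(samples)
balanced_folds = [undersample(fold) for fold in folds]
```

Splitting before balancing keeps the situation boundaries intact; balancing within each fold is why the fold sizes can end up slightly different.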

The second data set, which is intended for the training and evaluation of the position prediction, is processed as follows: Initially, we add the lane change probabilities as estimated by the different classifiers to each sample. In addition, we only select measurements that have been collected while the vehicle was driven manually. This is essential, as all vehicles in our testing fleet are equipped with an Adaptive Cruise Control (ACC) system, so that driving in a semi-automated mode is overrepresented in our data compared to reality. (Footnote 1: We did not explicitly filter out ACC driving in the data set for the maneuver classification, as we assume that the ACC is always deactivated during lane changes.) Subsequently, we split the data set into one subset for training and another for evaluating the position predictions, as described in Sec. VI and Sec. VII. Afterwards, we expand each data point in the training subset with the desired prediction outputs, the true positions in x and y direction, for all times within {-1.0 s, -0.9 s, …, 6.0 s}. The samples with negative times and the ones with times > 5 s are needed to train the distributions correctly. Note that strictly limiting the times to a certain range would generate areas in the data space that are difficult to represent with GMMs due to discontinuities similar to the ones in the probability dimension (cf. Sec. VI-B). To overcome these problems, we integrated a mechanism performing a subsampling between -1 s and 0 s and between 5 s and 6 s according to a Gaussian distribution. In addition, another mechanism performing a time interpolation ensures that the training data points are distributed continuously along the time dimension. This means that we also have access to prediction times in between our sampling times during the training process. Moreover, the data points in the position test data set are expanded with x and y positions as well as corresponding times within {0.0 s, 0.1 s, …, 5.0 s}. Afterwards, we 'coil' the two data sets such that each of the newly constructed data points contains the features at the start point of the prediction, one prediction time, and the true x and y positions at that point in time (in Fig. 2 this step is called 'Explode Data'). Hence, our data sets are multiplied by a factor of 71 and 51, respectively, and are structured as described in detail in Sec. VII-A. Note that the training subset is re-split along the maneuver labels in Sec. VI-A to train maneuver-specific position prediction experts.
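The 'Explode Data' step and the resulting multiplication factors can be sketched as follows. The row layout and the constant-velocity toy trajectory are assumptions for illustration; the subsampling at the interval borders and the time interpolation mentioned above are not included.

```python
def explode(start_features, times, positions):
    """Turn one trajectory into one row per prediction time.

    start_features: features at the start point of the prediction
    times:          prediction times in seconds
    positions:      list of true (x, y) positions, one per entry in times
    """
    return [
        {"features": start_features, "t": t, "x": x, "y": y}
        for t, (x, y) in zip(times, positions)
    ]

# Time grids built from integer steps to avoid floating point drift.
train_times = [i / 10 for i in range(-10, 61)]  # {-1.0 s, -0.9 s, ..., 6.0 s}
test_times = [i / 10 for i in range(0, 51)]     # {0.0 s, 0.1 s, ..., 5.0 s}

# Toy trajectory: 30 m/s straight ahead, no lateral movement.
positions = [(30.0 * t, 0.0) for t in test_times]
rows = explode({"velocity": 30.0}, test_times, positions)
```

The two grids contain 71 and 51 time steps, which is where the multiplication factors quoted above come from.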

IV Maneuver Classifier Training

This section gives an overview of the different techniques used for feature selection (cf. Sec. IV-A), classification algorithms (cf. Sec. IV-B) and techniques to tune the respective hyperparameters (cf. Sec. IV-C) for the maneuver classification. The corresponding activities are illustrated in Fig. 4.

Fig. 4: Process of training and evaluating maneuver classifiers.

IV-A Feature Selection

This section outlines our considerations concerning the task of selecting a meaningful subset of features from the available superset. Selecting features makes sense for two reasons: First, it can improve the prediction performance of the maneuver classifiers. Second, it can help to reduce the computational effort for making predictions on devices with limited computational power. Our main goal here is to improve the overall prediction performance. This slightly contrasts with an overall ranking of the available features, as some of the features are highly redundant. Therefore, the most predictive variables should be selected, while excluding redundant ones. In the literature, one can find a large number of publications concerning feature selection in machine learning applications. In our implementation, we rely on the findings from [9]. As we claim to solve the underlying problem through a systematic machine learning workflow, we start with simple techniques and then move towards more sophisticated and computationally expensive ones. To demonstrate the performance of the used techniques, we additionally test the classification with the entire superset of all features as a baseline.

The first technique we chose is a simple correlation-based feature selection technique, which evaluates the correlation of all features and then applies a threshold (set to 0.15) to remove features showing a very low correlation with the maneuver class from the superset. More precisely, we compute Spearman's correlation (see [8, p. 133 ff]) between each feature and the time to the next lane change. We selected this quantity instead of the maneuver label, as it enables a smooth fade-out. Tab. II summarizes the examined variants and their abbreviations. Additionally, the resulting elements of the respective feature sets can be found in Tab. XII.
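The correlation threshold can be sketched in plain Python as follows. Spearman's coefficient is computed as the Pearson correlation of the rank vectors. Using the absolute value of the coefficient and keeping features that reach the threshold exactly are assumptions; the feature names and values are made up.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # constant column: treated as uncorrelated
    return cov / (var_a * var_b) ** 0.5

def select_by_correlation(features, target, threshold=0.15):
    """Keep the names of features whose |rho| with the target reaches the threshold."""
    return [name for name, column in features.items()
            if abs(spearman(column, target)) >= threshold]

# Hypothetical feature columns and times to the next lane change (seconds).
time_to_lane_change = [0.5, 1.0, 2.0, 3.0, 4.0]
features = {
    "distance_to_marking": [0.2, 0.5, 0.9, 1.4, 1.8],
    "unrelated_signal": [2, 1, 3, 1, 2],
}
selected = select_by_correlation(features, time_to_lane_change)
```

A rank correlation only requires a monotonic relationship, so a feature that shrinks nonlinearly as the lane change approaches is still picked up.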

Variant Description
Superset as Baseline
Correlation Threshold
Wrapper Technique
TABLE II: Summary of Examined Feature Selection Techniques

The second technique uses the Correlation-based Feature Selection (CFS; cf. [10]). For this technique, the correlation of entire feature sets instead of single features is calculated. More precisely, for each feature set $S$, the 'merit' $M_S$, as a measure of the predictive performance, is computed according to Eq. 3:

\[
M_S = \frac{k \, \overline{r_{cf}}}{\sqrt{k + k (k - 1) \, \overline{r_{ff}}}} \tag{3}
\]

where $k$ describes the number of features in $S$ and $\overline{r_{cf}}$ corresponds to the mean correlation of all features with the class label or, in our case, with the time to the next lane change. The variable $\overline{r_{ff}}$ describes the mean feature-feature intercorrelation of all features within $S$. As can be seen from Eq. 3, strongly correlated features in a feature set will decrease $M_S$, whereas a stronger correlation with the class label will increase the value of $M_S$. All these computations rely on the assumption that there are no strong feature interactions present in the data set, but that instead every relevant feature itself is at least weakly correlated with the class label (see also [10]). To meet the conditions of our data set and to be consistent with the correlation threshold variant, we use Spearman's correlation coefficient. As the computation of $M_S$ is not feasible for all possible feature combinations, we used a backward selection strategy that, according to Guyon ([9]), typically provides superior results compared to forward selection. When applying it in our research, we tried to minimize the possible shortcomings of the CFS by applying cross-validation with the five data folds for training and validation, as described in Sec. III-D.
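The behavior of the merit can be checked with a short numerical sketch. It assumes that Eq. 3 follows the standard CFS merit, i.e. the number of features times the mean feature-class correlation, divided by the square root of k + k(k - 1) times the mean feature-feature intercorrelation. The correlation values are made up.

```python
import math

def cfs_merit(k, mean_feature_class_corr, mean_feature_feature_corr):
    """Merit of a feature set with k features under the standard CFS heuristic."""
    return (k * mean_feature_class_corr) / math.sqrt(
        k + k * (k - 1) * mean_feature_feature_corr
    )

# Two hypothetical sets of five features with equal relevance (0.4)
# but different redundancy among the features.
merit_redundant = cfs_merit(5, 0.4, 0.9)
merit_diverse = cfs_merit(5, 0.4, 0.1)

# A single feature has no intercorrelation term, so its merit equals its relevance.
merit_single = cfs_merit(1, 0.4, 0.0)
```

With identical relevance, the set of largely redundant features receives a clearly lower merit than the diverse set, which is the effect a backward selection driven by this score exploits.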

The feature selection techniques described so far are limited in two respects. Firstly, they do not account for the properties of the classification algorithm used. Secondly, features that are only meaningful in combination with others are not considered in feature sets and . Therefore, for generating feature set , we apply a wrapper feature selection technique as described in [13]. As the training of Random Forests already includes an implicit feature selection, we solely focus on wrapper techniques for the other classifiers presented in Sec. IV-B. The main idea of wrapper techniques is to incorporate the classifier itself as a black box into the feature selection process. Within the process, the prediction performance on a validation data set is used to determine the best feature set for the respective classifier. We build our investigations on a hyperparameter set that was optimized as described in Sec. IV-C, where the feature set of variant was used for optimization. As in the process for deriving , we perform the search for the most descriptive feature set with backward elimination. As a classifier needs to be trained and evaluated for each of the approximately 5 000 possible subsets, the wrapper technique is computationally expensive. To accelerate the computation, we do not validate using cross-validation. Instead, we use one of the data folds (cf. Sec. III-D) for training () and one for validation ().
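
The wrapper idea can be illustrated with a short sketch in which the classifier only appears as a black-box scoring function. Everything here is an assumption made for illustration: the callback `validation_score` stands for "train on the training fold with this subset, return the score on the validation fold", and the toy score and feature names are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def wrapper_backward_elimination(
        features: Iterable[str],
        validation_score: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Backward elimination that treats the classifier as a black box."""
    current = frozenset(features)
    best_subset, best_score = current, validation_score(current)
    while len(current) > 1:
        # Score every subset that results from removing exactly one feature.
        candidates = [(validation_score(current - {f}), current - {f})
                      for f in sorted(current)]
        score, current = max(candidates, key=lambda item: item[0])
        if score >= best_score:  # ties favour the smaller subset
            best_subset, best_score = current, score
    return best_subset, best_score


# Toy stand-in for "train and validate": two useful features, and a small
# penalty per feature that mimics the cost of irrelevant inputs.
USEFUL = {"speed", "offset"}


def toy_score(subset: FrozenSet[str]) -> float:
    return len(subset & USEFUL) - 0.01 * len(subset)


subset, score = wrapper_backward_elimination(
    ["speed", "offset", "fog_lamp", "wiper"], toy_score)
```

For n features this scheme evaluates on the order of n(n + 1)/2 subsets, each requiring a full training run, which is why a single training and validation fold is attractive here.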

IV-B Examined Classification Algorithms

For the task of maneuver classification, we selected three different algorithms, which have been successfully applied in reference works, for evaluation:

  1. The first algorithm is a Gaussian Naïve Bayes (GNB) approach using GMMs instead of only one Gaussian kernel per class (cf. [20]).

  2. The second algorithm is based on a Random Forest (RF; cf. [21]).

  3. The third algorithm is a Multilayer Perceptron (MLP) approach (cf. [32]). As opposed to RF and GNB, we use scaled features, as suggested by [11, p. 398 ff]. In contrast to the original publication, we are using a modified labeling and a partly automated strategy to identify the right model structure, where we restrict the model to one hidden layer, in order to keep the parameter optimization solvable in finite time.

IV-C Hyperparameter Optimization

To achieve the best possible performance and to enable a fair comparison of the examined classifiers, we optimize their respective hyperparameters. For the GNB, this means to find the optimal number of Gaussian kernels used for each feature and class. A Variational Bayesian Gaussian Mixture Model (VBGMM; see [5]) was used in this context, where this technique was already successfully applied by [26]. The principle behind VBGMMs is to fit a distribution of the possible Gaussian Mixture distributions using a Dirichlet process. Using this technique, the optimal value for is determined automatically.

Regarding RF and MLP approaches, the parameter optimization was executed for each feature set using a grid search. This means that we vary the parameters and calculate a performance value for each parameter set. For the latter, we calculate the average balanced accuracy (see Sec. V-A) in a leave-one-out cross-validation manner. Thereby, we use the data of the five data folds for training and validation (). The parameters to be optimized are summarized in Tab. III.

Classifier Parameter Description
MLP Step size: controls how fast the weights of the network are adapted towards the direction of the gradient
MLP Hidden neurons: describes the structure of the network, as we are only working with one hidden layer
MLP Iterations: maximum number of training cycles
RF Trees: number of parallel trees in the forest
RF Splits: maximum number of splits in each tree
RF Samples: minimum number of samples necessary for a split

TABLE III: Optimized Hyperparameters per Classifier
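
The grid search with fold-wise validation can be sketched as follows. This is a minimal sketch, assuming that "leave one out" refers to leaving out one of the data folds at a time. The callback `train_and_score`, the parameter names, and the toy score are hypothetical placeholders for training a classifier and computing its balanced accuracy.

```python
from itertools import product
from statistics import mean
from typing import Any, Callable, Dict, List, Sequence, Tuple


def grid_search_cv(
        param_grid: Dict[str, Sequence[Any]],
        folds: Sequence[Sequence[Any]],
        train_and_score: Callable[[Dict[str, Any], List[Any], Sequence[Any]], float],
) -> Tuple[Dict[str, Any], float]:
    """Return the parameter set with the best mean score over all folds."""
    names = sorted(param_grid)
    best_params: Dict[str, Any] = {}
    best_score = float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for i, validation_fold in enumerate(folds):
            # Train on all folds except the i-th one, validate on the i-th.
            training = [s for j, fold in enumerate(folds) if j != i for s in fold]
            scores.append(train_and_score(params, training, validation_fold))
        score = mean(scores)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy score that ignores the data and peaks at trees=100, max_splits=50.
def toy_train_and_score(params, training, validation):
    return -abs(params["trees"] - 100) - abs(params["max_splits"] - 50)


grid = {"trees": [10, 100, 500], "max_splits": [5, 50]}
folds = [[1, 2], [3, 4], [5, 6]]
best_params, best_score = grid_search_cv(grid, folds, toy_train_and_score)
```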

So far, we constructed different feature sets (cf. Sec. IV-A) and optimized the hyperparameters for the different classification algorithms (cf. Sec. IV-B & Sec. IV-C). Subsequently, we execute a second training step with a larger amount of data for all algorithms, using the optimized feature sets and hyperparameters. The enlargement of the data set is achieved by using all five folds that we previously used in the cross-validation (). Through this step we derive the final models for our further evaluation (cf. Sec. V).

V Maneuver Classifier Evaluation

This section presents the experimental results we obtained with the trained classification models (cf. Sec. IV). Sec. V-A introduces the used performance measures, whereas Sec. V-B presents and discusses the results measured with the constructed test data set (cf. Sec. III-A).

V-A Performance Measures

To be able to assess the performance of the developed classifiers, several metrics are needed, as we are focusing on different objectives. Particularly, we are interested in predicting lane changes not only with high accuracies, but also as long as possible in advance of their execution.

To reflect that, we use the balanced accuracy ($\mathit{BACC}$), which weights the classification performance for the three maneuver classes evenly. Basically, we use the definition from [3], but in a generalized form for multiclass problems (cf. Eq. 4):

$$\mathit{BACC} = \frac{1}{|M|} \sum_{m \in M} \frac{TP_m}{P_m} \qquad (4)$$

where $M$ describes the set of maneuver classes, $TP_m$ corresponds to the number of true positives for class $m$, and $P_m$ to the number of samples truly belonging to class $m$ (positives). Thereby, the classifiers assign each sample to the class with the highest probability value.
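
The multiclass balanced accuracy, i.e. the mean over all classes of true positives divided by positives, can be computed as in the following sketch. The class labels `"LCL"`, `"FLW"` and `"LCR"` are hypothetical names for lane change left, lane following and lane change right.

```python
from typing import Dict, Hashable, Sequence


def balanced_accuracy(y_true: Sequence[Hashable],
                      y_pred: Sequence[Hashable]) -> float:
    """Mean per-class recall TP_m / P_m over all classes present in y_true."""
    positives: Dict[Hashable, int] = {}
    true_positives: Dict[Hashable, int] = {}
    for truth, pred in zip(y_true, y_pred):
        positives[truth] = positives.get(truth, 0) + 1
        if truth == pred:
            true_positives[truth] = true_positives.get(truth, 0) + 1
    recalls = [true_positives.get(m, 0) / p for m, p in positives.items()]
    return sum(recalls) / len(recalls)


def argmax_class(probabilities: Dict[Hashable, float]) -> Hashable:
    """Assign a sample to the class with the highest probability value."""
    return max(probabilities, key=probabilities.get)


# Imbalanced toy example: 8 lane following samples, 1 per lane change class.
y_true = ["FLW"] * 8 + ["LCL", "LCR"]
y_pred = ["FLW"] * 8 + ["LCL", "FLW"]
bacc = balanced_accuracy(y_true, y_pred)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy example the plain accuracy is 0.9, whereas the balanced accuracy is only 2/3, because the missed lane change counts as much as the whole lane following class.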

In addition, we use the Receiver Operating Characteristic (ROC) and the Area Under the ROC Curve (AUC), which are widely used metrics in this domain (e. g. [17, p. 180 ff]). As opposed to the balanced accuracy, the ROC curve is originally intended to assess binary classifiers. Therefore, we transform our three-class problem into three binary classification problems. The ROC curves constructed this way additionally enable us to show the classification quality at different working points (WP). For example, this property allows us to assess the classification performance for the lane change maneuver classes with more conservative classifier parametrizations and thus fewer false positives. The AUC, in turn, summarizes the performance across all possible working points.
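
The transformation into binary problems and the AUC can be illustrated with a small sketch. It uses the rank interpretation of the AUC (the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one), which is equivalent to the area under the ROC curve. The function name, the class labels and the toy scores are assumptions.

```python
from typing import Hashable, Sequence


def auc_one_vs_rest(scores: Sequence[float],
                    labels: Sequence[Hashable],
                    positive: Hashable) -> float:
    """AUC of one class against all others; ties count as half a win."""
    pos = [s for s, l in zip(scores, labels) if l == positive]
    neg = [s for s, l in zip(scores, labels) if l != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Scores are the classifier's probabilities for the (hypothetical) class "LCL".
scores = [0.9, 0.4, 0.3, 0.2, 0.6]
labels = ["LCL", "LCL", "FLW", "LCR", "FLW"]
auc_lcl = auc_one_vs_rest(scores, labels, positive="LCL")
```

The pairwise comparison is quadratic in the number of samples and only meant for illustration; for large data sets a sort-based computation is preferable.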

Additionally, we define metrics that enable us to analyze the technically possible prediction time horizon. As the point in time being referenced hereby is essential and most sources (e. g. [24], [32] and [31]) are not very exact in that respect, we introduce the two metrics and as shown in Tab. IV:

Metric Definition
time between the first detection of the correct maneuver class and the vehicle center crossing the centerline, as presented in [30]
time between the moment from which the classifier is certain about its decision for a specific maneuver class and does not change it until the end of the situation, and the vehicle center crossing the centerline; this is a considerably stricter definition than the first one

TABLE IV: Definition of the Detection Time Metrics

In contrast to the balanced accuracy evaluation, for which an unambiguous class assignment is necessary, the class assignment for the detection times is conducted in a way that matches the binary evaluation in the ROC curve: for each of the two lane change classes, we select a binary decision threshold that keeps the false positive rate below 1%. The resulting working points are presented later in Fig. 5 along with the curves. The calculated detection times therefore reflect an evaluation with a limited false positive rate and, hence, at a similar working point for the different classifiers, which ensures a fair evaluation. We chose a very low false positive rate because the system should not produce too many false lane change detections, as in practice lane changes occur very rarely compared to lane following.
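
The choice of the working point and the two detection times can be sketched as follows. This is an illustration under assumptions: the threshold is picked from the scores of negative samples such that at most the allowed share of them exceeds it, a detection is counted when the score is strictly above the threshold, and all names and toy numbers are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def threshold_for_fpr(negative_scores: Sequence[float],
                      max_fpr: float = 0.01) -> float:
    """Threshold such that at most max_fpr of the negatives score above it."""
    ordered = sorted(negative_scores, reverse=True)
    allowed = min(int(max_fpr * len(ordered)), len(ordered) - 1)
    return ordered[allowed]


def detection_times(scores: Sequence[float],
                    times_to_crossing: Sequence[float],
                    threshold: float) -> Tuple[Optional[float], Optional[float]]:
    """Return (first detection, stable detection) for one lane change.

    scores[i] is the class probability at times_to_crossing[i] seconds before
    the vehicle center crosses the centerline; samples are chronological.
    """
    detected = [s > threshold for s in scores]
    first = next((t for t, d in zip(times_to_crossing, detected) if d), None)
    stable = None
    # Walk backwards from the end of the situation while the decision holds.
    for t, d in zip(reversed(times_to_crossing), reversed(detected)):
        if not d:
            break
        stable = t
    return first, stable


negatives = [0.05, 0.1, 0.2, 0.3, 0.1, 0.0, 0.15, 0.25, 0.6, 0.02]
wp = threshold_for_fpr(negatives, max_fpr=0.1)

times = [5.0, 4.0, 3.0, 2.0, 1.0, 0.0]
probs = [0.1, 0.9, 0.2, 0.8, 0.9, 0.95]
first, stable = detection_times(probs, times, threshold=0.5)
```

In the toy situation the maneuver is first detected 4 s ahead, but the decision only becomes stable 2 s ahead, which shows why the second metric is the stricter one.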

V-B Results & Discussion

Tab. V shows the results (, , ) for the different classifiers and feature sets measured with the maneuver test data set (). Probably due to the large number of samples, a favorable classifier parametrization and selection seems to have a significantly higher impact on the classification performance than a clever feature selection. This can be concluded from the fact that the classifiers working with feature sets and also perform only slightly worse regarding and than the other classifiers. However, applying a feature selection still remains reasonable, as it ensures shorter computation times. In addition, the results indicate that the feature selection contributes to increased prediction times in most cases. Note that these statements are not applicable to the RF, as this classifier performs an implicit feature selection.

Classifier Feature Set Performance on Test Data per Class (AUC; ; )
GNB 0.924 0.815 0.905
0.704 2.86 ± 1.46 s - 2.92 ± 1.42 s
2.18 ± 1.26 s - 2.21 ± 1.21 s
0.910 0.801 0.895
0.692 2.82 ± 1.38 s - 2.82 ± 1.32 s
2.11 ± 1.13 s - 2.09 ± 1.06 s
0.874 0.770 0.884
0.651 2.57 ± 1.31 s - 2.73 ± 1.31 s
2.02 ± 1.11 s - 2.02 ± 1.07 s
0.943 0.864 0.929
0.772 3.26 ± 1.28 s - 3.11 ± 1.14 s
2.61 ± 1.13 s - 2.69 ± 0.94 s
MLP 0.973 0.909 0.961
0.823 3.67 ± 1.26 s - 3.34 ± 1.18 s
2.95 ± 1.25 s - 2.82 ± 0.97 s
0.974 0.912 0.959
0.831 3.74 ± 1.07 s - 3.60 ± 1.16 s
3.08 ± 1.04 s - 2.86 ± 1.06 s
0.966 0.891 0.953
0.798 3.44 ± 1.07 s - 3.46 ± 1.11 s
2.86 ± 0.91 s - 2.82 ± 0.89 s
0.976 0.915 0.960
0.831 3.78 ± 1.16 s - 3.35 ± 1.18 s
3.08 ± 1.10 s - 2.66 ± 0.99 s
RF 0.978 0.925 0.968
0.838 3.81 ± 1.14 s - 3.60 ± 1.19 s
3.31 ± 1.13 s - 3.13 ± 1.08 s
0.976 0.918 0.959
0.834 3.73 ± 1.13 s - 3.61 ± 1.17 s
3.28 ± 1.10 s - 3.08 ± 1.00 s
0.964 0.893 0.953
0.799 3.45 ± 1.07 s - 3.49 ± 1.10 s
2.95 ± 0.87 s - 2.95 ± 0.90 s
TABLE V: Summary of Examined Classifiers with Preferred Hyperparameters
Fig. 5: ROC curves for the developed maneuver classifiers with their respective best parameter sets and hyperparameters.

Fig. 5 additionally shows the ROC curves for the respective best combination of classifier and feature set regarding and for each of the three classifiers. As another finding of our investigations, the classification performance for the lane following maneuver (), which most researchers in the literature neglect, is notably worse than for the lane changing maneuvers for all considered algorithms. This can be explained by the fact that nearly every sample that cannot be assigned with certainty to one of the lane change maneuvers is classified as lane following, whereas confusions between a lane change to the right and one to the left are very rare. Thus, a significantly larger number of false positives arises for the lane following class. In addition, we could reproduce the findings of [2], which showed that lane changes to the left are easier to predict than the ones to the right. One may explain this with the observation that lane changes to the right are often motivated by the intention to leave the highway. The latter can hardly be predicted, in contrast to lane changes to the left, which are often performed to overtake slower leading vehicles. Besides, it can be observed that the classification problem remains solvable even with a significantly decreased number of features, as the MLP with feature set , which only includes 24 features, shows. This illustrates that decreasing the number of features sometimes leads to an improved performance, as the input space has a lower dimension. This can be explained by the fact that a large number of features, which we expected to provide insights in specific lane changing situations, seem to have nearly no effect when observing the general behavior in highway situations. Exemplary features showing that behavior are summarized in Tab. VI.

Features Providing Insights on
fog lamps, wiper, … weather conditions
tunnel, bridge, … structural characteristics
lane mark color, … road works
country, distance to next highway exit/approach, … geographic specialties
TABLE VI: Contextual Features Solely Impacting Special Situations

An explanation for this behavior could be that situations affected by these features occur even more rarely than lane changes. However, as automated driving is extremely demanding exactly in these situations, further investigations are needed in these cases (see Sec. VIII).

In addition, we want to remark that the detection times and are limited to a maximum of 5 s through our evaluation methodology. Therefore, the average values and presented in Tab. V could even be exceeded in practice. To substantiate this assumption, Fig. 6 shows a histogram of the detection times for the RF. The distribution shows a large number of situations that can be detected 5 or more seconds in advance.

Fig. 6: Histogram of detection times (a) and (b) for RF for maneuver class with feature set (frequencies at and are explicitly annotated).

Altogether, our investigations show that a systematic machine learning workflow, combined with a large amount of data, is able to outperform current state-of-the-art approaches significantly. This becomes apparent when comparing the AUC values with those of other approaches, which are listed in Tab. VII. Tab. VII shows that our approach outperforms the others, although we are working with a clearly larger prediction horizon, which makes the classification problem more demanding, as mentioned before. Note that the state-of-the-art approaches were designed and evaluated on considerably smaller data sets.

Approach AUC Prediction Horizon
[21] 0.863 0.661 0.836 5.0 s
[20] 0.970 - 0.991 2.0 s
[2] 0.947 - 0.942 2.5 s
[29] 0.934 - 0.993 2.0 s
MLP 0.976 0.915 0.960 5.0 s
RF 0.978 0.925 0.968 5.0 s
TABLE VII: AUC Values in Comparison to Reference Works

Our investigations show that the GNB classifier performs significantly worse than the two other approaches. Thus, we only rely on these two classifiers for our further studies. Additionally, in the following sections we are restricting ourselves to the feature sets and hyperparameter sets showing the best performance (cf. Tab. VIII).

Classifier Parameter Value
MLP Feature Set
RF Feature Set
TABLE VIII: Selected Feature Sets and Hyperparameters per Classifier

VI Position Predictor Training

This section describes how we train the investigated models for position prediction. Sec. VI-A relies on the Mixture of Experts (MOE) approach presented in [21] for lateral predictions using Gaussian Mixture Regression. Sec. VI-B presents an alternative approach. As opposed to the MOE approach it solves the problem in one processing step. The entire procedure, including the evaluation process (cf. Sec. VII), is depicted in Fig. 7.

Fig. 7: Steps to train and evaluate the position predictors.

VI-A Mixture of Experts Approach (MOE)

Fig. 8: Illustration of the Mixture of Experts (MOE) approach.

To train the experts for the three maneuver classes, we divide the data set (cf. Sec. III-D) along the maneuver labels (cf. Fig. 7). Subsequently, we perform a random undersampling of the data points for the lane following maneuver class to obtain approximately the same number of samples as for the other two classes. The idea behind that step is that the regression problem for the lane following class is less complex than for the two other classes, and should therefore be solvable with the same amount of data. Among other benefits, this data reduction helps to speed up training. As a consequence, the number of samples is decreased by approximately 95 % and the data sets , and are constructed (cf. Tab. I). Afterwards, we train an expert GMM with each of these data sets. These experts are later used in the MOE approach (cf. Fig. 8). We chose a maximum of 50 mixture components and full covariance matrices (preliminary investigations showed that GMMs with diagonal covariance matrices are faster to fit but clearly less accurate), and again fitted the GMMs in a variational manner. Besides, we used the following input feature set and the true position at a defined prediction time to train the experts in lateral direction (cf. Eq. 5):


Regarding the prediction in longitudinal direction, we need to distinguish whether or not a preceding vehicle is present. If no vehicle is in sensor range, both the relative speed and distance for that vehicle are set to default values. As involving the latter in the training of the models would lead to bad fits, the input feature sets and are defined as follows (cf. Eq. 6 & Eq. 7):


As shown in [18], the prediction performance for the longitudinal direction can be significantly increased by learning the deviation from the constant velocity prediction instead of the true target position . Consequently, we use the output dimensions (cf. Eq. 8):


VI-B Integrated Approach

Fig. 9: Illustration of the integrated approach.

In addition to the MOE approach, we developed an integrated approach, which uses the unsplit data set (cf. Tab. I) and expands the feature sets () with the maneuver probabilities and (cf. Fig. 9). The remaining maneuver probability is left out here, as this information would be redundant to the two others and we want to keep the models' dimension as low as possible. Consequently, the task of considering the maneuver probabilities is directly integrated in the model. On the one hand, this has the positive effect that during the actual utilization only one model has to be queried to calculate the conditional distribution in the output space. On the other hand, the model is enabled to overcome potential weaknesses of the underlying classifier, as it can perform an implicit probability calibration during the fitting of the mixture, without needing an explicit calibration step. In this context, we discovered that GMMs are not well suited to fit probabilities bounded to values between 0 and 1. This is especially the case if most of the probabilities tend towards the extreme values (cf. Fig. 10 (a)). Hence, we expanded our data set with a duplicate of each data point containing probability values that are mirrored at 0 for original probabilities lower than 0.5 and at 1 for all other original probabilities. This way, we were able to generate the density shown in Fig. 10 (b), which we identified as easier to fit with GMMs. Note that before our adjustment, the density contained an abrupt jump, especially at a probability of 0. As these discontinuities can only be represented by a large number of Gaussian components, which are by definition symmetrical and smooth, many components needed in other areas of the data space would be wasted for this purpose.
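
The mirroring step can be sketched in a few lines. The sketch assumes that each sample is a tuple and that the positions of the probability entries are known; the function names, the column layout and the toy values are hypothetical.

```python
from typing import List, Sequence, Tuple


def mirror_probability(p: float) -> float:
    """Reflect p at 0 if it is lower than 0.5, otherwise reflect it at 1."""
    return -p if p < 0.5 else 2.0 - p


def augment_with_mirrored(samples: Sequence[Tuple[float, ...]],
                          prob_columns: Sequence[int]) -> List[Tuple[float, ...]]:
    """Append one duplicate per sample whose probability entries are mirrored."""
    mirrored = [tuple(mirror_probability(v) if i in prob_columns else v
                      for i, v in enumerate(row))
                for row in samples]
    return list(samples) + mirrored


# Each toy sample: (some input feature, probability 1, probability 2).
samples = [(1.2, 0.25, 0.75), (0.4, 0.0, 1.0)]
augmented = augment_with_mirrored(samples, prob_columns=(1, 2))
```

After the augmentation the density is symmetric around 0 and around 1, so the abrupt jumps at the boundaries of the interval disappear and can be covered by components centered at the boundaries.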

Fig. 10: Density of before (a) and after (b) adjustment.

The training of the integrated GMM is performed similarly to the experts' training in a variational fashion, with 50 components and full covariance matrices, but with the entire training data set without any undersampling, to preserve the unbalanced nature of the maneuver classes and their actual frequencies.

VII Position Estimation Evaluation

For evaluating the position predictions, one first has to decide which of the developed classifiers is best suited as the gating network in the Mixture of Experts (MOE) and in the integrated approach, respectively. Hence, we calculate the average log-likelihoods on the entire position test data set (cf. Sec. III-D). Note that this data set is not balanced according to the maneuver labels, as e. g. suggested in [7]. It is precisely the unbalanced nature of the data that allows us to draw general conclusions about the performance, independent of the driving maneuver. Using the average log-likelihood as the quality criterion for comparing different approaches is beneficial, as it rates the quality of the predicted probability density distribution instead of assessing only the ability to predict one single position with maximum accuracy. In addition, the log-likelihood is exactly the value to be maximized in the process of fitting a GMM. However, as the log-likelihood cannot be interpreted as a physical quantity, it can only be used for comparison purposes. As we are also interested in the performance concerning the spatial error, and to achieve comparability, we also investigate this error for the best-performing approach in the following subsections.

Tab. IX shows the per sample log likelihood for different approaches for the lateral () and longitudinal () directions. In this context, we use the already introduced classifiers RF and MLP, in combination with four different strategies for combining the experts’ position estimates:

  1. Raw probabilities (Raw): This strategy directly uses the raw probabilities as issued by the classifiers as gating probabilities. This means that we concatenate the three GMMs and multiply the mixture weights with the probabilities issued by the respective classifier.

  2. Winner Takes it All (WTA): This strategy uses the outputs of the GMM for the maneuver class with the largest probability according to the respective classifier.

  3. Prior Weighted Raw probabilities (PW-Raw): This strategy takes into account that the classifiers have been trained on a balanced data set. Therefore, it multiplies the raw probabilities with the prior probabilities for each maneuver class.

  4. Integrated GMM (I-GMM): This strategy directly uses the integrated approach presented in the previous section to predict the probability distributions.
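
The first three strategies differ only in how the classifier output is turned into gating weights for the experts, which the following sketch illustrates. The renormalization after the prior weighting, the class labels and the prior values are assumptions made for this example. The I-GMM strategy is not covered, as it does not use separate experts.

```python
from typing import Dict


def gating_weights(probs: Dict[str, float],
                   strategy: str,
                   priors: Dict[str, float]) -> Dict[str, float]:
    """Weights with which the experts' mixtures are combined."""
    if strategy == "raw":
        weights = dict(probs)
    elif strategy == "wta":
        winner = max(probs, key=probs.get)
        weights = {m: 1.0 if m == winner else 0.0 for m in probs}
    elif strategy == "pw-raw":
        weights = {m: probs[m] * priors[m] for m in probs}
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}  # assumed normalization


# Hypothetical classifier output and class priors.
probs = {"LCL": 0.6, "FLW": 0.3, "LCR": 0.1}
priors = {"LCL": 0.02, "FLW": 0.96, "LCR": 0.02}

raw = gating_weights(probs, "raw", priors)
wta = gating_weights(probs, "wta", priors)
pw_raw = gating_weights(probs, "pw-raw", priors)
```

In this toy case the raw and WTA strategies favor the lane change expert, whereas the prior weighting shifts most of the weight back to lane following, reflecting how rare lane changes are in unbalanced data.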

To demonstrate the benefits of our approach, which combines maneuver classification and position prediction, we additionally compared its performance to reference strategies. First, we use the labels as a perfect classifier. Moreover, we use the pure prior probabilities () as the most naive classifier and a strategy without a classifier, referred to as NOCLF.

Classifier Strategy Lateral log-likelihood (normalized [%]) Longitudinal log-likelihood (normalized [%])
Labels - -7.547 (100) -14.066 (100)
Priors - -7.769 (97.1) -13.273 (106.0)
NOCLF - -7.762 (97.2) -13.171 (106.8)
MLP Raw -7.900 (95.5) -13.667 (102.9)
WTA -8.793 (85.8) -16.279 (86.4)
PW-Raw -7.608 (99.2) -13.329 (105.5)
I-GMM -7.691 (98.1) -13.354 (105.3)
RF Raw -7.781 (95.9) -13.568 (103.7)
WTA -8.369 (90.2) -15.685 (89.7)
PW-Raw -7.626 (99.0) -13.28 (105.9)
I-GMM -7.611 (99.2) -13.207 (106.5)
TABLE IX: Per Sample Log-Likelihoods with Different Classifiers and MOE strategies
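
The normalized values in Tab. IX appear to be the log-likelihood of the label-based reference divided by that of the respective approach, expressed in percent. This reading is inferred from the tabulated numbers rather than stated in the text. It reproduces the listed percentages, with the exception of the lateral entry for RF with the Raw strategy. A short sketch of this assumed normalization:

```python
def normalized_percent(label_ll: float, approach_ll: float) -> float:
    """Assumed normalization: reference log-likelihood over approach, in %."""
    return round(100.0 * label_ll / approach_ll, 1)


# Values taken from Tab. IX (reference with labels: -7.547 and -14.066).
priors_lateral = normalized_percent(-7.547, -7.769)
priors_longitudinal = normalized_percent(-14.066, -13.273)
mlp_wta_lateral = normalized_percent(-7.547, -8.793)
```

As both log-likelihoods are negative, values above 100 % mean that the approach reaches a higher (less negative) log-likelihood than the label-based reference.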

For the longitudinal direction, Tab. IX shows that the reference solution without any preceding maneuver classification (NOCLF) produces slightly better results than the other combinations. Although it may seem trivial that lane changes need not be taken into account when predicting the longitudinal behavior, this is noteworthy, as we expected beforehand that lane changes to the left mostly go along with an acceleration, whereas braking actions are extremely rare.

By contrast, the benefits of the Mixture of Experts (MOE) approach come into effect for the lateral direction. As shown in Tab. IX, the combination of prior weighting and the MLP probabilities performs best. Furthermore, the combinations using the integrated approach perform only slightly worse (MLP) or even better (RF) than the combinations using prior weighted probabilities. As a benefit, these models are more robust against poor or uncalibrated maneuver probabilities without needing an additional calibration step. This is due to the fact that they perform an implicit probability calibration during the training of the GMM.

Besides, we found that the WTA strategy has no practical relevance, as it does not necessarily produce smooth position predictions over consecutive time steps, as the other strategies do by definition. Moreover, in case of a misclassification the WTA strategy solely queries one expert model, which is possibly not applicable in that area of the data space, which clearly decreases the overall performance.

In the following, we continue with investigating the spatial errors for the best combinations (lateral: MLP classifier with PW-Raw strategy, longitudinal: NOCLF), as previously introduced. For this purpose, we present the applied performance measures in Sec. VII-A and then show the results in Sec. VII-B.

VII-A Performance Measures

Fig. 11: Visualization of the error distribution (left) in lateral and longitudinal direction and the median lateral error as function of the prediction time (right).

To measure the spatial performance of our predictions, we rely on the unbalanced position evaluation data set . The latter contains the needed inputs for the maneuver classifiers and position predictors () as well as the true trajectories according to Eq. 9.


contains 5 s-trajectories sampled with 10 Hz (hence 1 000 000 samples) according to Eq. 10:


where each trajectory consists of 51 corresponding and positions, according to Eq. 11:


The predicted trajectories are then calculated with the described classifiers and position predictors in the same format as . However, as the Gaussian Mixture Regression originally produces probability densities instead of point estimates, these have to be calculated first. This is accomplished by calculating the center of gravity of the density as described in [21]. Hence, the prediction error of a specific prediction time for one of the trajectories is calculated as follows (Eq. 12):


These individual errors of all trajectories are concatenated to according to Eq. 13:


At this point, we want to reemphasize that, although this way of performance evaluation produces easy-to-interpret results, it disregards that our original outputs, i. e. spatial probability densities, contain much more information than a single point estimate.
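
The reduction of a predicted density to a point estimate and the subsequent error computation can be sketched as follows. This is a one-dimensional illustration under assumptions: the center of gravity of a Gaussian mixture is taken as the weight-averaged component means, the error is assumed to be the absolute difference between predicted and true position, and all names and numbers are hypothetical.

```python
from statistics import median
from typing import Sequence


def mixture_mean(weights: Sequence[float], means: Sequence[float]) -> float:
    """Center of gravity of a 1-D Gaussian mixture (its expected value)."""
    total = sum(weights)
    return sum(w * m for w, m in zip(weights, means)) / total


def median_abs_error(predicted: Sequence[float], true: Sequence[float]) -> float:
    """Median of the absolute point-estimate errors over all trajectories."""
    return median(abs(p - t) for p, t in zip(predicted, true))


# Bimodal toy prediction: "stay in lane" (0.0 m) vs. "lane change" (3.5 m).
point_estimate = mixture_mean([0.7, 0.3], [0.0, 3.5])

# Toy predicted vs. true lateral positions at one prediction time.
error = median_abs_error([1.0, 2.0, 4.0], [1.5, 2.0, 2.0])
```

The bimodal toy prediction illustrates the information loss mentioned above: its center of gravity lies between the two modes, at a position that neither maneuver would actually produce.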

VII-B Results

Fig. 11 shows, on the left side, the performance of our selected combinations of classifiers and mixing strategies (highlighted in Tab. IX) at a prediction horizon of 5 s compared to a constant velocity (CV) prediction and a Mixture of Experts (MOE) with labels for the lateral () and the longitudinal () direction. Using the MOE with the labels as input corresponds to the assumption of a perfect classifier. The right-hand side of Fig. 11 shows the development of the median lateral error as a function of the prediction time.

As the plots indicate, our position prediction system is able to produce results comparable to the ones with a perfect maneuver classification in the lateral as well as in the longitudinal direction. Additionally, they show that we are able to clearly outperform simple models such as CV and reach a very small median lateral prediction error of less than 0.21 m at a prediction horizon of 5 s. As Tab. X shows, this is notable compared to other approaches. Note that this compilation does not include studies that report the root mean square error (RMSE), which we quantify with a value of 0.62 m. In doing so, we follow on the one hand [27], who points out that RMSE measures are not well suited for comparisons between different data sets, as they depend on the size of the data set. On the other hand, we emphasize that the problem we are tackling (cf. Sec. I-A) is to predict the probability distribution of future vehicle positions rather than single shot estimates. Thus, we did not optimize our predictions to minimize the RMSE. Therefore, it is not surprising that other researchers who explicitly minimize that value and ignore distribution estimations show better performance concerning the RMSE.

Approach Median Lateral Error [m] Prediction Horizon [s]
[21] 0.18 5.0
[32] 0.23 5.0
[29] 0.50 3.0
MLP (PW-Raw) 0.21 5.0
TABLE X: Lateral Prediction Performance in Comparison to Related Work

As already shown in [21], these results are dominated by the most frequent maneuver class (lane following). Hence, we complementarily show the errors for 20 000 maneuvers of each type in Tab. XI.

[m] [m] [m]
3.22 1.67 2.20
1.25 0.19 1.80
TABLE XI: Prediction Errors per Class and Direction

As can be seen, the errors for the lane change maneuvers are considerably larger than the ones for lane following. On the one hand, this can be explained by a more complex regression task. On the other hand, the predictions are subject to higher uncertainties in case of a lane change, as shown by the predicted distributions (cf. Fig. 12). In contrast, the uncertainty information is ignored in the single point estimates. Note that the increased uncertainties are caused by not knowing the exact point in time at which the maneuver will be completed, even if the classifier informs the position predictor about an upcoming lane change.

Fig. 12: Predicted probability distribution of future vehicle positions for a single situation.

Complementary to these quantitative evaluations, we performed qualitative testing and visualized single situations along with our predictions. For this purpose, we attached a short video and present a single frame in Fig. 12, which illustrates the predictions along with outlined uncertainties during an upcoming lane change. In addition, we show the confidence of our predictions (, ), which is an important hint concerning the reliability for the consumer of the information. This value is calculated similarly to [18] through additional GMMs fitted in the input dimensions. To demonstrate its general usability, we visualized the confidence value divided by the standard deviation against the lateral prediction errors at a prediction time of 5 s in Fig. 13. It can be seen that the prediction errors decrease with increasing confidence values, as expected.

Fig. 13: Prediction confidence against lateral prediction errors.
Description Unit (Continuous) Element of
Range of Values (Nominal)
general information describing the related vehicle (cf. Eq. 1 & Fig. 3)
activity status {0: inactive, 1: active} {f, l} (see footnote 4) {f, fl, fr, l} {fr, r} {fl, fr, l, r}
movement status {0: standing, 1: moving} {f, l} {f, fl, fr, l} {r} {fl, fr, r}
object class {0: bicycle, 1: motorbike, {f, l} {f, fl, l} {fl, fr, r}
2: car, …, 14: no class}
cut-in level {0: , 1: , {l} {r}
2: , 3: }
relation between and related vehicle in ’s coordinate-system
longitudinal distance {f, l} {f, l} {f, r} {fr, r}
lateral distance {f, l} {f, fr} {r} {fr, r}
relative longitudinal speed {f, r} {r} {f, fr, r}
relative lateral speed {f, fl, l, r} {f} {f, fr, r} {f, fr}
relative longitudinal acceleration {fr} {f, fl, fr, r}
relation between and related vehicle in curvilinear coordinates
longitudinal distance {f, l} {l} {fr} {fl, fr, r}
lateral distance {f, l} {fr} {r} {fr, r}
relative longitudinal speed {f, r} {f} {f, fl, fr, r}
relative lateral speed {f, fl, l, r} {l} {fr, r}
movement status {0: standing, 1: moving} {rr} {rr} {rr} {rr}
lateral distance {rl} {rl} {rl, rr}
status of the front fog lamp {0: off, 1: on}
status of the rear fog lamp
status of the rear left fog lamp
status of the rear right fog lamp
wiper level {0, …, 15}
distance between the center of and the left marking
distance between the center of and the right marking
distance between the center of and
the centerline of the assigned lane
longitudinal speed of the observed vehicle
lateral speed of the observed vehicle
longitudinal acceleration of the observed vehicle
lateral acceleration of the observed vehicle
angle of the observed vehicle
relative to the direction of the lane
type of the left marking {0: no marking, 1: continuous,
type of the right marking 2: broken}
color of the left marking {0: no marking, 1: white,
color of the right marking 2: yellow}
number of parallel lanes observed via the camera {0: 0, …, 3: 3+}
number of lanes stored in the map {0, …, 5}
country {0: GER, 1: US, …}
indicator if the situation takes place in a tunnel {0: False, 1: True}
indicator if the situation takes place on a bridge
speedlimit of the current highway section {1: , …, 8: }
type of next approach {0: unknown, 1: on ramp,
to the highway 2: highway merge}
type of next exit {0: unknown, 1: ramp,
of the highway 2: highway divider}
width of the left marking
width of the right marking
width of the lane
distance to the next approach to the highway
distance to the next exit of the highway
curvature of the road
derivative of the curvature
Footnote 4: This means, for example, that feature set (introduced in Sec. IV-A) contains in total 40 elements, including the activity status of its surrounding vehicles in front (f) and to its left (l). In contrast, the activity states of its other front relation partners (r, fl, fr) are not included in .
TABLE XII: Description of the Evaluated Features for an Observed Vehicle and Usage of the Features in the Constructed Feature Sets (-)

VIII Conclusions and Outlook

In this work, we presented a machine learning workflow for predicting the behavior of surrounding vehicles in highway scenarios. For the first time, a combined compilation of prediction techniques for driving maneuvers and positions as well as lateral and longitudinal behavior is presented. The developed modules are evaluated in detail with a large amount of real-world data, challenging established state-of-the-art approaches.

To further improve the quality of the presented behavior predictions, especially in complex situations, we are working on various additional studies and enhancements. In this context, we are migrating the prediction strategies to an experimental vehicle to enable further investigations concerning run time and resource usage. Meanwhile, we are about to transfer our models to predict the movements of surrounding vehicles rather than the ego vehicle's movements. Besides, we plan to adapt our work to a publicly available data set such as highD (cf. [14]) or NGSIM to improve comparability.

Moreover, we see high potential in identifying demanding scenarios and explicitly integrating contextual knowledge (e.g., weather, traffic, time of day, local specialties) into our models. First experiments show that these contextual properties can have a considerable impact on driving behavior.


The authors would like to thank Daimler AG Research and Development for providing the measurement data, enabling us to perform our experiments. In addition, we would like to thank our colleagues at the Institute of Databases and Information Systems at Ulm University and especially Prof. Dr. Klaus-Dieter Kuhnert from the Institute of Realtime Learning Systems at the University of Siegen for supporting our studies.


  • [1] F. Altché and A. De La Fortelle (2017) An LSTM network for highway trajectory prediction. In 2017 IEEE International Conference on Intelligent Transportation Systems (ITSC), pp. 353–359. Cited by: §II-B, §II-D, §II-D, §III-D.
  • [2] M. Bahram, C. Hubmann, A. Lawitzky, M. Aeberhard, and D. Wollherr (2016) A combined model- and learning-based framework for interaction-aware maneuver prediction. IEEE Transactions on Intelligent Transportation Systems (T-ITS) 17 (6), pp. 1538–1550. IEEE. Cited by: §II-A, §II-D, §III-C, §V-B, TABLE VII.
  • [3] K. H. Brodersen, C. S. Ong, K. E. Stephan, and J. M. Buhmann (2010) The balanced accuracy and its posterior distribution. In 2010 IEEE International Conference on Pattern Recognition (ICPR), pp. 3121–3124. Cited by: §V-A.
  • [4] J. Colyar and J. Halkias (2007) US highway 101 dataset. Federal Highway Administration (FHWA), Tech. Rep. FHWA-HRT-07-030. Cited by: §II-B.
  • [5] A. Corduneanu and C. M. Bishop (2001) Variational bayesian model selection for mixture distributions. In Artificial Intelligence and Statistics, Vol. 2001, pp. 27–34. Cited by: §IV-C.
  • [6] N. Deo, A. Rangesh, and M. M. Trivedi (2018) How would surround vehicles move? A unified framework for maneuver classification and motion prediction. IEEE Transactions on Intelligent Vehicles (T-IV) 3 (2), pp. 129–140. IEEE. Cited by: §II-C, §II-D.
  • [7] N. Deo and M. M. Trivedi (2018) Convolutional social pooling for vehicle trajectory prediction. In 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR), pp. 1468–1476. Cited by: §II-C, §VII.
  • [8] L. Fahrmeir, C. Heumann, R. Künstler, I. Pigeot, and G. Tutz (2016) Statistik: der Weg zur Datenanalyse. Springer Spektrum, Berlin. Cited by: §IV-A.
  • [9] I. Guyon and A. Elisseeff (2003) An introduction to variable and feature selection. Journal of Machine Learning Research (JMLR) 3, pp. 1157–1182. The MIT Press. ISSN 1532-4435. Cited by: §IV-A, §IV-A.
  • [10] M. A. Hall (2000) Correlation-based feature selection for discrete and numeric class machine learning. In 2000 International Conference on Machine Learning (ICML), pp. 359–366. ISBN 1-55860-707-2. Cited by: §IV-A, §IV-A.
  • [11] T. Hastie, J. Friedman, and R. Tibshirani (2001) The elements of statistical learning. Vol. 1, Springer series in statistics, New York, USA. Cited by: item 3.
  • [12] S. Klingelschmitt, M. Platho, H. Groß, V. Willert, and J. Eggert (2014) Combining behavior and situation information for reliably estimating multiple intentions. In 2014 IEEE Intelligent Vehicles Symposium Proceedings (IV), pp. 388–393. Cited by: §II-D.
  • [13] R. Kohavi and G. H. John (1997) Wrappers for feature subset selection. Artificial Intelligence 97 (1), pp. 273–324. Elsevier. ISSN 0004-3702. Cited by: §IV-A.
  • [14] R. Krajewski, J. Bock, L. Kloeker, and L. Eckstein (2018) The highD dataset: a drone dataset of naturalistic vehicle trajectories on german highways for validation of highly automated driving systems. In 2018 IEEE International Conference on Intelligent Transportation Systems (ITSC). Cited by: §VIII.
  • [15] S. Lefèvre, D. Vasquez, and C. Laugier (2014) A survey on motion prediction and risk assessment for intelligent vehicles. ROBOMECH journal 1 (1), pp. 1. Nature Publishing Group. Cited by: §I-B.
  • [16] D. Lenz, F. Diehl, M. T. Le, and A. Knoll (2017) Deep neural networks for markovian interactive scene prediction in highway scenarios. In 2017 IEEE Intelligent Vehicles Symposium (IV), pp. 685–692. Cited by: §II-B.
  • [17] K. P. Murphy (2012) Machine learning: A probabilistic perspective. The MIT Press, Cambridge, Massachusetts & London, England. Cited by: §V-A.
  • [18] J. Schlechtriemen, A. Wedel, G. Breuel, and K.-D. Kuhnert (2014) A probabilistic long term prediction approach for highway scenarios. In 2014 IEEE International Conference on Intelligent Transportation Systems (ITSC), pp. 732–738. Cited by: §II-B, §VI-A, §VII-B.
  • [19] J. Schlechtriemen, K. P. Wabersich, and K. Kuhnert (2016) Wiggling through complex traffic: planning trajectories constrained by predictions. In 2016 IEEE Intelligent Vehicles Symposium (IV), pp. 1293–1300. Cited by: item 3, §II-D.
  • [20] J. Schlechtriemen, A. Wedel, J. Hillenbrand, G. Breuel, and K. Kuhnert (2014) A lane change detection approach using feature ranking with maximized predictive power. In 2014 IEEE Intelligent Vehicles Symposium Proceedings (IV), pp. 108–114. Cited by: §II-A, §II-D, §III-A, item 1, TABLE VII.
  • [21] J. Schlechtriemen, F. Wirthmueller, A. Wedel, G. Breuel, and K. Kuhnert (2015) When will it change the lane? A probabilistic regression approach for rarely occurring events. In 2015 IEEE Intelligent Vehicles Symposium (IV), pp. 1373–1379. Cited by: §I-B, §II-C, §III-A, §III-C, §III-D, item 2, TABLE VII, §VI, §VII-A, §VII-B, TABLE X.
  • [22] S. Tattersall, U. Petersen, and J. Breuer (2012) Ein Messdatenmanagementsystem für die Feldabsicherung von neuen Fahrerassistenzsystemen. VDI-Berichte (2166). VDI. Cited by: §III-A.
  • [23] A. Thorvaldsson and V. Bandi (2015) Reference path estimation for lateral vehicle control. Master Thesis, Chalmers University of Technology. Cited by: item 3.
  • [24] G. Weidl, A. L. Madsen, S. Wang, D. Kasper, and M. Karlsen (2018) Early and accurate recognition of highway traffic maneuvers considering real world application: a novel framework using bayesian networks. IEEE Intelligent Transportation Systems Magazine (ITSM) 10 (3), pp. 146–158. IEEE. ISSN 1939-1390. Cited by: §I, §II-A, §II-D, §III-A, §V-A.
  • [25] J. Wiest, F. Kunz, U. Kreßel, and K. Dietmayer (2013) Incorporating categorical information for enhanced probabilistic trajectory prediction. In 2013 IEEE International Conference on Machine Learning and Applications (ICMLA), Vol. 1, pp. 402–407. Cited by: §II-B.
  • [26] J. Wiest, M. Höffken, U. Kreßel, and K. Dietmayer (2012) Probabilistic trajectory prediction with gaussian mixture models. In 2012 IEEE Intelligent Vehicles Symposium (IV), pp. 141–146. Cited by: §II-B, §II-C, §II-D, §IV-C.
  • [27] C. J. Willmott and K. Matsuura (2005) Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate research 30 (1), pp. 79–82. Inter-Research. Cited by: §VII-B.
  • [28] C. Wissing, T. Nattermann, K. Glander, and T. Bertram (2017) Probabilistic time-to-lane-change prediction on highways. In 2017 IEEE Intelligent Vehicles Symposium (IV), pp. 1452–1457. Cited by: §II-C.
  • [29] C. Wissing, T. Nattermann, K. Glander, and T. Bertram (2018) Trajectory prediction for safety critical maneuvers in automated highway driving. In 2018 IEEE International Conference on Intelligent Transportation Systems (ITSC), pp. 131–136. Cited by: item 2, §II-C, §II-D, TABLE VII, TABLE X.
  • [30] C. Wissing, T. Nattermann, K. Glander, C. Hass, and T. Bertram (2017) Lane change prediction by combining movement and situation based probabilities. IFAC-PapersOnLine 50 (1), pp. 3554–3559. Elsevier. Cited by: §II-A, §II-C, §III-A, TABLE IV.
  • [31] H. Woo, Y. Ji, H. Kono, Y. Tamura, Y. Kuroda, T. Sugano, Y. Yamamoto, A. Yamashita, and H. Asama (2017) Lane-change detection based on vehicle-trajectory prediction. IEEE Robotics and Automation Letters (RA-L) 2 (2), pp. 1109–1116. IEEE. Cited by: §II-C, §II-D, §III-A, §III-C, §V-A.
  • [32] S. Yoon and D. Kum (2016) The multilayer perceptron approach to lateral motion prediction of surrounding vehicles for autonomous vehicles. In 2016 IEEE Intelligent Vehicles Symposium (IV), pp. 1307–1312. Cited by: §II-C, §III-C, item 3, §V-A, TABLE X.