Modern vehicles are equipped with an increasing number of sensing devices, such as Global Positioning System (GPS) receivers, Inertial Measurement Units (IMUs), and other sensors that communicate through the Controller Area Network (CAN bus). This real-time sensor data can be used to detect, analyze, predict, and plan for a large variety of issues, such as traffic congestion, vehicle energy consumption and emissions, urban mobility, and driver behavior. Multiple approaches have been developed and applied to identify driving behavioral patterns, including driver recognition [1, 2], maneuver recognition [3, 4, 5], and aggressive driving detection. While an accurate classification of driving behavior can contribute to a better driving experience, there are also other applications where such classification is useful.
Recently, there has been a growing interest from car insurance companies in designing driver behavior classification systems that could eventually be used to relate their customers’ fees to how they drive. As part of this solution, it is of interest to accurately classify the level of aggressiveness of their customers’ recorded trips. Nevertheless, the large number of trips makes it infeasible to identify the type of driving for each one manually. Consequently, several works have addressed this problem with an unsupervised learning approach. In these works, the goal is to find clusters in the recorded trip data that can be characterised by different levels of aggressiveness, without relying on labels.
Since the labels, i.e., the driving styles, remain a crucial element for applying a supervised algorithm, generating realistic artificial data can be an alternative way to increase the size of the training or validation datasets and possibly improve the quality of the classification. Semi-supervised learning is motivated by the availability, in different applications, of large datasets with unlabeled features in addition to labeled ones. This lack of labeled data can be efficiently addressed through a deep learning pipeline.
Another application of interest for driving behavior classification is the development of autonomous vehicles. A better understanding of how humans drive can improve these systems on a technical level and reduce errors as far as possible, for the safety of the users. Identifying aggressive drivers is crucial in developing safer autonomous driving techniques and advanced driver assistance systems. This problem has been extensively studied over the past decades. Current autonomous driving systems use a wide range of algorithms to process sensor data. Some works use end-to-end approaches to make navigation decisions from sensor inputs such as camera images and LIDAR data. A variety of sensors can help cars extract important information to improve the quality of autonomous vehicles and to learn how to drive safely and efficiently. Nevertheless, data collection can be expensive and restricted by privacy concerns. Simulating data and exploiting it in the same context as real data therefore appears to be a solution worth studying. The attention given to generative models is increasing due to their capability of modelling underlying patterns in multidimensional data. However, assessing the quality of the synthetic data remains a crucial point to validate.
In this paper, we formulate the problem of generating labeled IMU signals of one-minute length, representing aggressive and normal drivers on a specific part of the road, using Recurrent Conditional GANs. The generated data is assessed in practice by its capacity to improve the classification performance of a semi-supervised framework.
II Related Work
Since obtaining real sensor data can be costly and time-consuming and can raise privacy issues, there have recently been several studies on sensor modelling for virtual testing, which are mostly based on parametric models. A non-parametric statistical model has also been developed, allowing for the generation of sensor position output. In another work, a radar model is proposed where noise is added to the raw signals and filtering is then applied to model the sensor output. Further, in [21], an Autoregressive Input-Output Hidden Markov Model (AIO-HMM) was proposed by fusing sensory streams through a linear transformation of features to synthesize real-valued time series describing sensor errors based on data describing the environment.
Generative Adversarial Networks (GANs) have proven to perform well in generating different types of data. Different research works, from computer vision to natural language processing [26], have shown that this kind of generative model can provide good results. A Recurrent Conditional Generative Adversarial Network (RCGAN) has been proposed for modelling real-valued time series describing sensor outputs that are used in autonomous driving applications. In another work, the authors augmented LiDAR sensor data in simulated environments by employing CycleGANs.
Evaluating GANs is a challenging task. Unlike other deep learning models, which are trained with a loss function until convergence, a GAN generator is trained together with a discriminator that learns to distinguish between real and fake data. The generator and the discriminator are trained jointly to reach an equilibrium. Hence, there is no objective loss function that can be used to train or assess the GAN generator separately. Some works rely on a visual assessment, checking that the results are visually appealing and agree with the real distribution. This shows high potential for some data types, especially images. For time series, however, visual inspection remains an inconsistent method, since each generated sample must be inspected manually. By evaluating a suitable distance metric between the real and fake data distributions, we can assess the trained model and infer how well it captures temporal patterns. Quantitative measures, such as reconstruction loss, Kullback-Leibler (KL) divergence, and Jensen-Shannon (JS) divergence, can be combined with visual assessment to provide a robust assessment of GAN models. A quantitative extrinsic approach is also an alternative, which mainly relies on an external method to measure the quality of the generated data.
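As a minimal sketch of such a distance-based check, the snippet below estimates the Jensen-Shannon divergence between the empirical distributions of one real and one generated signal using shared histogram bins. The function name `js_divergence`, the number of bins, and the base-2 logarithm are illustrative assumptions; this is not the evaluation used in this paper, which relies on an extrinsic assessment.

```python
import numpy as np

def js_divergence(real, fake, bins=20):
    """Jensen-Shannon divergence (in bits) between two 1-D samples.

    Both samples are binned on a shared grid so that the two
    histograms are directly comparable.
    """
    real = np.asarray(real, dtype=float)
    fake = np.asarray(fake, dtype=float)
    lo = min(real.min(), fake.min())
    hi = max(real.max(), fake.max())
    edges = np.linspace(lo, hi, bins + 1)
    p = np.histogram(real, bins=edges)[0].astype(float)
    q = np.histogram(fake, bins=edges)[0].astype(float)
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        # KL(a || b), summing only over bins where a has mass
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

With base-2 logarithms the value lies between 0 (identical histograms) and 1 (histograms with no overlapping bins), which makes it easy to compare across signals.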
This paper makes two main contributions to the field of driving behavior classification. First, it addresses the problem of data augmentation for car sensors. In our study, we generate IMU signals of one-minute length on a common portion of the road, characterised by the driving style, which is either aggressive or normal. We use Recurrent Conditional GANs for the generation of these labeled time series. Second, we build a framework to evaluate the quality of the generated data from a practical perspective. In other words, we assess the quality of the data based on the improvement of a semi-supervised model, which identifies the type of driving, when different percentages of synthesised data are added to the classifier’s training and/or validation sets. Consequently, the paper investigates how much data should be generated and in which set it should be used to improve the accuracy of the driving behavior classification.
In this section, we firstly present the experimental setting used to collect the labeled data. Then, we present the data preprocessing tasks, followed by the generative model used to synthesize the multidimensional time series. Finally, we present the extrinsic assessment framework used to evaluate the generated data. The entire pipeline is shown in Fig. 1.
IV-A Experimental Setting
The dataset used in this paper was collected from a vehicle simulator. In the experiment, drivers drove separately, using different cars, on the same circuit, which is depicted in Fig. 2. The drivers were asked to drive both in a normal and in an aggressive way. By doing so, we obtain close to a ground truth about which recorded trips are normal and which are aggressive. The simulator collected the same signals as a real IMU, i.e., longitudinal acceleration, lateral acceleration, pitch, yaw, and roll. All the signals had the same sampling frequency, namely 1000 Hz. In total, the dataset consists of simulation drives.
IV-B Data preprocessing
For computational reasons, we down-sampled the IMU signals from 1000 Hz to 1 Hz by keeping every 1000th observation. Although this down-sampling is done mostly for computational convenience, it is also very likely that in practical applications the hardware will have a more limited sampling frequency than the simulator. Since the signals may contain artifacts, we filtered them by applying a moving average filter with a sliding window of ten samples. We limited our study to the first minute of each trip, both for computational reasons and because in practical applications it would be favorable to classify the driver without too much history. A similar choice of time window has previously been suggested in the literature.
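The preprocessing steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the array layout `(n_samples, n_signals)`, the function name `preprocess_trip`, applying the filter after the down-sampling, and the zero-padded edges produced by `mode="same"` are all choices made here for the example.

```python
import numpy as np

def preprocess_trip(signal, step=1000, window=10, n_keep=60):
    """Down-sample, smooth, and truncate one recorded trip.

    signal: array of shape (n_samples, n_signals) sampled at 1000 Hz.
    Returns an array of shape (n_keep, n_signals) at 1 Hz.
    """
    signal = np.asarray(signal, dtype=float)
    down = signal[::step]                    # 1000 Hz -> 1 Hz
    kernel = np.ones(window) / window        # moving average weights
    smooth = np.column_stack([
        np.convolve(down[:, j], kernel, mode="same")  # edges are zero-padded
        for j in range(down.shape[1])
    ])
    return smooth[:n_keep]                   # first minute only
```

Using `mode="same"` keeps the series length unchanged at the cost of damped values in the first and last few samples; a `"valid"` convolution would avoid this but shorten the series.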
All features were normalized using a MinMax scaler. Our dataset was split into the labeled data used for training the RCGAN and the unlabeled data used in the semi-supervised part. The RCGAN was trained only on a dataset of trips equally balanced between aggressive and normal.
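Min-max normalization rescales each feature to the interval [0, 1] using its minimum and maximum. The sketch below assumes the trips are stacked in an array of shape `(n_trips, T, n_features)` and that the minimum and maximum are computed per feature over all trips and time steps; the function names are hypothetical.

```python
import numpy as np

def minmax_fit(trips):
    """Return the per-feature minimum and maximum over all trips and time steps."""
    trips = np.asarray(trips, dtype=float)
    flat = trips.reshape(-1, trips.shape[-1])
    return flat.min(axis=0), flat.max(axis=0)

def minmax_transform(trips, lo, hi):
    """Scale each feature to [0, 1]; constant features are mapped to 0."""
    trips = np.asarray(trips, dtype=float)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero
    return (trips - lo) / span
```

Fitting the bounds on one split and reusing them on the others keeps the scaling consistent across the labeled and unlabeled parts.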
IV-C Data generation
RCGANs were originally developed and implemented for medical applications. Our paper was inspired by this work to synthesize IMU signals for normal and aggressive trips. We start this section by giving a brief introduction to Recurrent Neural Networks (RNNs). Next, we introduce long short-term memory RNNs, which are an extension of the RNN framework. This subsection ends with a description of the RCGAN model, which uses long short-term memory RNNs.
IV-C1 Recurrent Neural Networks
RNNs are mostly used for sequential modeling and learning. They process one element of input data at a time and implicitly store previous information using cyclic connections of hidden units. Given a sequence of vectors $(x_1, \dots, x_T)$, where $x_t \in \mathbb{R}^{d}$, the RNN outputs a representation, that is, a sequence of vectors $(h_1, \dots, h_T)$, where $h_t \in \mathbb{R}^{n}$. The sequence is determined iteratively through:
$$h_t = f(W x_t + U h_{t-1} + b),$$
where $W \in \mathbb{R}^{n \times d}$, $U \in \mathbb{R}^{n \times n}$, and $b \in \mathbb{R}^{n}$. The function $f$ is a non-linear mapping, often chosen as $\tanh$ applied component-wise.
The output vector $y_t$ transforms the current hidden state $h_t$ in a way that depends on the final task. For classification, it is computed as
$$y_t = \mathrm{softmax}(V h_t + c).$$
Note that $W$, $U$, $V$, $b$, and $c$ are network parameters determined through gradient descent. The scalars $n$, $d$, and $k$ are the dimensions of the hidden layer, the input, and the output, respectively. For example, in the case of 2-category classification, $k = 2$ and the probability vector $y_t$ refers to the probabilities of each input element belonging to each category.
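The recurrence can be illustrated with a short NumPy forward pass. The symbols are assumptions chosen for this sketch: `W` (hidden-by-input), `U` (hidden-by-hidden), and `b` define the hidden update with a tanh non-linearity, while `V` and `c` define a softmax output layer; the initial hidden state is taken to be zero.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())   # subtract the max for numerical stability
    return e / e.sum()

def rnn_forward(xs, W, U, b, V, c):
    """Run a vanilla RNN over a sequence xs of shape (T, d).

    Returns the hidden states (T, n) and class probabilities (T, k).
    """
    h = np.zeros(U.shape[0])                 # initial hidden state
    hs, ys = [], []
    for x in xs:
        h = np.tanh(W @ x + U @ h + b)       # hidden state update
        hs.append(h)
        ys.append(softmax(V @ h + c))        # per-step class probabilities
    return np.array(hs), np.array(ys)
```

Each row of the second output sums to one, so for a two-class problem it can be read as the probabilities of the two categories at that time step.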
IV-C2 Long Short-Term Memory (LSTM)
In practice, the vanilla RNN encounters numerical difficulties. One reported reason is that the gradient can vanish or explode during back-propagation through time on data with long-term dependencies, so that vanilla RNNs effectively capture only short-term dependencies. The Long Short-Term Memory (LSTM) technique was therefore introduced to mitigate this risk. It incorporates a memory cell $c_t$ together with an input gate $i_t$, an output gate $o_t$, and a forget gate $f_t$. The memory cell enables the network to remember its state over time, and by doing so it is possible for the full network to capture long-term temporal dependencies present in the training data. The evolution of the LSTM states is determined by:
$$\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i),\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f),\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o),\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c),\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,\\
h_t &= o_t \odot \tanh(c_t),
\end{aligned}$$
where the weight matrices $W_\ast$, $U_\ast$ and the bias vectors $b_\ast$ are learnable parameters. The function $\sigma$ denotes the sigmoid activation function, which is applied element-wise. The quantities $i_t$, $f_t$, and $o_t$ stand for the input, forget, and output gates, respectively. The output of the LSTM cell is $h_t$, and $\odot$ denotes the point-wise vector product, i.e., the Hadamard product. Fig. 3 illustrates the learning mechanism through the LSTM cell.
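A single LSTM update can be written in a few lines of NumPy. The sketch uses the standard LSTM formulation; the parameter layout (dictionaries keyed by `"i"`, `"f"`, `"o"`, `"c"` for the input gate, forget gate, output gate, and candidate cell) and the function name `lstm_step` are assumptions made for illustration.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state and memory cell."""
    W, U, b = params["W"], params["U"], params["b"]
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])        # input gate
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])        # forget gate
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])        # output gate
    c_tilde = np.tanh(W["c"] @ x + U["c"] @ h_prev + b["c"])  # candidate cell
    c = f * c_prev + i * c_tilde      # element-wise (Hadamard) products
    h = o * np.tanh(c)
    return h, c
```

The forget gate scales the previous memory cell, which is what lets information persist over many time steps when the gate stays close to one.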
IV-C3 Recurrent Conditional Generative Adversarial Networks
RCGANs are generative recurrent neural networks that aim at generating real-valued time series subject to conditional information. In the RCGAN architecture there are two different LSTM-RNNs trained simultaneously, a generator $G$ and a discriminator $D$, which have conflicting objectives. The generator learns over the training data, whereas the goal of the discriminator is to discriminate between the synthetic data generated by $G$ and the real data, as depicted in Fig. 4. We denote by $d$, $d_c$, and $d_z$ the feature dimensions of the data, the conditional information, and the latent/noise space, respectively. Let $T$ be the length of the time series.
In practice, the min-max game problem is described as:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}}\big[\log D(x \mid y)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z \mid y) \mid y)\big)\big],$$
where $p_{data}$ is the distribution of the real data, $p_z$ is a prior distribution over the input noise variables, and $y$ is the conditional information. The noise inputs, i.e., the sequences of $T$ points, are drawn independently from $p_z$.
In our case, the input consists of five-dimensional time series data, i.e., the IMU signals of the trips, with a binary condition indicating the type of driving (normal/aggressive). The length of all time series is equal to $T = 60$, i.e., one minute sampled at 1 Hz. For more details about the architecture of our trained RCGAN, see Table I. Our RCGAN generates an example from a specific class. In other words, if we ask for the aggressive class, the generator produces one aggressive trip. Thus, after training the model, the number of generated trips per class must be specified. We fed the RCGAN with the training set and then generated sets of new trips of different sizes.
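To make the conditioning concrete, the sketch below builds the input a conditional recurrent generator would receive when a given class is requested: a noise sequence with the class label appended at every time step. Everything here is an assumption for illustration, including the Gaussian prior, the latent dimension of 5, the sequence length of 60 steps (one minute at 1 Hz), the label encoding (0 for normal, 1 for aggressive), and the function name; the trained network itself is not reproduced.

```python
import numpy as np

def generator_inputs(n_trips, label, seq_len=60, latent_dim=5, seed=None):
    """Noise sequences concatenated with a constant class label per time step.

    Returns an array of shape (n_trips, seq_len, latent_dim + 1).
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_trips, seq_len, latent_dim))   # latent noise
    cond = np.full((n_trips, seq_len, 1), float(label))       # 0 = normal, 1 = aggressive
    return np.concatenate([z, cond], axis=-1)
```

Requesting `generator_inputs(100, label=1)` would then correspond to asking a trained generator for one hundred aggressive trips.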
IV-D Data evaluation
In order to evaluate the quality of the RCGAN model, we used a semi-supervised framework to classify whether a trip is aggressive or normal. First, we extracted statistical features from the real and fake data. Nine statistical features were calculated for each of the five time series to measure different properties of that variable, namely: mean, median, mode, standard deviation, skewness, kurtosis, 25th percentile, 75th percentile, and interquartile range. A further description of a few of the statistical features is given below.
We denote by $x = (x_1, \dots, x_N)$ a real-valued series, with $\mu$ and $\sigma$ its mean and standard deviation, respectively.
IV-D1 Mode
The mode is the most frequently occurring value in the series.
IV-D2 Skewness
Skewness is used to measure the asymmetry of the data. Let $m_k$ be the $k$-th central moment, i.e.,
$$m_k = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^k.$$
The skewness is then calculated with the third moment as $m_3 / \sigma^3$.
IV-D3 Kurtosis
Kurtosis is used to measure the peakedness of the probability distribution of the data and is calculated as $m_4 / \sigma^4$, where $m_4$ is the fourth moment.
IV-D4 Percentile
A percentile is the value of a variable below which a certain percent of observations fall. In other words, the $p$-th percentile is a value such that at most $p\%$ of the measurements are less than this value and at most $(100 - p)\%$ are greater.
IV-D5 Interquartile Range
Interquartile Range (IQR) is a measure of statistical dispersion. It is defined as the difference between the 75th and the 25th percentiles, called the upper and lower quartiles.
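The nine features can be computed per signal and concatenated per trip, giving 45 features for a five-signal trip. The sketch below is one possible implementation: rounding values before taking the mode (needed for real-valued signals), the population standard deviation, NumPy's default linear percentile interpolation, and the function names are all assumptions of this example.

```python
import numpy as np

def series_features(x, decimals=2):
    """Mean, median, mode, std, skewness, kurtosis, 25th/75th percentile, IQR."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    # Mode of a real-valued series: most frequent value after rounding
    values, counts = np.unique(np.round(x, decimals), return_counts=True)
    mode = values[np.argmax(counts)]
    m3 = np.mean((x - mu) ** 3)              # third central moment
    m4 = np.mean((x - mu) ** 4)              # fourth central moment
    skew = m3 / sigma**3 if sigma > 0 else 0.0
    kurt = m4 / sigma**4 if sigma > 0 else 0.0
    q25, q75 = np.percentile(x, [25, 75])
    return np.array([mu, np.median(x), mode, sigma, skew, kurt, q25, q75, q75 - q25])

def trip_features(trip):
    """Concatenate the nine features of every signal of a (T, n_signals) trip."""
    trip = np.asarray(trip, dtype=float)
    return np.concatenate([series_features(trip[:, j]) for j in range(trip.shape[1])])
```

The same function can be applied to real and generated trips, so that both end up in an identical feature space for the classifier.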
The unlabeled part of the dataset was used for training an Autoencoder (AE), see Fig. 1, in order to transfer its weights and biases to the DNN classifier. The Autoencoder is a neural network that aims to reconstruct its input, i.e., the target output is the input. It is composed of two main parts: an encoder that compresses the data into a lower-dimensional space, and a decoder that reproduces the input from the bottleneck. The AE is trained to minimize the error between the real input and the reconstructed one. More formally, let $x$ be the input, where $x \in \mathbb{R}^{d}$, and let $h \in \mathbb{R}^{d'}$ be the compressed representation, mapped by $f_\theta$,
$$h = f_\theta(x) = s_f(W x + b),$$
where $s_f$, $W$, and $b$ are respectively the activation function of the encoder, the weight matrix, and the bias vector. The function $f_\theta$ is parameterized by $\theta = \{W, b\}$. The decoder part reconstructs the input from the hidden representation $h$ by the function $g_{\theta'}$,
$$\hat{x} = g_{\theta'}(h) = s_g(W' h + b'),$$
where $s_g$, $W'$, and $b'$ are respectively the activation function of the decoder, the weight matrix, and the bias vector. The function $g_{\theta'}$ is parameterized by $\theta' = \{W', b'\}$.
Each training input vector $x^{(i)}$ is mapped to a corresponding $h^{(i)}$, which is then mapped to a reconstruction $\hat{x}^{(i)}$ such that $\hat{x}^{(i)} = g_{\theta'}(h^{(i)})$. The parameters $\theta$ and $\theta'$ of the model are optimized to minimize the average reconstruction error such that
$$\theta^{*}, \theta'^{*} = \arg\min_{\theta, \theta'} \frac{1}{n} \sum_{i=1}^{n} L\big(x^{(i)}, \hat{x}^{(i)}\big),$$
with the loss function $L$ given by $L(x, \hat{x}) = \lVert x - \hat{x} \rVert^2$.
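The encoder, decoder, and average reconstruction error can be sketched in NumPy as below. The specific choices are assumptions for illustration only: a tanh encoder, a linear decoder, a squared-error loss, and the names `W`, `b`, `W_dec`, `b_dec` for the encoder and decoder parameters.

```python
import numpy as np

def encode(x, W, b):
    """Compress the input into the bottleneck representation."""
    return np.tanh(W @ x + b)

def decode(h, W_dec, b_dec):
    """Reconstruct the input from the bottleneck (linear decoder assumed)."""
    return W_dec @ h + b_dec

def reconstruction_error(X, W, b, W_dec, b_dec):
    """Average squared reconstruction error over the rows of X."""
    errors = [
        np.sum((x - decode(encode(x, W, b), W_dec, b_dec)) ** 2)
        for x in np.asarray(X, dtype=float)
    ]
    return float(np.mean(errors))
```

Training would minimize this quantity with respect to all four parameters; only the encoder parameters are needed afterwards when the weights are transferred to a classifier.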
After training the Autoencoder on the unlabeled dataset, i.e., the trips from the simulator that were not used to train the RCGAN, we use its weights and biases to initialize a supervised deep neural network (DNN) model and then fine-tune the DNN on the labeled dataset to classify the type of driving. To measure how much generated data can improve the classification, we ran several groups of experiments. In the first group, which is our baseline, the classifier was trained and validated using only the real labeled dataset. In the following groups, we formed all combinations of training and validation sets containing labeled real, fake, or real+fake data. All the classifiers were trained only with the selected features.
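The experimental groups can be enumerated programmatically. In this sketch, `R`, `F`, and `R+F` stand for real, fake, and mixed sets, the pair (`R`, `R`) is the baseline, and every other pair is repeated for each fake-to-real ratio; the function names and the default ratio are hypothetical.

```python
from itertools import product

def experiment_grid(fake_ratios=(0.5,)):
    """List all (training set, validation set, fake ratio) configurations."""
    sets = ("R", "F", "R+F")
    grid = []
    for train, val in product(sets, repeat=2):
        if train == "R" and val == "R":
            # Baseline: real data only, so no fake ratio applies
            grid.append({"train": train, "val": val, "fake_ratio": None})
        else:
            for ratio in fake_ratios:
                grid.append({"train": train, "val": val, "fake_ratio": ratio})
    return grid

def n_fake_trips(n_real, fake_ratio):
    """Number of generated trips for a given fake-to-real ratio."""
    return int(round(n_real * fake_ratio))
```

With a single ratio this yields nine configurations, the baseline plus eight that involve generated data.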
We evaluate the classifier’s performance by measuring the Area Under the Receiver Operating Characteristic curve (AUROC). This criterion is one of the most widely used metrics to score the quality of a predictor in a binary classification task. It ranges in value from 0 to 1. The higher the AUROC, the better the classifier is at predicting the classes, which is the type of driving in our case. The AUROC is computed on the test set containing all the real data.
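The AUROC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that pairwise definition; the function name and the convention that label 1 marks the positive (for instance, aggressive) class are assumptions of this example.

```python
import numpy as np

def auroc(y_true, scores):
    """AUROC from pairwise comparisons of positive and negative scores."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]       # every (positive, negative) pair
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(correct / diff.size)
```

This pairwise form is quadratic in the number of examples, which is acceptable for a test set of modest size; rank-based implementations scale better for large sets.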
V Results and Discussion
Fig. 5 depicts a recorded trip and a generated trip, both labeled as normal. The figure illustrates how the RCGAN was able to capture the correlation between the different signals of a normal trip, as well as the main patterns. In order to investigate the quality of the generated fake data and see whether it can be useful on a practical level, we applied our semi-supervised framework as an extrinsic evaluation. The generated fake data were used in both the training and the validation sets of the classifier. All combinations of real and generated fake data are covered in Table II.
We ran the experiments 200 times. After each RCGAN training, we generated different amounts of data and used them in the validation or training set of our classifier. Across the 200 runs, we recorded the simulations in which the classifier reached an AUROC strictly higher than the baseline value for at least one combination of real and fake data. Table II shows the performance of the classifier of the semi-supervised framework, trained and validated on different sets consisting of combinations of real and fake data, for one simulation outperforming the baseline. AUROC is measured on the test set, which contains all the real trips. Bold depicts AUROC values superior to the baseline. We can see that for most simulations the AUROC exceeds the baseline for a variety of sets and ratios of real and generated fake data.
| Parameter | Value |
| Discriminator optimizer | Gradient Descent |
| RNNs hidden units | 100 |
| Number of epochs | 100 |

| Parameter | Value |
| Learning rate¹ | 0.001, 0.01, 0.1 |
| Number of epochs¹ | 100, 200, 500 |
| Activation function¹ | tanh, maxout, rectifier |
¹ Grid search was based on these parameters for the
Since we varied the percentage of real and generated fake data in both the training and validation sets of the classifier, it is of interest to know how much generated fake data is needed and how it should be used by the classifier. Table III summarizes the set of simulations that outperform the baseline, i.e., the number of recorded AUROCs that exceed the baseline value, for each combination of sets and each fake ratio.
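A tally of this kind can be produced by counting, over all runs, the configurations whose AUROC strictly exceeds the baseline. The sketch below assumes each run is stored as a tuple `(train_set, val_set, fake_ratio, auroc)`; this record layout and the function name are hypothetical.

```python
from collections import Counter

def count_improvements(records, baseline_auroc):
    """Count, per configuration, the runs whose AUROC strictly beats the baseline.

    records: iterable of (train_set, val_set, fake_ratio, auroc) tuples.
    """
    counts = Counter()
    for train, val, ratio, score in records:
        if score > baseline_auroc:
            counts[(train, val, ratio)] += 1
    return counts
```

Calling `counts.most_common()` then ranks the configurations by how often they improved on the baseline.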
| Training Set | Validation Set | Ratio Fake/Real | AUROC¹ |
| R² + F³ | R + F | 50% | 0.858 |
| R + F | R | 50% | 0.851 |
| R | R + F | 50% | 0.841 |
| F | R + F | 50% | 0.841 |
| R + F | F | 50% | 0.849 |
¹ This measure is computed on the test set, containing all the real trips.
² The set R consists of the real trips which were used to train the RCGAN.
³ The set F consists of the fake data which were generated from the RCGAN.
At first glance, we can divide Table III into three groups. The first group contains the combinations of sets with the highest total number of records. Training on the real data while validating on both the real and the fake data seems to be the best option to ensure a better classification.
Table III: The recorded number for each set and percentage of fake data.
Training on the real data and validating only on the fake data can also be a good way to use the generated data. The second group contains the following combinations: training on the real and fake data while validating on the real data, training and validating on both the fake and real data, and lastly training on the real and fake data while validating on the fake data. This group is characterised by a lower number of records compared to the first one. This underlines the fact that incorporating the fake data in the training set is less likely to improve the classifier. The third group contains the remaining combinations and is characterised by a training set composed only of fake data. The negligible number of records in this group excludes the possibility of using only the fake data to improve the classifier accuracy. This result can be explained by the fact that the data are generated on the basis of the real ones; therefore, replacing the real data in the classifier’s training set with fake data would not guarantee an improvement. The generative model had to learn from the real data to produce new samples that are close enough to the originals, but still different.
Fig. 6 shows that the classifier can still perform well when trained merely on the generated fake data. By training and validating on the fake data, we obtain an AUROC slightly lower than the baseline, which still gives a good prediction of the type of driving.
The first group also reached the highest average AUROC among its elements, compared to the other groups. This highlights the importance of incorporating the fake data only into the validation set.
On the other hand, we want to see whether the size of the generated data affects the performance of the classifier. Since we know from the previous results in which combinations of sets the fake data are worth using, we limited the scope to the first group, in which the classifier is trained only on the real data. We can see in Table III that increasing the fake ratio gives, overall, a higher chance of improving the model. In this case, it means that synthesising more data than the size of the original dataset can give a better classification of driving behavior.
In this paper, we outlined our experience of using Recurrent Conditional GANs for generating IMU signals, which are assessed by the improvement of a semi-supervised framework that classifies the type of driving. The classification, applied to the extracted features of the real and synthetic data, was mostly improved by using the latter in the validation set. The two main contributions of this work are the generation of IMU signals and the quantitative extrinsic assessment of the synthetic data using a deep learning based approach. For future research, we plan to investigate how the parameters of the RCGAN can be improved, with the aim of finding the most suitable network architecture to ensure a close to optimal classification given the limited amount of labeled data.
[1] S. Choi, J. Kim, D. Kwak, P. Angkititrakul, and J. H. Hansen, “Analysis and classification of driver behavior using in-vehicle CAN-bus information,” in Biennial Workshop on DSP for In-Vehicle and Mobile Systems, 2007, pp. 17–19.
[2] J.-M. McNew, “Predicting cruising speed through data-driven driver modeling,” in 2012 15th International IEEE Conference on Intelligent Transportation Systems. IEEE, 2012, pp. 1789–1796.
[3] N. Oliver and A. P. Pentland, “Driver behavior recognition and prediction in a smartcar,” in Proc. SPIE Int. Soc. Opt. Eng., vol. 4023. Citeseer, 2000, pp. 280–290.
[4] M. Brambilla, P. Mascetti, and A. Mauri, “Comparison of different driving style analysis approaches based on trip segmentation over GPS information,” in 2017 IEEE International Conference on Big Data (Big Data). IEEE, 2017, pp. 3784–3791.
[5] M. Enev, A. Takakuwa, K. Koscher, and T. Kohno, “Automobile driver fingerprinting,” Proceedings on Privacy Enhancing Technologies, vol. 2016, no. 1, pp. 34–50, 2016.
[6] J. Carmona, F. García, D. Martín, A. Escalera, and J. Armingol, “Data fusion for driver behaviour analysis,” Sensors, vol. 15, no. 10, pp. 25968–25991, 2015.
[7] U. Fugiglando, E. Massaro, P. Santi, S. Milardo, K. Abida, R. Stahlmann, F. Netter, and C. Ratti, “Driving behavior analysis through CAN bus data in an uncontrolled environment,” IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 2, pp. 737–748, 2018.
[8] Y. Ma, Z. Zhang, S. Chen, Y. Yu, and K. Tang, “A comparative study of aggressive driving behavior recognition algorithms based on vehicle motion data,” IEEE Access, vol. 7, pp. 8028–8038, 2019.
[9] M. Bahi and M. Batouche, “Deep semi-supervised learning for DTI prediction using large datasets and H2O-Spark platform,” in 2018 International Conference on Intelligent Systems and Computer Vision (ISCV). IEEE, 2018, pp. 1–7.
[10] R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng, “Self-taught learning: Transfer learning from unlabeled data,” in Proceedings of the 24th International Conference on Machine Learning, ser. ICML ’07. New York, NY, USA: ACM, 2007, pp. 759–766.
[11] P. Zhang, X. Zhu, and L. Guo, “Mining data streams with labeled and unlabeled training examples,” in 2009 Ninth IEEE International Conference on Data Mining. IEEE, 2009, pp. 627–636.
[12] J. Wei, J. M. Snider, T. Gu, J. M. Dolan, and B. Litkouhi, “A behavioral planning framework for autonomous driving,” in 2014 IEEE Intelligent Vehicles Symposium Proceedings. IEEE, 2014, pp. 458–464.
[13] H. Horii, “Modifying autonomous vehicle driving by recognizing vehicle characteristics,” Aug. 15 2017, US Patent 9,731,713.
[14] T. Al-Shihabi and R. R. Mourant, “A framework for modeling human-like driving behaviors for autonomous vehicles in driving simulators,” in Proceedings of the Fifth International Conference on Autonomous Agents. ACM, 2001, pp. 286–291.
[15] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, et al., “End to end learning for self-driving cars,” arXiv preprint arXiv:1604.07316, 2016.
[16] E. Cheung, A. Bera, E. Kubin, K. Gray, and D. Manocha, “Classifying driver behaviors for autonomous vehicle navigation,” 2018.
[17] N. Hirsenkorn, T. Hanke, A. Rauch, B. Dehlink, R. Rasshofer, and E. Biebl, “Virtual sensor models for real-time applications,” Advances in Radio Science, vol. 14, pp. 31–37, 2016.
[18] S. Bernsteiner, Z. Magosi, D. Lindvai-Soos, and A. Eichberger, “Radar sensor model for the virtual development process,” ATZelektronik worldwide, vol. 10, no. 2, pp. 46–52, 2015.
[19] T. A. Wheeler, M. Holder, H. Winner, and M. J. Kochenderfer, “Deep stochastic radar models,” in 2017 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2017, pp. 47–53.
[20] N. Hirsenkorn, H. Kolsi, M. Selmi, A. Schaermann, T. Hanke, A. Rauch, R. Rasshofer, and E. Biebl, “Learning sensor models for virtual test and development,” in Workshop Fahrerassistenz und automatisiertes Fahren, 2017.
[21] E. L. Zec, N. Mohammadiha, and A. Schliep, “Statistical sensor modelling for autonomous driving using autoregressive input-output HMMs,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2018, pp. 1331–1336.
[22] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” 2014.
[23] A. Bansal, S. Ma, D. Ramanan, and Y. Sheikh, “Recycle-GAN: Unsupervised video retargeting,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 119–135.
[24] B. Dai, S. Fidler, R. Urtasun, and D. Lin, “Towards diverse and natural image descriptions via a conditional GAN,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2970–2979.
[25] A. Gupta, J. Johnson, L. Fei-Fei, S. Savarese, and A. Alahi, “Social GAN: Socially acceptable trajectories with generative adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2255–2264.
[26] Z. Xu, B. Liu, B. Wang, S. Chengjie, X. Wang, Z. Wang, and C. Qi, “Neural response generation via GAN with an approximate embedding layer,” in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 617–626.
[27] H. Arnelid, “Sensor modelling with recurrent conditional GANs: Recurrent conditional generative adversarial networks for generating artificial real-valued time series,” 2018, Master’s Thesis. https://hdl.handle.net/20.500.12380/256175.
[28] A. E. Sallab, I. Sobh, M. Zahran, and N. Essam, “LiDAR sensor modeling and data augmentation with GANs for autonomous driving,” arXiv preprint arXiv:1905.07290, 2019.
[29] C. Esteban, S. L. Hyland, and G. Rätsch, “Real-valued (medical) time series generation with recurrent conditional GANs,” arXiv preprint arXiv:1706.02633, 2017.
[30] S. Patro and K. K. Sahu, “Normalization: A preprocessing stage,” arXiv preprint arXiv:1503.06462, 2015.
[31] Y. Bengio, P. Simard, P. Frasconi, et al., “Learning long-term dependencies with gradient descent is difficult,” IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157–166, 1994.