New business models and the digitalization of society have revolutionized the automotive sector, as they have many other industries. Self-driving vehicles are considered one of the new megatrends of high importance for the future of the automotive industry, with many social and technological impacts. Alongside its social aspects, self-driving technology could provide promising solutions to reduce traffic accidents, ease gridlock and allow for more comfortable and productive commutes. To integrate Autonomous Drive (AD) comfortably into society, it must be safe and properly tested. This requires extracting and annotating large amounts of driving scenarios for the verification and validation of AD functionality in virtual and real test environments.
The importance of a robust, high-quality scenario dataset is evident in several steps of AD testing and verification. However, robust annotation of large time series scenario datasets is costly and time-consuming, because it relies heavily on human effort to label each data point correctly. Data concerning autonomous vehicles comes in the form of time series, and the complex nature of this data increases the risk that automated annotation algorithms miss rare classes or misclassify fringe cases. These issues need to be addressed in order to argue for the completeness of a scenario catalog/database and to better understand traffic and driving behavior.
Active learning offers a way to obtain accurate and robust labels at low cost. In this paradigm, a small dataset is annotated initially, after which only the most informative data points are queried for labeling by an expert/human. Active learning can be viewed as a sequential decision-making procedure in which two operations are performed at every step: i) select the next data point to be labeled, and ii) update the classification model using the newly labeled data point. Its scope extends beyond data labeling and annotation; it has also been studied, for example, for decision making [ChenRCK17].
Reviews of active learning methods can be found in [CohnGJ96, settles.tr09, WilsonMaximizingOptimization, Sener2017ActiveApproach]. The study in [Houlsby2011BayesianLearning] builds on predictive entropy with the Gaussian Process Classifier to obtain a Bayesian active learning method called BALD. The method has since been investigated for deep learning using approximate techniques based on dropout [GalIG17, BatchBALD19]. Bosser et al. [bosser2020model] study several model-centric and data-centric aspects of active learning with neural network models and show that, on MNIST and CIFAR-10, the margin strategy outperforms the alternatives, consistent with the study in [margin_performs_well]. [pimentel2020deep] studies anomaly detection using active learning, and [AL4Reaction2021] investigates active learning for drug discovery, in particular for reaction yield prediction, in order to identify successful reactions at minimal experimental cost. Active learning can be performed by querying class labels or by querying pairwise relations. The former is more common [CohnGJ96, GalIG17, Hanneke07] and has been widely used in applications such as robotics, text and image classification, medicine, manufacturing and log data analysis [tong2001active, YanCJ18, GaoZYADP20]. Querying pairwise relations, on the other hand, has mainly been studied in the context of semi-supervised learning and interactive clustering [NIPS2016_6449, JMLR_Awasthi].
In this paper, we investigate the effectiveness of active learning for annotating AD time series trajectory data, which can later be used for testing, verification and validation of autonomous driving (i.e., self-driving cars). We also investigate the potential of active learning for discovering unknown classes. To capture the temporal and sequential nature of the data, we first embed the time series into a latent space. For this purpose, we study different latent space representations in relation to the proposed active learning framework: multivariate Time Series t-Distributed Stochastic Neighbor Embedding (mTSNE) [mtsne2017, van2008visualizing], Recurrent Auto-Encoder (RAE) [demetriou2020deep] and Variational Recurrent Auto-Encoder (VRAE). To obtain the VRAE embedding, we adapt the framework we developed for RAE in [demetriou2020deep] to the variational setting. For classification, we investigate Support Vector Machines (SVM) and Neural Networks (NN), each combined with one of three query strategies: entropy, margin or random. To assess performance, we report the F1 score as a function of the number of queries. We then extend the framework to unknown class detection, where the trajectories do not necessarily come from the target classes, and employ the proposed active learning framework to detect such cases. The same embeddings, query strategies and classifiers used for trajectory classification are investigated for this purpose as well. We evaluate this task by the number of queried cut ins (the unknown class).
This paper extends our preliminary work [fastzero2021], in which the classification of time series trajectory data using active learning was briefly introduced. In this work, we additionally i) further investigate factors that influence the performance of active learning, such as the choice of classifier and the allocated budget, ii) extend the active learning framework to detect unknown classes, and iii) elaborate on the proposed framework and discuss its different aspects.
The rest of the paper is organized as follows. In Section 2, we describe the data used in this study and its preparation. In Section 3, we introduce the latent space representations and the embedding methods employed within the active learning framework. In Section 4, we describe the active learning framework, including the classification models and the query strategies. In Section 5, we extend the framework to identify trajectories with unknown class labels. In Section 6, we present the experimental studies, and finally, in Section 7, we conclude the paper.
The datasets used in this work are provided by Volvo Car Corporation (VCC) and consist of geo information about the ego vehicle and the surrounding objects detected by the ego car, including information about the road. We extract the lateral and longitudinal road positions of the surrounding vehicles to obtain three kinds of trajectories: left drive by, right drive by and cut in. To keep the datasets as constant as possible, the data is first split into an (initially) annotated set, an unlabeled set and a test set. The small annotated set is used at the beginning of active learning to train an initial classification model, which is then used to investigate the query strategies.
Then, some cut ins are removed to achieve the desired class distribution. The next step is to transform the time series trajectories into a latent space in which the temporal aspects are taken into account. After such an embedding, we sometimes refer to a trajectory as a data point, since it is then represented by a vector. The number of data points (trajectories) in each set for every class distribution, expressed as the percentage of cut ins, is given in Table 1. Each set contains an equal number of left and right drive by data points. Note that when training the model via active learning, we test against the ground truth labels developed by VCC, which are obtained using domain knowledge.
| Data set | Annotated set | Unlabeled set | Test set |
|---|---|---|---|
3 Latent space representations for active learning
As mentioned, the first step is to model the temporal and sequential order of the trajectories. For this purpose, we embed the trajectories and obtain a data point in a latent space for each trajectory. In this section, we describe the different trajectory embedding methods we use in this paper.
3.1 mTSNE with Dynamic Time Warping
One way to embed the trajectories into a latent space is to use mTSNE together with Dynamic Time Warping (DTW) [van2008visualizing, hoseini2020generic]. This approach first computes the pairwise distances between trajectories using DTW and then applies a combination of stochastic neighbor embedding and t-distributed stochastic neighbor embedding. To obtain the distance between two trajectories, DTW aligns the indices of the two time series under some restrictions on the alignment; for example, one index in a shorter trajectory may correspond to several indices in a longer trajectory. This is repeated for all pairs of trajectories to obtain the matrix of pairwise distances $D = [d_{ij}]$, where $d_{ij}$ is the DTW distance between the $i$-th and $j$-th trajectories.
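To make the DTW step concrete, the following is a minimal sketch of the classic dynamic-programming DTW distance for multivariate trajectories. It assumes each trajectory is a sequence of (lateral, longitudinal) positions, a Euclidean local cost, and no warping-window constraint; the exact alignment restrictions used with mTSNE are not specified here. The names `dtw_distance` and `pairwise_dtw` are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def dtw_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Classic O(len(a) * len(b)) DTW with Euclidean local cost and no window."""
    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] = cost of the best alignment of a[:i] with b[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            # one index may be matched to several indices of the other series
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def pairwise_dtw(trajectories: Sequence[Sequence[Point]]) -> List[List[float]]:
    """Symmetric matrix D with D[i][j] = DTW distance between trajectories i and j."""
    k = len(trajectories)
    D = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            D[i][j] = D[j][i] = dtw_distance(trajectories[i], trajectories[j])
    return D
```

For example, `dtw_distance([(0, 0), (1, 0), (2, 0)], [(0, 0), (0, 0), (1, 0), (2, 0)])` returns `0.0`, because the first point of the shorter trajectory is matched to both leading points of the longer one.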
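The high dimensional pairwise distances $d_{ij}$ (the DTW distances between trajectories $i$ and $j$) are then converted into probabilities of pairwise similarity using an exponential conditional distribution:

$$p_{j|i} = \frac{\exp\left(-d_{ij}^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-d_{ik}^2 / 2\sigma_i^2\right)}. \tag{1}$$

The conditional probability $p_{j|i}$ is the probability that the $i$-th trajectory (data point) would pick the $j$-th trajectory as its neighbor if neighbors were picked in proportion to their probability density under a Gaussian distribution centered at the $i$-th one with variance $\sigma_i^2$. For the low dimensional embedded data points $y_i$ and $y_j$, the low dimensional conditional probability $q_{j|i}$ can be computed in a similar way, as shown in Eq. 2:

$$q_{j|i} = \frac{\exp\left(-\lVert y_i - y_j \rVert^2\right)}{\sum_{k \neq i} \exp\left(-\lVert y_i - y_k \rVert^2\right)}. \tag{2}$$

If the low dimensional data points correctly model the high dimensional distances between trajectories, $p_{j|i}$ and $q_{j|i}$ take the same value. Stochastic neighbor embedding (SNE) therefore seeks a low dimensional representation (a vector $y_i$ for every $i$-th trajectory) that minimizes the mismatch between $p_{j|i}$ and $q_{j|i}$, measured as the sum of Kullback-Leibler divergences $\sum_i \sum_{j \neq i} p_{j|i} \log \left( p_{j|i} / q_{j|i} \right)$.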
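The SNE quantities above can be sketched in NumPy as follows. This illustrates plain SNE with Gaussian conditionals and the KL objective, assuming the bandwidths `sigma` are already known; standard t-SNE additionally symmetrizes the probabilities, uses a heavy-tailed Student-t kernel in the low dimensional space and optimizes the embedding by gradient descent, none of which is shown. All function names are hypothetical, and at least two points are required.

```python
import numpy as np


def _row_softmax_excluding_self(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with the diagonal forced to probability 0."""
    logits = logits.copy()
    np.fill_diagonal(logits, -np.inf)              # a point never picks itself
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    expd = np.exp(logits)
    return expd / expd.sum(axis=1, keepdims=True)


def conditional_p(D, sigma) -> np.ndarray:
    """p_{j|i} proportional to exp(-d_ij^2 / (2 sigma_i^2)), from distances D (n x n)."""
    D = np.asarray(D, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return _row_softmax_excluding_self(-D**2 / (2.0 * sigma[:, None] ** 2))


def conditional_q(Y) -> np.ndarray:
    """q_{j|i} proportional to exp(-||y_i - y_j||^2), from low dimensional points Y (n x dim)."""
    Y = np.asarray(Y, dtype=float)
    sq_dist = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return _row_softmax_excluding_self(-sq_dist)


def sne_cost(P: np.ndarray, Q: np.ndarray, eps: float = 1e-12) -> float:
    """Sum over i of KL(P_i || Q_i), ignoring the (zero) diagonal."""
    off = ~np.eye(P.shape[0], dtype=bool)
    return float(np.sum(P[off] * np.log((P[off] + eps) / (Q[off] + eps))))
```

With every `sigma` fixed at 1/√2 the two kernels coincide, so an embedding whose Euclidean distances equal the input distances gives `P == Q` and zero cost; any other embedding gives a positive cost.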
To demonstrate how mTSNE obtains embeddings for our datasets, in Figure 1, we illustrate the latent space representation generated by mTSNE, with (a) 1024 data points in each class and (b) 10% cut ins. The colors red, white and blue respectively correspond to the cut in, right drive by and left drive by classes. Here we set the perplexity to 37.5.
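The perplexity setting controls the bandwidths $\sigma_i$: in SNE and t-SNE, each $\sigma_i$ is chosen so that the perplexity $2^{H(P_i)}$ of the conditional distribution $P_i$ (with $H$ the Shannon entropy in bits) matches the user-specified value. A minimal bisection sketch is given below; the function names are hypothetical, and it assumes the target perplexity is smaller than the number of neighbors $n - 1$.

```python
import numpy as np


def row_entropy_bits(d_row: np.ndarray, sigma: float) -> float:
    """Shannon entropy (bits) of p_{j|i} for one point, given its distances to the others."""
    logits = -d_row**2 / (2.0 * sigma**2)
    logits -= logits.max()
    p = np.exp(logits)
    p /= p.sum()
    p = p[p > 0]  # 0 * log 0 is treated as 0
    return float(-(p * np.log2(p)).sum())


def calibrate_sigmas(D, perplexity: float, tol: float = 1e-5, max_iter: int = 200) -> np.ndarray:
    """Per-point sigma_i such that 2**H(P_i) is approximately `perplexity`."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    target = np.log2(perplexity)
    sigmas = np.empty(n)
    for i in range(n):
        d_row = np.delete(D[i], i)  # exclude the point itself
        lo, hi = 1e-10, 1e10        # entropy grows monotonically with sigma
        for _ in range(max_iter):
            mid = np.sqrt(lo * hi)  # bisection on a log scale
            h = row_entropy_bits(d_row, mid)
            if abs(h - target) < tol:
                break
            if h > target:
                hi = mid
            else:
                lo = mid
        sigmas[i] = mid
    return sigmas
```

With the perplexity of 37.5 used above, one would call `calibrate_sigmas(D, 37.5)` on the DTW distance matrix, which requires at least 39 trajectories.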
3.2 RAE and VRAE embeddings
An alternative approach to producing the latent space representation is to combine Recurrent Neural Networks (to capture the temporal aspects of the trajectories) with (Variational) Auto-Encoders (to yield a latent space representation, i.e., an embedding). We investigate such a representation in two settings: i) Recurrent Auto-Encoder (RAE), and ii) Variational Recurrent Auto-Encoder (VRAE).
The schematic structure of RAE, adapted from the model we developed in [demetriou2020deep] for trajectory generation, is shown in Figure 2(a). Figure 2(b) shows the general structure of VRAE, which includes an additional layer mapping each trajectory (data point) to a distribution in the latent space. The Auto-Encoder has two stacked LSTM cells and 64 features in both the hidden state and the latent space. To handle the variable lengths of the trajectories in our datasets, we group trajectories of the same length into batches, which are then fed as input to the network. In this way, all trajectories within a batch have the same length.
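The length-based batching described above can be sketched in plain Python. The function name `length_buckets`, the batch size and the shuffling of batch order are assumptions made for illustration; the sketch only returns index batches, each containing trajectories of a single length, which could then be stacked into a tensor for the LSTM encoder.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def length_buckets(trajectories: Sequence[Sequence], batch_size: int,
                   shuffle: bool = True, seed: int = 0) -> List[List[int]]:
    """Group trajectory indices so that every batch holds trajectories of equal length."""
    by_length: Dict[int, List[int]] = defaultdict(list)
    for idx, traj in enumerate(trajectories):
        by_length[len(traj)].append(idx)
    batches: List[List[int]] = []
    for idxs in by_length.values():
        # split each length group into chunks of at most `batch_size`
        for start in range(0, len(idxs), batch_size):
            batches.append(idxs[start:start + batch_size])
    if shuffle:
        # shuffle the order of the batches (not their contents)
        random.Random(seed).shuffle(batches)
    return batches
```

Grouping by exact length avoids padding altogether, at the cost of smaller batches for rare lengths.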
VRAE encourages a coherent latent space, meaning that data points that are close to each other in the input space are also located close to each other in the latent space. VRAE encodes the data points as distributions rather than explicit points, and a sample drawn from these distributions is used as input to the decoder.
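The sampling step of VRAE can be illustrated with the reparameterization used in standard variational auto-encoders. The sketch below assumes a diagonal Gaussian posterior parameterized by a mean `mu` and a log-variance `log_var` (for example of dimension 64, the latent size above) and a standard normal prior; these are common VAE choices rather than details confirmed for this model, and the function names are hypothetical.

```python
import numpy as np


def reparameterize(mu: np.ndarray, log_var: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Draw z ~ N(mu, diag(exp(log_var))) as z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu: np.ndarray, log_var: np.ndarray) -> np.ndarray:
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), summed over the latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)
```

Because the decoder sees a sample rather than the mean, the encoded distributions of similar trajectories overlap, and the KL term keeps them close to the prior during training; both effects tend to make the latent space coherent.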
4 The Active Learning Paradigm
The goal of active learning is to label the dataset as informatively and effectively as possible with minimal human interaction. In our framework, the input to active learning consists of the data points embedded in the latent space. To perform active learning, we follow Algorithm 1, starting by training the model on a small, already annotated dataset. Next, we classify the unlabeled data and compute the informativeness of each data point according to the chosen query strategy. The most informative data point is then sent to an expert for labeling and moved to the annotated set. After every query, the classifier is retrained. This iterative procedure continues until the allocated budget is spent.
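The loop of Algorithm 1 can be sketched as follows. The sketch is generic in the classifier (anything with `fit` and `predict_proba`) and in the query strategy (a function mapping class probabilities to scores, where higher means more informative). Since no SVM or NN implementation is part of this text, a hypothetical nearest-centroid classifier with softmax outputs is included as a stand-in, and the `oracle` callable stands in for the human expert.

```python
from typing import Callable, List, Tuple

import numpy as np


class CentroidClassifier:
    """Hypothetical stand-in for SVM/NN: softmax over negative squared distances to class centroids."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> "CentroidClassifier":
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, X: np.ndarray) -> np.ndarray:
        logits = -((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=-1)
        logits -= logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        return p / p.sum(axis=1, keepdims=True)


def active_learning(X_lab, y_lab, X_pool, oracle: Callable[[int], object],
                    score_fn: Callable[[np.ndarray], np.ndarray], budget: int,
                    model=None) -> Tuple[object, List[int]]:
    """Train on the annotated set, then repeatedly query the most informative pool point."""
    model = CentroidClassifier() if model is None else model
    X_lab, y_lab = np.asarray(X_lab, dtype=float), np.asarray(y_lab)
    X_pool = np.asarray(X_pool, dtype=float)
    pool = list(range(len(X_pool)))   # indices of still-unlabeled points
    queried: List[int] = []
    model.fit(X_lab, y_lab)           # initial model from the small annotated set
    for _ in range(min(budget, len(pool))):
        scores = score_fn(model.predict_proba(X_pool[pool]))
        best = pool[int(np.argmax(scores))]          # most informative point
        X_lab = np.vstack([X_lab, X_pool[best]])
        y_lab = np.append(y_lab, oracle(best))       # label supplied by the expert
        pool.remove(best)
        queried.append(best)
        model.fit(X_lab, y_lab)                      # retrain after every query
    return model, queried
```

A margin-based strategy fits this interface by returning the negated margin, so that the smallest margin receives the highest score.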
4.1 Classification models
In this study, we examine two classification models: SVM and a fully connected neural network (NN). For SVM, the radial basis function is used as the kernel. The NN consists of two hidden layers with 128 and 256 neurons, respectively. Since VRAE appears to require a larger capacity, we use a larger NN with it, consisting of five hidden layers with 64, 128, 256, 128 and 64 neurons. Each layer is batch normalized, and ReLU is used as the non-linear activation function. The networks are optimized with the Adam optimizer on the cross-entropy loss.
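For illustration, the NN forward pass described above (hidden layers with batch normalization and ReLU, followed by a softmax output for the cross-entropy loss) can be sketched in NumPy. This is a simplified, hypothetical sketch: batch normalization uses the statistics of the current batch without learnable scale and shift or running averages, He initialization is an assumption, the input dimension depends on the embedding, and training with Adam is not shown.

```python
from typing import List, Sequence, Tuple

import numpy as np

Layer = Tuple[np.ndarray, np.ndarray]


def init_mlp(in_dim: int, hidden: Sequence[int], n_classes: int,
             rng: np.random.Generator) -> List[Layer]:
    """He-initialized weights for in_dim -> hidden[0] -> ... -> n_classes."""
    sizes = [in_dim, *hidden, n_classes]
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def batch_norm(h: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each feature with the statistics of the current batch."""
    return (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps)


def mlp_forward(params: List[Layer], X: np.ndarray) -> np.ndarray:
    """Hidden layers: affine -> batch norm -> ReLU; output layer: softmax probabilities."""
    h = X
    for W, b in params[:-1]:
        h = np.maximum(batch_norm(h @ W + b), 0.0)
    W, b = params[-1]
    logits = h @ W + b
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)
```

Here `init_mlp(d, (128, 256), 3, rng)` corresponds to the default network and `init_mlp(d, (64, 128, 256, 128, 64), 3, rng)` to the larger one used with VRAE, where `d` is the embedding dimension and 3 is the number of trajectory classes.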
4.2 Query strategies
We investigate three commonly used query strategies: random, margin and entropy. As discussed in [bosser2020model], they represent different aspects of active learning for neural networks. Let $U$ denote the unlabeled set. Random assigns a uniformly distributed informativeness to each data point in $U$,

$$I_{\text{random}}(x) \sim \mathcal{U}(0, 1), \quad x \in U,$$

so the queried point is effectively chosen uniformly at random from $U$.
The second strategy, margin, computes the informativeness of every unlabeled data point as

$$I_{\text{margin}}(x) = P_C(\hat{y} = c_1 \mid x) - P_C(\hat{y} = c_2 \mid x),$$

where $x$ is the data point, $C$ represents a classifier (SVM or NN in this study), $\hat{y}$ is a random variable corresponding to the label predicted by the classifier, $P_C(\hat{y} \mid x)$ is the probability of $\hat{y}$ given $x$, and $c_1$ and $c_2$ are, respectively, the most probable and the second most probable classes for $x$ as predicted by classifier $C$. The data point with the smallest margin is queried.
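The third query strategy is entropy, which assigns the informativeness based on the entropy of the predictive distribution:

$$I_{\text{entropy}}(x) = -\sum_{c} P_C(\hat{y} = c \mid x) \log P_C(\hat{y} = c \mid x).$$

Here, the summation runs over all class labels $c$. The entropy can be viewed as a measure of the uncertainty of the entire predictive distribution, rather than only of its two most probable classes. The data point with the highest $I_{\text{entropy}}(x)$ is queried.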
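The three query strategies can be sketched as scoring functions over a matrix of predicted class probabilities (one row per unlabeled data point). The function names and the small `eps` used to avoid `log(0)` are assumptions of this sketch.

```python
from typing import Optional

import numpy as np


def random_scores(probs: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Uniform informativeness in [0, 1); its argmax is a uniformly random point."""
    return rng.uniform(size=probs.shape[0])


def margin_scores(probs: np.ndarray) -> np.ndarray:
    """P(c1|x) - P(c2|x) for the two most probable classes; smaller is more informative."""
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]


def entropy_scores(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Entropy of the predictive distribution; larger is more informative."""
    return -(probs * np.log(probs + eps)).sum(axis=1)


def select(probs, strategy: str, rng: Optional[np.random.Generator] = None) -> int:
    """Row index of the unlabeled point to query next."""
    probs = np.asarray(probs, dtype=float)
    if strategy == "random":
        rng = rng if rng is not None else np.random.default_rng()
        return int(np.argmax(random_scores(probs, rng)))
    if strategy == "margin":
        return int(np.argmin(margin_scores(probs)))
    if strategy == "entropy":
        return int(np.argmax(entropy_scores(probs)))
    raise ValueError(f"unknown strategy: {strategy}")
```

For `probs = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.5, 0.5, 0.0]]`, margin selects index 2, where the two most probable classes are tied, whereas entropy selects index 1, whose probability mass is spread over all three classes. This illustrates how the two strategies can disagree.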
5 Discovering Unknown Classes
In this section, we extend the proposed framework to use active learning for finding unknown classes. To do so, we assume that the classification model (SVM or NN) is trained on only two classes, whereas the unlabeled set may contain data points from additional classes that are unknown to the model. The goal is then to identify these data points with unknown class labels. This approach can potentially be used for anomaly detection as well.
For this purpose, we hypothesize that, once a sufficient number of data points has been queried from the existing (known) classes, a reasonable query strategy will mainly query data points from the unknown classes. The rationale is that the classification model is then confident about the known classes and yields the highest uncertainty for the data points with unknown class labels.
Here, we treat the cut in class as the unknown class, and the classification models use only the data points labeled as left and right drive by. Thus, the entire model, including the embeddings (e.g., the Auto-Encoders), is trained only on left and right drive by data points. As a measure of performance, we use the number of queried cut ins relative to the total number of queried data points.
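The evaluation measure above can be computed as a running count over the sequence of queried labels. As a reference point, querying uniformly at random without replacement finds on average `n_queries * n_unknown / n_pool` unknown-class points (the mean of a hypergeometric distribution), so a strategy that helps detect unknown classes should lie above that line. The function names and the label string are hypothetical.

```python
from typing import Hashable, List, Sequence


def cumulative_unknown_queries(queried_labels: Sequence[Hashable],
                               unknown_label: Hashable = "cut in") -> List[int]:
    """Number of unknown-class points among the first k queries, for k = 1, 2, ..."""
    counts: List[int] = []
    total = 0
    for label in queried_labels:
        total += int(label == unknown_label)
        counts.append(total)
    return counts


def expected_random_hits(n_queries: int, n_unknown: int, n_pool: int) -> float:
    """Expected unknown-class hits when querying uniformly at random without replacement."""
    return n_queries * n_unknown / n_pool
```

For instance, the labels `["left drive by", "cut in", "cut in", "right drive by"]` give the running counts `[0, 1, 2, 2]`.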
To motivate the use of active learning for unknown class detection, we first carry out a preliminary study of the types of trajectories the active learning method tends to query. Figure 3 shows a subset of the trajectories queried using the margin (Figure 3(a)) and entropy (Figure 3(b)) strategies (note that in our datasets, the velocity is relative to the ego vehicle). With both query strategies, we observe that several double cut ins and decelerative cut ins are queried, which are rare forms of cut in trajectories. This observation suggests that active learning could be used to find unknown classes, or even anomalies within a certain class.
6 Experimental Studies
Here, we investigate the different components of the proposed active learning framework for AD trajectories: embedding, query strategy, classifier and class distribution. The embeddings used are mTSNE, RAE and VRAE, combined with SVM or NN classification models. We consider three query strategies: random, margin and entropy. The class distributions studied are balanced classes, 10% cut ins and 5% cut ins. The results are averaged over 10 runs, and the faded colored lines represent the variance. We use the F1 score as the evaluation metric, since it is suitable for data with imbalanced classes. Unknown class detection is performed on the dataset with 10% cut ins, with cut in regarded as the unknown class.
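This text does not specify how the F1 score is averaged over the three classes. The sketch below assumes macro averaging (the unweighted mean of per-class F1 scores, which gives the rare cut in class the same weight as the drive by classes) and shows how per-query curves from repeated runs can be reduced to a mean and a variance. The function names are hypothetical.

```python
from typing import Dict, Hashable, Sequence, Tuple

import numpy as np


def f1_per_class(y_true, y_pred, labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """F1 = 2TP / (2TP + FP + FN) for each class; 0.0 if the class never occurs."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = {}
    for c in labels:
        tp = int(np.sum((y_pred == c) & (y_true == c)))
        fp = int(np.sum((y_pred == c) & (y_true != c)))
        fn = int(np.sum((y_pred != c) & (y_true == c)))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return float(np.mean(list(f1_per_class(y_true, y_pred, labels).values())))


def mean_and_variance(curves) -> Tuple[np.ndarray, np.ndarray]:
    """`curves` has shape (runs, queries); returns the per-query mean and variance."""
    curves = np.asarray(curves, dtype=float)
    return curves.mean(axis=0), curves.var(axis=0)
```

With 10 runs, `curves` would be a 10 x (number of queries) array of F1 scores measured on the test set after each query.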
6.1 Investigation of embeddings for active learning
Figure 4 shows the results when using different embeddings with SVM (first row) and NN (second row) as the classification model. We observe that mTSNE achieves the highest F1 score. RAE also performs well, in particular with NN, whereas the VRAE embedding yields less promising results. mTSNE probably outperforms the other embeddings because its latent space is highly separable (as shown, for example, in Figure 1), which makes the classification task easy.
The weaker performance of VRAE could be explained by its nature: each trajectory is mapped to a distribution in the latent space, which introduces a larger error margin. As a result, the margins between the classes are blurred, and points near the class boundaries become a mixture of the two bordering classes. Such points may not provide more information than a randomly drawn point and are hence difficult to classify. Margin queries data points that are likely to belong to either of two classes, and entropy queries those with the most uncertain predictive distribution. This indicates that both margin and entropy tend to query data points that fall between the classes.
6.2 Investigation of query strategies
In this section, we study and compare the different query strategies. Figure 5 illustrates the three query strategies for the different embeddings using the SVM classifier. For the mTSNE embedded data in Figures 5(a)-5(b), margin and entropy outperform random, reaching a high F1 score much faster for the class distributions shown.
Figures 5(c)-5(d) show the results for RAE embedded data. The behavior is similar to that of mTSNE: with SVM, the non-random query strategies (entropy and margin) yield better performance for both class distributions. In particular, entropy obtains a stable and high F1 score in this setting. A general trend observed for the mTSNE and RAE embedded data is that SVM combined with the margin query tends to yield larger fluctuations, albeit around a stable baseline. This behavior is especially visible in Figure 5(c), where margin fluctuates more than entropy. A possible reason is that margin, as the name indicates, queries the data points with the smallest margin, i.e., points close to the decision boundary. SVM is therefore sensitive to the data points queried by margin, since such points are likely to become support vectors and can rapidly change the direction of the separating hyperplane.
Figure 5(e) clearly shows that entropy does not perform well when using VRAE, and that there is no big difference between margin and random. Overall, however, this embedding yields lower performance than mTSNE and RAE. As mentioned before, in this case the queried data points probably come from the boundary region between the classes and do not provide more information than a randomly queried data point.
6.3 Investigation of choice of classifiers
In this study, we investigate the choice of classification model for active learning on our datasets. We observe that active learning is more effective in combination with SVM for all embeddings. This trend can be seen by comparing Figure 6 (which uses NN) to Figure 5 (which uses SVM): with an NN, there is no significant difference in performance among the query strategies. For RAE, NN gives a higher F1 score than SVM, whereas for mTSNE and VRAE, SVM yields better performance than NN, as seen in Figure 7. However, SVM tends to show larger fluctuations than NN, likely because of its higher sensitivity to newly queried data points.
The observation that the entropy and margin query strategies do not outperform random when an NN is used indicates that, in this case, active learning does not contribute to improved performance. This observation is consistent with the study in [AL4Reaction2021] on active learning for chemical reaction prediction. A possible reason for this behavior is that SVM can recognize general trends quickly, while an NN needs more time (i.e., more queried data points) to learn due to its larger capacity, although it is capable of learning more complex features.
Since the VRAE embedding yields overall worse results than the other embeddings, a larger NN consisting of five hidden layers with 64, 128, 256, 128 and 64 neurons is tested. Despite the increased capacity, the performance does not improve. Several factors could contribute to the F1 score stagnating at around 0.8, such as the nature of VRAE and the model configuration. In general, these results suggest that a simpler model such as SVM is sufficient and suitable for use in combination with active learning.
6.4 Investigation of class distributions
In Figure 8, we investigate class distributions with 33%, 10% and 5% cut ins and report the F1 scores for the three embeddings (using SVM as the classifier). For all embeddings, the class distribution with 33% cut ins seems to give the best results, as it saturates the fastest and obtains the highest F1 score. However, it performs only slightly better than the distribution with 10% cut ins for mTSNE and RAE, while the difference is much larger for VRAE embedded data. With 5% cut ins, performance is poor in all cases, since the cut in class is underrepresented.
Although an equal number of data points from each class might be expected to improve performance by a large margin, the outcome is not trivial: a higher percentage of cut ins also introduces more variation, since cut ins come in several forms that are difficult to recognize, such as double cut ins and decelerative cut ins.
6.5 Budget size
Figure 8(a) demonstrates that with mTSNE, a high F1 score can be achieved within 25 queries for all class distributions considered. With RAE and the SVM classifier, for an F1 score of above is reached after approximately 225 queries, and for an F1 score of around 0.7 is achieved after 200 queries, see Figure 8(b). Note that SVM is not the optimal choice of classifier for RAE: according to Figure 6(b), only around 125 queries are required to obtain an F1 score above 0.9 with RAE for . Figure 8(c) shows that with VRAE, the F1 score plateaus around 0.8 and 0.6 after around 250 queries for . For , the saturation at 0.8 occurs within 25 queries.
6.6 Investigation of unknown class detection
Finally, we investigate unknown class detection using the proposed active learning framework, where the cut in class is considered the unknown class. Figure 9 shows the number of queried cut ins for each embedding over 60 queries, using SVM or NN as the classification model. Figures 9(a) and 9(b) show the results for the mTSNE embedded data. The results with SVM are clearly better than those with NN (therefore, for the other two embeddings we focus only on SVM): with SVM, the entropy and margin strategies query more cut ins, whereas with the NN classifier they perform almost as well as random choice. The reason could be, as mentioned earlier, that SVM is a simpler classification model that learns and saturates faster than NN (a model with a large capacity). Thus, SVM quickly learns the two existing classes (left and right drive by) and then starts querying members of the unknown class. The NN, on the other hand, needs much more data to learn properly.
Figures 9(c) and 9(d) show the unknown class detection results for the RAE and VRAE embeddings, respectively, employed with SVM. In both settings, the entropy and margin query strategies help identify the cut ins, and this is even more evident with the VRAE embedding.
In this study, we investigated active learning as an effective tool for reliable and cost-efficient labeling of time series trajectory data collected from Autonomous Drive (AD) applications. For this purpose, we developed a framework in which we first embed the trajectories into a latent space representation (using mTSNE, RAE and VRAE) to capture their temporal nature. We then apply active learning with different query strategies and classification models in the embedded latent space. We also explored the possibility of unknown class detection using the proposed active learning framework.
We observe that in many settings, active learning is an effective tool. The positive effect is particularly evident with the SVM classifier for both the mTSNE and RAE embeddings. The best performance is obtained with 33% cut ins, which is only slightly better than 10%. The choice of embedding significantly affects the results. In particular, mTSNE consistently yields the best performance among the embeddings. With mTSNE, the entropy query strategy combined with SVM can be seen as the best option, owing to its high performance with a small number of queries and its higher stability than margin.
RAE also performs well, in particular compared to VRAE. With the RAE embedding, an NN yields better performance than SVM, with no significant difference in performance among the query strategies. The VRAE embedding does not achieve a particularly high performance regardless of the choice of classifier. There is no significant difference between random and margin, while entropy performs somewhat worse in this setting.
Regarding unknown class detection, we observe that the proposed active learning framework can be useful for this task as well. In particular, when SVM is used with any of the embedding methods, the entropy and margin query strategies identify more cut ins than the random strategy. This effect is less evident with NN, presumably due to its larger capacity.
We would like to acknowledge Volvo Cars for providing the data and computational resources. The work of Morteza Haghir Chehreghani was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.