1 Introduction
A basic prerequisite for highly automated driving in urban areas is a precise model of the vehicle's surrounding (traffic) environment. Such models enable the detection of other road users' intentions and the forecasting of their future trajectories, so that safe driving strategies can be realized. In future traffic scenarios, vehicles and other road users will communicate, exchange information, and cooperate on various levels. Vulnerable road users (VRUs), e.g., pedestrians and cyclists, are important participants in today's and future traffic. The behavior of VRUs is highly dynamic, situation- and context-dependent, and often not in compliance with traffic regulations. Moreover, there is a variety of different VRU classes, e.g., pedestrians, cyclists, and skateboarders, which all behave differently. Models based on machine learning techniques require a considerable amount of sample data. Today, this data has to be acquired in time-consuming and costly experiments with high-precision sensors that provide the reference data.
In this article, we propose a concept for gathering sample data with close-to-series-production sensors in order to efficiently create and enhance detection and prediction models, thereby alleviating the need for costly experiments and minimizing the required amount of human interaction (e.g., labeling effort).
2 Architecture description
We propose a holistic approach consisting of three fundamental stages for continuous autonomous learning, as depicted in Figure 1. Sample data is gathered in a goal-oriented fashion in everyday traffic using close-to-series-production vehicle sensors, e.g., camera or radar; in particular, no high-precision sensors are used. The acquired data is used to continuously improve existing detection and prediction models, to integrate context information, e.g., time of day, traffic volume, and location, and to automatically create new models, e.g., for new or so far unconsidered VRU classes. Detection, classification, and context identification are performed in the first step. To cover the wide range of different VRU classes in daily traffic, e.g., pedestrians with strollers or skateboarders, the gathered data is autonomously clustered and divided into new classes. The need for feedback from human experts shall be minimized; this requires novelty detection and active learning, which are implemented in the second step. Finally, in the third step, motion prediction and detection models are generated and improved based on the clustered data. These models are used for automated driving as part of the perception and prediction components, and they also improve the detection and classification in the first step. To bootstrap the iterative learning process, initial models trained on labeled data are used.
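The three-stage feedback loop described above can be sketched as follows. This is a minimal toy illustration, not the project's implementation; all function names and the threshold-based model are hypothetical placeholders.

```python
# Toy sketch of the three-stage loop (all names hypothetical):
# stage 1 detects and classifies, stage 2 keeps only informative
# samples, stage 3 updates the models that feed back into stage 1.

def detect_and_classify(frame, model):
    """Stage 1: return (sample, predicted class, confidence)."""
    confidence = model.get(frame, 0.0)  # toy lookup "model"
    return frame, ("vru" if confidence > 0.5 else "other"), confidence

def is_informative(confidence, threshold=0.6):
    """Stage 2: keep only low-confidence (uncertain/novel) samples."""
    return confidence < threshold

def update_model(model, sample):
    """Stage 3: naive update, nudging confidence for this sample."""
    model[sample] = min(1.0, model.get(sample, 0.0) + 0.3)
    return model

def learning_loop(frames, model):
    selected = []
    for frame in frames:
        sample, label, conf = detect_and_classify(frame, model)
        if is_informative(conf):
            selected.append(sample)  # forwarded to clustering / expert
            model = update_model(model, sample)
    return model, selected
```

The key design point carried over from the concept is that the model itself decides, per sample, whether the sample is worth learning from, so the loop scales to unlabeled everyday traffic data.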
3 Detection, classification, and context identification
The first important challenge in this context is the generation of a "ground truth". In current processes this is usually time-consuming or requires expensive equipment, because the positions of the detected traffic participants have to be determined as precisely as possible. We aim to achieve precise self-localization with close-to-series-production sensor systems and high-accuracy maps. The detection and classification of VRUs should likewise be based on series-production sensor systems such as stereo cameras, laser scanners, or radar. The detection is extended by tracking based on continuously refined motion models. As soon as a map of the environment is available, we intend to combine the tracking step with a semantic classification of the situation, e.g., determining whether a pedestrian is walking in the street or on the sidewalk.
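The street-vs-sidewalk check mentioned above could, under the assumption that the map provides labeled regions as polygons (a hypothetical map format), be reduced to a point-in-polygon test:

```python
# Sketch of the semantic context check: given map regions as labeled
# polygons (hypothetical format), decide whether a detected pedestrian
# position lies on the sidewalk or in the street.

def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def semantic_context(position, map_regions):
    """Return the label of the first map region containing the position."""
    for label, polygon in map_regions.items():
        if point_in_polygon(position[0], position[1], polygon):
            return label
    return "unknown"
```

In practice the localization uncertainty of the pedestrian would have to be taken into account near region boundaries, e.g., at the curb.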
Since a real ground truth is not available with the intended equipment, and in order to be able to evaluate the results nevertheless, we aim to develop quality and certainty measures for the detection and position estimation of a VRU.
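One conceivable certainty measure of this kind, given here purely as an assumption and not as the project's method, is ensemble disagreement: run several independent detectors and use the spread of their position estimates as a proxy for localization uncertainty.

```python
# Hypothetical certainty measure: the standard deviation of an
# ensemble's position estimates (in metres) is mapped to a score
# in (0, 1]; identical estimates give 1.0, large spread tends to 0.

import math

def ensemble_certainty(position_estimates):
    """position_estimates: list of (x, y) tuples from the ensemble."""
    n = len(position_estimates)
    mean_x = sum(p[0] for p in position_estimates) / n
    mean_y = sum(p[1] for p in position_estimates) / n
    var = sum((p[0] - mean_x) ** 2 + (p[1] - mean_y) ** 2
              for p in position_estimates) / n
    return 1.0 / (1.0 + math.sqrt(var))
```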
4 Clustering, novelty detection, and active learning
The second step is concerned with the assessment of the classification and detection results of the first step with respect to the models used for detection and prediction. Prediction and detection models have to assess themselves, i.e., decide autonomously whether a particular observed behavior of a VRU can be used to improve prediction quality (e.g., detection of unknown and novel VRU classes or of unusual trajectories). Innovative active learning [2, 9], novelty detection, and clustering techniques allow the system to select only valuable information, so that it can cope with massive amounts of data. Moreover, semi-supervised learning methods enable the system to make use of structural, non-labeled information, thus supporting the active learning process. Using these techniques, the system is capable of autonomously deciding from which data sources and samples it can extract knowledge and is thus able to learn efficiently.
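A standard active-learning criterion that fits this selection step is uncertainty sampling [9]: only the samples whose predicted class distribution has the highest entropy are forwarded to a human expert. The sketch below assumes a generic `predict_proba` callback standing in for the real classifier.

```python
# Uncertainty sampling sketch: rank samples by the entropy of the
# model's class-probability output and query the expert only for
# the most uncertain ones (within a fixed labeling budget).

import math

def entropy(probs):
    """Shannon entropy of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(samples, predict_proba, budget=2):
    """Return the `budget` samples the model is least certain about."""
    ranked = sorted(samples, key=lambda s: entropy(predict_proba(s)),
                    reverse=True)
    return ranked[:budget]
```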
5 Online model generation and improvement
In the third step, we develop algorithms that improve existing models for predicting the future trajectories of traffic participants [3, 10] at runtime through self-learning techniques. To further improve the prediction, we not only distinguish between different classes of traffic participants, but also subdivide the classes based on their behavior in different situations, e.g., a pedestrian on the sidewalk vs. a pedestrian crossing the street. Compared to data generated in dedicated series of experiments, the data available this way is less accurate, but it comprises a greater number of examples, which are more realistic and cover behavior patterns that cannot be created in an experiment, e.g., VRU behavior that does not comply with road safety regulations.
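The polynomial least-squares idea behind the cited trajectory predictors [3, 10] can be illustrated as follows, simplified here to a first-order (constant-velocity) fit in pure Python: fit x(t) and y(t) to the recently observed positions by least squares and extrapolate to future time stamps.

```python
# Least-squares trajectory extrapolation (first-order illustration
# of the polynomial approach; the cited work uses higher orders).

def fit_line(times, values):
    """Least-squares line v(t) = a*t + b through (times, values)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    var = sum((t - mean_t) ** 2 for t in times)
    a = cov / var
    return a, mean_v - a * mean_t

def predict_trajectory(times, positions, horizon):
    """positions: list of past (x, y); horizon: future time stamps."""
    ax, bx = fit_line(times, [p[0] for p in positions])
    ay, by = fit_line(times, [p[1] for p in positions])
    return [(ax * t + bx, ay * t + by) for t in horizon]
```

With noisy series-production sensor data, the fit would in addition be weighted by the certainty measures from the first step.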
Furthermore, we investigate how additional knowledge, such as the time of day or the traffic light phase, can be used to improve the prediction process.
6 Conclusion
In this article, we presented a concept to cost-efficiently improve VRU detection and intention prediction models based on machine learning techniques. The associated challenges are categorized into three steps, as presented in the architectural sketch. We are aware that in this article we mainly raise challenges without offering detailed solutions. Nevertheless, we are convinced that our research will help to overcome the problems mentioned and contribute to the active safety of VRUs, enabling highly automated driving.
This preliminary work partly results from the project DeCoInt, supported by the German Research Foundation (DFG) within the priority program SPP 1835 "Kooperativ interagierende Automobile", grant numbers DO 1186/1-1, FU 1005/1-1, and SI 674/11-1. The work is also supported by the "Zentrum Digitalisierung Bayern".
-  Bieshaar, M., Reitberger, G., Zernetsch, S., Sick, B., Fuchs, E., Doll, K.: Detecting intentions of vulnerable road users based on collective intelligence. In: AAET – Automatisiertes und vernetztes Fahren. pp. 67–87. Braunschweig, Germany (2017)
-  Calma, A., Leimeister, J.M., Lukowicz, P., Oeste-Reiß, S., Reitmaier, T., Schmidt, A., Sick, B., Stumme, G., Zweig, K.A.: From active learning to dedicated collaborative interactive learning. In: 29th International Conference on Architecture of Computing Systems. pp. 1–8. Nuremberg, Germany (2016)
-  Goldhammer, M., Köhler, S., Doll, K., Sick, B.: Camera based pedestrian path prediction by means of polynomial least-squares approximation and multilayer perceptron neural networks. In: 2015 SAI Intelligent Systems Conference (IntelliSys). pp. 390–399 (2015)
-  Gruhl, C., Sick, B.: Novelty detection with CANDIES: a holistic technique based on probabilistic models. International Journal of Machine Learning and Cybernetics 7(33), 1–19 (2016)
-  Köhler, S., Goldhammer, M., Bauer, S., Zecha, S., Doll, K., Brunsmann, U., Dietmayer, K.: Stationary detection of the pedestrian’s intention at intersections. IEEE Intelligent Transportation Systems Magazine 5(4), 87–99 (2013)
-  Meissner, D., Reuter, S., Wilking, B., Dietmayer, K.: Road user tracking using a Dempster-Shafer-based classifying multiple-model PHD filter. In: Proceedings of the 16th International Conference on Information Fusion. pp. 1236–1242 (2013)
-  Reitmaier, T., Calma, A., Sick, B.: Transductive active learning – a new semi-supervised learning approach based on iteratively refined generative models to capture structure in data. Information Sciences 293, 275–298 (2015)
-  Schindler, A.: Vehicle self-localization with high-precision digital maps. In: 2013 IEEE Intelligent Vehicles Symposium Workshops (IV Workshops). pp. 134–139 (2013)
-  Settles, B.: Active learning literature survey. Tech. rep., University of Wisconsin-Madison (2010)
-  Zernetsch, S., Kohnen, S., Goldhammer, M., Doll, K., Sick, B.: Trajectory prediction of cyclists using a physical model and an artificial neural network. In: 2016 IEEE Intelligent Vehicles Symposium (IV). pp. 833–838 (2016)