In order for a Highly Autonomous Driving (HAD) system to select optimal driving strategies, it must first understand the context in which the vehicle is driving. For example, an HAD system deploys different driving strategies when the ego-car is driving on the highway, as opposed to driving in the inner-city.
As illustrated in Fig. 1, the Deep Grid Net (DGN) algorithm predicts the driving context by analyzing local OGs. The main advantage of using OGs, as opposed to image-based context understanding, is that the search space is greatly reduced, since an OG carries far less information than an image. In this work, the occupancy grids are built from data acquired from Lidar sensors mounted on the front and rear sections of the ego-car.
The DGN algorithm is deployed within Elektrobit’s HAD software framework, coined EB robinos, a functional software architecture that manages the complexity of autonomous driving. DGN offers a robust real-time estimation of the driving context, mapped to five classes: driving on the highway, driving in the inner-city, driving on country roads, driving in areas with traffic jams, and parking.
Deep neural networks (DNNs) were chosen to encode the traffic scene due to their generalization capabilities. The number of configuration parameters of a DNN, also known as hyperparameters, has increased together with the size and complexity of the networks. In order to avoid the manual tuning of these hyperparameters, we build on top of the authors’ previous work on one-shot learning using neuroevolutionary algorithms [1] and propose an approach for the automatic computation of hyperparameters during training.
The main contributions of this paper can be summarized as:
Introduction of the DGN architecture, encoding a learned grid-based representation of the traffic scene;
Tuning of DGN’s hyperparameters using genetic algorithms (GAs);
Deployment of DGN into the EB robinos autonomous driving software stack.
The rest of the paper is organized as follows: an overview of related work is given in Section II, while the DGN system is presented in Section III. A description of the training strategy, and the evaluation of DGN’s performance are given in Section IV. Finally, the conclusions are stated in Section V.
II. Background and Motivation
Occupancy grids are widely used to map indoor spaces for the autonomous navigation of self-driving agents. In [2], Convolutional Neural Networks (CNNs) have been trained on 2D range sensory data for the semantic labeling of places in unseen environments. Within this approach, OGs created from Lidar scans are converted to gray-scale images and used to classify between three classes, that is, room, corridor, and doorway.
Several papers reported OGs constructed from the interaction of a robot with its surrounding environment. Recurrent Neural Networks (RNNs) have been used by Ondruska [3] for tracking and classifying the surroundings of a robot placed in a dynamic and partially observable environment. An RNN filters the input stream of raw laser measurements in order to infer the objects’ locations together with their identities. The algorithm in [3] takes inspiration from Deep Tracking [4].
In [5], an environment modeled with a Bayesian filtering technique is processed through a DNN, with the purpose of obtaining a long-term driving situation prediction for intelligent vehicles. This work is based on the principles stated by Nuss in [6, 7], where raw radar and laser data is parsed through a fusion layer. The algorithm predicts future static and dynamic objects using a CNN trained on occupancy grids.
Although OGs are common tools in robotics, there are only a few cases where DL techniques are used for real-time environment perception based on OGs. In [8], an improved version of the Proportional Conflict Redistribution rule #6 (PCR6), taking into account Zhang’s degree of intersection of focal elements [9], was used on Lidar data.
In [10], OGs and DNNs have been applied to outdoor driving scene classification. A major difference with respect to our work is that the classifier in [10] estimates only four driving classes, based on OGs which are constructed by accumulating data over time. In our work, we compute OGs in real-time, during the movement of the vehicle, in order to classify five road types.
III-A. Problem Space
The DGN algorithm is mainly composed of three elements: (i) an OG fusion algorithm, (ii) a DNN used for parsing the OG in real-time and (iii) an evolutionary algorithm used for selecting the optimal DNN hyperparameters set. The outcome obtained from DGN is a driving context classification, mapped to five classes: inner city (IC), country road (CR), parking lot (PL), highway (HW) and traffic jam (TJ).
The OG training dataset D is used to calculate the optimal DGN hypothesis h \in H, which encodes the deep network’s structure and weights. We define our problem space within the following Bayesian framework: P(h) is the prior probability over the hypothesis space H, P(D) is the training data probability, and P(h|D) is the likelihood of h given D. P(D|h) is the data likelihood over a given hypothesis. The maximum a posteriori (MAP) hypothesis h_{MAP}, using Bayes’ theorem, can be defined as:

    h_{MAP} = \arg\max_{h \in H} P(h|D) = \arg\max_{h \in H} \frac{P(D|h) P(h)}{P(D)}    (1)

Since P(D) depends only on the data and not on h, it can be dropped:

    h_{MAP} = \arg\max_{h \in H} P(D|h) P(h)    (2)

Assuming that all hypotheses are equally probable, we can choose a Maximum Likelihood (ML) approach for training:

    h_{ML} = \arg\max_{h \in H} P(D|h)    (3)

The training samples d_i \in D are considered to be independent and identically distributed, thus satisfying the following statement:

    P(D|h) = \prod_{i=1}^{m} P(d_i|h)    (4)

Maximizing Eq. 4 is equivalent to the maximization of its logarithm, where any term which depends on D, but not on h, can be ignored:

    h_{ML} = \arg\max_{h \in H} \sum_{i=1}^{m} \ln P(d_i|h)    (5)
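To make the ML estimation in Eq. 5 concrete, the following toy sketch selects the best hypothesis from a discrete set of Bernoulli occupancy hypotheses by maximizing the sum of log-likelihoods over i.i.d. samples (an illustration of the principle, not part of DGN itself; the data and hypothesis values are made up):

```python
import math

def log_likelihood(data, p):
    # sum_i ln P(d_i | h) for i.i.d. binary samples under a Bernoulli
    # hypothesis h with occupancy probability p.
    return sum(math.log(p if d == 1 else 1.0 - p) for d in data)

def ml_hypothesis(data, hypotheses):
    # h_ML = argmax_h sum_i ln P(d_i | h)
    return max(hypotheses, key=lambda p: log_likelihood(data, p))

ml_hypothesis([1, 1, 0, 1], [0.25, 0.5, 0.75])  # -> 0.75
```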
III-B. Occupancy Grids
Occupancy grids are the data source for calculating the optimal DGN hypothesis h. In our work, the grids used for driving context classification were built using the Dempster-Shafer (DS) theory [11].
From the different fusion rules proposed in the literature [12], the DS rule was the most suited for our work. The issue which arises here is how to combine two independent sets of probability mass assignments in specific situations. The joint mass m_{1,2} is calculated from the two sets of masses m_1 and m_2. The DS combination is defined by taking m_{1,2}(\emptyset) = 0 and, for all A \neq \emptyset:

    m_{1,2}(A) = \frac{1}{1 - K} \sum_{B \cap C = A \neq \emptyset} m_1(B) m_2(C)    (6)

where K = \sum_{B \cap C = \emptyset} m_1(B) m_2(C) measures the amount of conflict between the two mass sets, and \frac{1}{1-K} is a normalization constant.
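The combination rule can be sketched for a single cell over the frame of discernment {free, occupied}, where mass on the whole frame represents the unknown state (a minimal illustration with made-up mass values, not the paper's implementation):

```python
def dempster_combine(m1, m2):
    # Dempster's rule: joint mass of A = (1 / (1 - K)) * sum of m1(B) * m2(C)
    # over all pairs with non-empty intersection B & C = A;
    # K accumulates the mass assigned to conflicting (empty) intersections.
    joint, conflict = {}, 0.0
    for b, mb in m1.items():
        for c, mc in m2.items():
            a = b & c
            if a:
                joint[a] = joint.get(a, 0.0) + mb * mc
            else:
                conflict += mb * mc
    return {a: v / (1.0 - conflict) for a, v in joint.items()}

FREE, OCC = frozenset({"free"}), frozenset({"occupied"})
UNKNOWN = FREE | OCC  # total ignorance: mass on the whole frame
m1 = {OCC: 0.6, UNKNOWN: 0.4}             # e.g. evidence from one sensor
m2 = {OCC: 0.5, FREE: 0.2, UNKNOWN: 0.3}  # e.g. evidence from another
fused = dempster_combine(m1, m2)          # masses re-normalized to sum to 1
```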
The idea behind OGs is the division of the environment into 2D cells, each cell storing the probability of occupation. Each cell is color-coded: green pixels represent free space, red marks the occupied cells (or obstacles), and black models unknown occupancy. The color intensity represents the degree of occupancy. The grid content is continuously updated, in real-time, with each sensory measurement. Examples of labelled OGs are shown in Fig. 2.
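The color-coding described above can be sketched as a small helper, assuming the occupied, free, and unknown belief masses of a cell sum to one (a hypothetical mapping for illustration, not the exact rendering used for Fig. 2):

```python
def cell_color(m_occupied, m_free):
    # Map a cell's belief masses to an RGB triple: the red channel encodes
    # the occupied mass, green encodes the free mass, and a fully unknown
    # cell (both masses zero) stays black; intensity encodes the degree.
    assert 0.0 <= m_occupied + m_free <= 1.0
    return (int(255 * m_occupied), int(255 * m_free), 0)

cell_color(1.0, 0.0)  # fully occupied -> red (255, 0, 0)
cell_color(0.0, 1.0)  # fully free -> green (0, 255, 0)
cell_color(0.0, 0.0)  # unknown -> black (0, 0, 0)
```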
III-C. Neuroevolutionary Training of DGN
Genetic Algorithms (GAs) [13] are a metaheuristic optimization method belonging to the broader class of evolutionary algorithms. The evolutionary training process starts from an initial set of solutions, or population, where every solution is given by a set of properties, called genes. A solution is also called an individual.
GAs are used in our work for finding the optimal hyperparameter set \theta encoding the DGN’s structure, that is, the optimal number of neurons in each layer, the most suitable optimizer and the best cost function for backpropagation. This allows us to determine the smallest DNN structure which can deliver accurate results, as well as real-time processing capabilities. A fitness function f, measuring the classification accuracy of an individual, has been defined to find the optimal set of parameters. The proposed training method optimizes over a hyperparameter solution space \Theta, aiming at calculating the top individuals based on their fitness value:

    \theta^* = \arg\max_{\theta_j \in \Theta} f(\theta_j, w, t)    (7)
The optimal structure of the network is evaluated within the training loop for a given set of weights w, DGN individual \theta_j and training step t. The weights are calculated using classical backpropagation, according to the maximum likelihood estimation defined in Eq. 5:

    w^* = \arg\max_{w} \sum_{i=1}^{m} \ln P(d_i | w, \theta_j)    (8)

Once the training in Eq. 8 is completed, the hyperparameters are evaluated based on the fitness function. The new set of hyperparameters is calculated by exploring the solution space with the help of the evolutionary search procedure. The training loop stops after 15 training epochs and returns the top 5 individuals having the highest fitness value. The approach is presented in the Algorithm 1 pseudocode.
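The evolutionary training loop can be sketched in Python as follows (a minimal, hypothetical rendering of the pseudocode: the fitness, mutation, and crossover callbacks, as well as the toy integer individuals below, are illustrative assumptions, not the paper's implementation):

```python
import random

def evolve(init_population, fitness, mutate, crossover, epochs=15, top_k=5):
    # Evolve hyperparameter individuals: keep the fittest (elitism),
    # then refill the population with mutated crossover children.
    population = list(init_population)
    for _ in range(epochs):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:top_k]
        children = []
        while len(children) < len(population) - top_k:
            a, b = random.sample(parents, 2)
            children.append(mutate(crossover(a, b)))
        population = parents + children
    # Return the top individuals with the highest fitness value.
    return sorted(population, key=fitness, reverse=True)[:top_k]

# Toy usage: individuals are integers, the optimum is 7.
best = evolve(list(range(10)),
              fitness=lambda x: -abs(x - 7),
              mutate=lambda x: x + random.choice([-1, 0, 1]),
              crossover=lambda a, b: (a + b) // 2)
```

Because the fittest individuals are carried over unchanged, the best solution found so far is never lost between generations.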
III-D. DGN Architecture
The OGs computed with the above-described method represent the input to a CNN, which constructs a grid-based representation of the driving environment. The neural network topology is written in Keras [15], on top of the TensorFlow library.
The DGN architecture has been developed for deployment within EB robinos, where smaller activation maps are required in order to achieve real-time performance. The DGN’s topology consists of two convolutional layers, with reduced numbers of kernel filters and reduced kernel sizes, respectively. A Rectified Linear Unit (ReLU) filters each convolution, followed by a normalization layer and a pooling operation. The network also contains three fully connected (FC) layers linked to a final Softmax activation function, which calculates the driving context probabilities. In order to reduce model overfitting, Dropout layers were added.
IV. Experimental Results
IV-A. Training Strategy
The data was collected on several road types in Germany, using a test vehicle equipped with a front camera (Continental MFC430), as well as front and rear Lidar sensors (Quanergy M8).
The sensory data streams are fused into OGs of fixed size and per-cell resolution. The data samples are saved during driving at time intervals ranging between 50 ms and 90 ms per cycle. Approximately 60,000 samples were obtained, covering different scenario types: country roads, highways, inner city, parking lots, and traffic jam situations. From the total amount of samples, 65% were used for training, 20% for validation and 15% for testing.
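The 65/20/15 split can be sketched as simple list slicing over a pre-shuffled sample list (the helper name and its usage are illustrative, not the paper's tooling):

```python
def split_dataset(samples, train=0.65, val=0.20, test=0.15):
    # Partition pre-shuffled samples into training/validation/test sets
    # according to the 65% / 20% / 15% ratio used above.
    assert abs(train + val + test - 1.0) < 1e-9
    n_train = int(len(samples) * train)
    n_val = int(len(samples) * val)
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])

train_set, val_set, test_set = split_dataset(list(range(100)))
# -> 65, 20, and 15 samples respectively
```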
The classification model was trained from scratch, using a learning rate of 0.0001 for the backpropagation algorithm.
Our NN structure was determined based on the algorithm described in Section III-C. The hyperparameter set used during training consists of optimizers (rmsprop, adam, SGD, adagrad, adadelta, adamax, nadam), loss functions (categorical crossentropy, mean squared error) and numbers of neurons (16, 32, 64, 128, 258).
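For illustration, the size of this discrete search space can be enumerated as follows (a sketch; representing one individual as an optimizer/loss pair plus the neuron counts of two tunable FC layers is our assumption):

```python
import itertools

OPTIMIZERS = ["rmsprop", "adam", "SGD", "adagrad", "adadelta", "adamax", "nadam"]
LOSSES = ["categorical_crossentropy", "mean_squared_error"]
NEURONS = [16, 32, 64, 128, 258]

# One candidate individual = (optimizer, loss, neurons in FC1, neurons in FC2).
search_space = list(itertools.product(OPTIMIZERS, LOSSES, NEURONS, NEURONS))
# 7 optimizers x 2 losses x 5 x 5 neuron choices = 350 configurations
```

Even this modest space makes exhaustive training impractical for large networks, which motivates the evolutionary search.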
The evolution of the fitness function can be seen in Fig. 3. The GA evolved the DGN’s hyperparameters over a fixed number of generations, each generation consisting of a set of individuals. An individual represents a DGN neural network, with its hyperparameters selected by the neuroevolutionary algorithm.
An average classification accuracy is measured after each generation. When the last generation is reached, the individual with the best score is selected as the final DGN hypothesis. With our training method, a high fitness accuracy has been reached. The top network structure contains 64 and 32 neurons for the first and second FC layers, respectively, and uses categorical crossentropy as loss function and adam as optimizer.
IV-B. Accuracy Evaluation
The classification performance is summarized in the confusion matrix in Table I, where slight differences in the per-class performance are visible. The traffic jam and parking lot classes show a higher detection accuracy, since their respective occupancy grids have a more distinctive structure.
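Per-class accuracy as read from such a confusion matrix is the diagonal count divided by the row total; a small sketch with a hypothetical two-class matrix (not the values of Table I):

```python
def per_class_accuracy(confusion):
    # confusion[i][j]: number of samples of true class i predicted as class j.
    # Per-class accuracy (recall) = correct predictions / row total.
    return [row[i] / sum(row) for i, row in enumerate(confusion)]

# Hypothetical 2-class example, not the paper's results.
per_class_accuracy([[8, 2], [1, 9]])  # -> [0.8, 0.9]
```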
A comparison of DGN’s accuracy against state-of-the-art methods is presented in Table II. The competitors are several network topologies, such as AlexNet [16] or GoogLeNet, as well as the algorithm from [10]. All algorithms were tested on the same testing data. The classification results obtained with DGN are clearly higher than those delivered by the competing estimators.
Apart from its high classification accuracy, another advantage of DGN is the detection speed of the algorithm, making it suitable for real-time applications, such as the EB robinos HAD platform. DGN runs on a single OG sample, without the need to accumulate grid data over time, as required by the method in [10].
V. Conclusion

In this paper, we have introduced DGN, a solution for driving context understanding, required by the behavior arbitration components of HAD systems. It has been designed to infer the driving context directly from OGs, as opposed to traditional image-based methods. We were able to show that a simplified CNN topology is sufficient to classify in real-time between different types of OGs, without the need of training large networks, such as AlexNet or GoogLeNet.
-  S. Grigorescu, “Generative One-Shot Learning (GOL): A Semi-Parametric Approach for One-Shot Learning in Autonomous Vision,” in Int. Conf. on Robotics and Automation ICRA 2018, Brisbane, Australia, 21-25 May 2018, (to be published).
-  R. Goeddel and E. Olson, “Learning semantic place labels from occupancy grids using CNNs,” in Intelligent Robots and Systems (IROS), IEEE/RSJ International Conference, 2016.
-  P. Ondruska, J. Dequaire, D. Z. Wang, and I. Posner, “End-to-End Tracking and Semantic Segmentation Using Recurrent Neural Network,” in Robotics: Science and Systems, Workshop on Limits and Potentials of Deep Learning in Robotics, 2016.
-  P. Ondruska and I. Posner, “Deep tracking: Seeing beyond seeing using recurrent neural networks,” in The Thirtieth AAAI Conference on Artificial Intelligence (AAAI), Phoenix, Arizona, USA, February 2016.
-  S. Hoermann, M. Bach, and K. Dietmayer, “Dynamic Occupancy Grid Prediction for Urban Autonomous Driving: A Deep Learning Approach with Fully Automatic Labeling,” in arXiv:1705.08781, May 2017.
-  D. Nuss, S. Reuter, and M. Thom, “A Random Finite Set Approach for Dynamic Occupancy Grid Maps with Real-Time Application,” in arXiv:1605.02406, May 2016.
-  D. Nuss, T. Yuan, and G. Krehl, “Fusion of laser and radar sensor data with a sequential monte carlo bayesian occupancy filter,” Intelligent Vehicles Symposium (IV), pp. 1074–1081, 2015.
-  J. Dezert, J. Moras, and B. Pannetier, “Environment perception using grid occupancy estimation with belief functions,” in Information Fusion (Fusion), International Conference, September 2015.
-  F. Smarandache and J. Dezert, “Modified PCR Rules of Combination with Degrees of Intersections,” in Proc. of Fusion, USA, July 2015.
-  C. Seeger, A. Müller, and L. Schwarz, “Towards Road Type Classification with Occupancy Grids,” in Intelligent Vehicles Symposium - Workshop: DeepDriving - Learning Representations for Intelligent Vehicles, IEEE, Gothenburg, Sweden, July 2016.
-  G. Shafer, “A Mathematical Theory of Evidence,” Princeton: Princeton University Press, 1976.
-  F. Smarandache and J. Dezert, “Advances and applications of DSmT for information fusion (Collected works),” American Research Press, USA, vol. 1-4, 2004 - 2015.
-  A. E. Eiben and J. E. Smith, Introduction to Evolutionary Computing, 2nd ed. Springer Publishing Company, Incorporated, 2015.
-  M. Jaderberg, V. Dalibard, S. Osindero, W. M. Czarnecki, J. Donahue, A. Razavi, O. Vinyals, T. Green, I. Dunning, K. Simonyan, C. Fernando, and K. Kavukcuoglu, “Population based training of neural networks,” CoRR, vol. abs/1711.09846, 2017.
-  F. Chollet et al., “Keras,” https://keras.io, 2015.
-  A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Advances in Neural Information Processing Systems 25 (NIPS), 2012.
-  K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, vol. abs/1409.1556, 2014.