As vehicles are equipped with increasingly advanced safety features and, eventually, self-driving capabilities, massive amounts of data are generated by a variety of on-board sensors, such as cameras, radar, and lidar, as well as proximity and temperature sensors [ML_vnet_GYLi]. For instance, an autonomous vehicle is expected to generate about one gigabyte of data per second. Currently, however, these data are not systematically processed, stored, or analyzed for better inference. Recently, machine learning (ML) algorithms have been developed to learn from sensor measurements, owing to several advantages, including low computational complexity when solving optimization-based or combinatorial search problems and the ability to extrapolate new features from the limited set of features contained in a training set.
The current trend in the use of ML in vehicular networks focuses on centralized algorithms, where a powerful learning algorithm, often a neural network (NN), is trained on the massive dataset collected from the edge devices on the vehicles, as illustrated in Figure 1. The NN model provides a non-linear mapping between the input, which consists mostly of vehicle sensor data, and the output, which can be the labels of the sensor data. This mapping is constructed by training the NN on the local sensor data collected from the edge devices. Once model training is completed, the model parameters are sent back to the edge devices for prediction purposes. However, the generated data are huge, and transmitting them reliably from the edge devices to the cloud center may be too costly in terms of bandwidth, introduce unacceptable delays, and infringe on user privacy.
Federated learning (FL) has recently been introduced with the goal of bringing ML down to the edge level, as illustrated in Figure 2. In FL, instead of the local datasets, the edge devices send only the gradients of the learnable parameters derived from these local datasets to the cloud server. The cloud server aggregates these gradients and determines the model parameters, which are then transmitted to the edge devices. This procedure continues iteratively until the learning model is trained. The training procedure is similar to that of ML, except that FL does not involve the transmission of the whole dataset. This reduces both the complexity of handling the ever-growing datasets at the edge devices in the vehicles and the overhead of transmitting these datasets to the cloud servers.
This article aims to provide a comprehensive understanding of how vehicular networks can benefit from FL. In the following sections, we first discuss vehicular network applications in the context of ML. Then, we present the advantages of FL over ML in these applications. We continue by comparing the performance of ML and FL in a case study: 3D vehicular object detection. To the best of our knowledge, this is the first work comparing ML and FL for the object detection problem in vehicular networks based on real data. Finally, we provide an extensive discussion of the major research issues and future research directions in making FL feasible in vehicular networks.
II Machine Learning for Vehicular Networks
ML-based techniques have become increasingly useful in vehicular networks as the amount and diversity of the data generated by the sensors grow. Here, we first discuss how ML training works and then list some major applications of ML in vehicular networks.
II-A Machine Learning
The ML models considered here originate from the imitation of the human brain, whose billions of neurons form a neural network; hence, such a model is commonly called an artificial neural network (ANN). There are mainly two types of ML, namely, supervised and unsupervised learning. In supervised learning, the ANN learns from a labeled dataset, for which an answer key is provided beforehand. In contrast, unsupervised learning clusters unlabeled data by exploiting the hidden features or patterns in the dataset. The main focus in vehicular network applications is on supervised approaches, since these applications mostly involve labeled datasets; e.g., an image input can be represented by the labels of the objects within the image.
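ML models the non-linear relationship between the input data $x_i$ and the output label $y_i$ as $y_i = f(x_i; \theta)$ for $i = 1, \dots, D$, where $D$ is the size of the dataset and $\theta$ denotes the learnable model parameters. To learn the ANN model parameters $\theta$, the ML network is trained with a labeled dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{D}$ by minimizing an empirical cost $\mathcal{L}(\theta)$, which is averaged over each instance of the training dataset, $\mathcal{L}(\theta) = \frac{1}{D} \sum_{i=1}^{D} \ell(f(x_i; \theta), y_i)$, where $\ell$ is the per-instance loss. The minimization of the cost can be achieved iteratively by updating the ANN parameters at iteration $t$ as $\theta_{t+1} = \theta_t - \eta\, g(\theta_t)$ by computing the gradient $g(\theta_t) = \nabla_\theta \mathcal{L}(\theta_t)$ for the learning rate $\eta$.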
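The iterative gradient update described above can be illustrated with a minimal sketch. It assumes a one-parameter linear model and a squared-error cost; the toy data, the learning rate, and the function names are hypothetical choices for illustration and are not taken from the works cited in this article.

```python
# Toy illustration of iterative training: fit y = w * x with a squared-error cost.
# The model, data, learning rate, and iteration count are assumptions of this sketch.

def empirical_cost(w, data):
    """Squared error averaged over every instance of the dataset."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def gradient(w, data):
    """Derivative of the empirical cost with respect to the parameter w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def train(data, learning_rate=0.05, iterations=200, w0=0.0):
    """Repeat the gradient-descent step for a fixed number of iterations."""
    w = w0
    for _ in range(iterations):
        w = w - learning_rate * gradient(w, data)
    return w


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # labels follow y = 2x
w_hat = train(data)
print(w_hat, empirical_cost(w_hat, data))
```

With these labels, the learned parameter approaches 2 and the empirical cost approaches zero. An actual NN would apply the same rule to a vector of parameters.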
II-B ML Applications in Vehicular Networks
In vehicular networks, ML has been mostly used in the applications of autonomous driving [ML_autonomous, RL_autonomous], road safety prediction [ML_safety_object_detection, ML_for_VANET_applications_survey] and vehicular object detection [ML_vehicleDetection, lyft2019].
1) Autonomous Driving: In autonomous driving, the main challenge is to avoid vehicle collisions by adjusting the driving dynamics through the processing of a huge amount of data collected from several vehicles. In [ML_autonomous], a human-like decision-making method is proposed for autonomous driving by using convolutional neural networks (CNNs). The input of the CNN is the lidar data collected from multiple vehicles to provide depth information, whereas the CNN output is the decision regarding the speed and steering of the vehicle. This CNN architecture requires the collection of a large amount of data from the vehicles for accurate prediction. Therefore, the training is usually conducted offline in a cloud data server. The main drawback of offline training is that the trained NN cannot adapt to the environment, which is highly dynamic in autonomous driving.
Reinforcement learning (RL) techniques are proposed with the goal of providing adaptivity to the ML-based architecture [RL_autonomous], based on a reward and penalty mechanism, which is formulated as a function of the varying environment characteristics. This enables RL to perform better than conventional ML models. However, it takes longer to train an RL model. In addition, the design of the RL decision-making scheme is particularly difficult for autonomous driving scenarios, which involve several constraints, such as the safety time and distance constraints for collision avoidance.
2) Road Safety Prediction: ML is used to predict the road condition and the traffic flow in safety applications, based on the data collected from the Global Positioning System (GPS) of vehicles and from traffic cameras. In [ML_safety_object_detection], the authors propose a real-time road safety prediction approach based on ML and data mining. Specifically, the ML model is fed with the vehicle GPS data at the input, and it predicts the road safety index at the output based on external environment factors, such as road geometry, traffic flow, and weather. After offline training, the trained model is deployed for real-time road safety prediction. The main challenge in this application is that the environmental dynamics change continuously, and the ML model may fail to adapt to these changes reliably enough to meet the requirements of intelligent transportation systems (ITS).
3) Vehicular Object Detection: As a subset of autonomous driving applications, object detection concerns the detection and classification of the objects/vehicles in the vicinity of the ego vehicle, i.e., the vehicle controlled by the automated driving system, based on the sensor data collected from multiple vehicles. In [ML_vehicleDetection], the authors propose a support vector machine (SVM) approach by casting an optimization problem that constructs a non-linear relationship between the input data, i.e., the camera images, and the output data, which are the class labels of the vehicles in the image. However, camera-only data are not sufficient to fully represent the features in the vehicle's surroundings for object detection. In order to provide reliable performance, the training data should be augmented with additional sensor data, such as radar and lidar data, to provide depth information for the objects/vehicles.
III How Does Federated Learning Work?
Before introducing FL for vehicular networks, let us discuss the basic idea behind FL. FL builds on the “mini-batch learning” technique, where the dataset is partitioned into sub-blocks and each sub-block is used for a parameter update sequentially. When the training dataset in ML is large, calculating the gradient for the whole dataset becomes computationally prohibitive. Hence, the dataset is partitioned into small blocks, i.e., mini-batches, as $\mathcal{D} = \bigcup_{k=1}^{K} \mathcal{D}_k$, where $\mathcal{D}_k$ denotes the $k$-th sub-block and $K$ is the number of sub-blocks. Next, the gradients are computed for each mini-batch as $g_k(\theta_t) = \nabla_\theta \mathcal{L}_k(\theta_t)$, based on the dataset $\mathcal{D}_k$, where $\mathcal{L}_k$ is the empirical cost over $\mathcal{D}_k$ and $\theta_t$ denotes the model parameters at iteration $t$. Then, the gradients are averaged and, finally, the parameter update rule becomes $\theta_{t+1} = \theta_t - \eta \frac{1}{K} \sum_{k=1}^{K} g_k(\theta_t)$ for the learning rate $\eta$.
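As a small numerical check of the averaging step, the sketch below compares the gradient over a whole toy dataset with the average of the gradients over equally sized sub-blocks. The one-parameter squared-error model, the data values, and the names are assumptions made only for this illustration; the two quantities coincide here because the blocks have equal size.

```python
# Averaging per-block gradients versus computing the gradient on the whole dataset.
# Model (y = w * x, squared error), data, and block count are illustrative assumptions.

def gradient(w, block):
    """Gradient of the mean squared error over one block of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in block) / len(block)


dataset = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1), (6.0, 12.0)]
num_blocks = 3
block_size = len(dataset) // num_blocks
blocks = [dataset[k * block_size:(k + 1) * block_size] for k in range(num_blocks)]

w = 0.5
full_gradient = gradient(w, dataset)
averaged_gradient = sum(gradient(w, block) for block in blocks) / num_blocks
print(full_gradient, averaged_gradient)
```

If the blocks had different sizes, a plain average would no longer match the full gradient exactly, and a size-weighted average would be needed.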
Since the gradients for different data blocks do not have to be computed on the same platform or processor, FL schemes can exploit the local processing capabilities of the edge devices in the vehicles, as illustrated in Figure 2. The edge devices compute the gradients by using only their local datasets and then send the gradient information to the server. Once the gradients are collected from all the devices, the cloud parameter server updates the parameters of the FL model and shares them with the edge devices. This procedure is repeated until convergence.
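The round-based procedure above can be sketched in a few lines. This is a simplified, single-process simulation under stated assumptions: each "vehicle" is just a Python list of samples, the model has one parameter, and the function names, learning rate, and number of rounds are hypothetical. No communication, scheduling, or device failure is modeled.

```python
# Simulated FL rounds: devices compute local gradients, the server averages them.
# All names and values are assumptions of this sketch, not a reference implementation.

def local_gradient(w, local_data):
    """Runs on the edge device; only this value leaves the vehicle, not the data."""
    return sum(2 * (w * x - y) * x for x, y in local_data) / len(local_data)


def federated_training(local_datasets, learning_rate=0.02, rounds=300):
    w = 0.0  # global parameter held by the parameter server
    for _ in range(rounds):
        gradients = [local_gradient(w, data) for data in local_datasets]  # edge side
        averaged = sum(gradients) / len(gradients)  # server-side aggregation
        w = w - learning_rate * averaged  # server update, then w is shared again
    return w


vehicles = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0)],
    [(5.0, 15.0), (6.0, 18.0)],
]
w_global = federated_training(vehicles)
print(w_global)
```

All three local datasets follow y = 3x, so the global parameter approaches 3 although no device ever shares its samples.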
IV Federated Learning for Vehicular Networks
As an emerging field, the research on FL is still in its infancy. Hence, there are very few works on FL in vehicular networks [FL_vanet_Bennis, FL_vnet1_edgeComp]. In [FL_vanet_Bennis], FL is considered in a vehicular network where the communication between the data center and the edge devices is assisted by roadside units to ensure low latency for gradient data transmission. Specifically, a Lyapunov optimization based approach is proposed to minimize the delays incurred by the transmission of gradient data in FL. However, model training and a comparison of FL with ML are not considered in [FL_vanet_Bennis]. In [FL_vnet1_edgeComp], the authors propose a selective model aggregation approach, where the data center collects the gradient data from only the “fine” edge clients in a vehicular network, such as the devices with high data quality and power capability. Then, the performance of FL is compared to that of ML on the MNIST and BelgiumTSC image classification datasets, which are composed of images of digits and traffic signs, respectively. However, these datasets do not fully represent a realistic scenario for the detection and classification of objects/vehicles. Therefore, evaluating the performance of FL on a dataset tailored to a realistic vehicular scenario is of great importance. In [lyft2019], an object/vehicle detection and classification dataset is introduced for autonomous vehicle applications, utilizing the data from the cameras and the lidars mounted on the vehicle. In the following section, we compare the performance of FL with that of ML for a realistic object/vehicle detection problem on this dataset.
V A Case Study: FL for 3D Object Detection in Vehicular Networks
We consider the 3D object detection problem in vehicular networks, based on the Lyft Level 5 AV dataset [lyft2019], collected from lidar and cameras mounted on vehicles. The dataset includes high-resolution images obtained from six cameras, providing field of view, and data collected from the lidar units placed on the front corners of the vehicles to provide azimuth resolution. Figure 3 shows a portion of the application map, which includes over 4000 lane segments, 197 pedestrian crosswalks, 60 stop signs and 54 parking zones. First, image and lidar data are collected from the vehicles. Then, the input and output of the NN are generated by preprocessing the collected data. In order to cover the vehicles within the range of the lidar, the input data is selected as a top-view image of the ego vehicle, which includes the received lidar signal strengths for different elevations, as shown in Figure 3. The output data is the classified representation of the vehicles/objects as boxes, which is obtained by preprocessing the images from the cameras, as illustrated in Figure 3. The training dataset is collected from vehicles in different areas after preprocessing of camera and lidar data. Each dataset includes input-output pairs. Hence, the total number of data symbols is , where the sizes of input and output data are and , respectively. The dataset has classes, i.e., , , , , , , , , , which are represented by the boxes as shown in Figure 3. We have used U-net [unet] to learn the features in the input data and achieve 3D object detection, and the total number of parameters in U-net is approximately .
In the ML scenario, the datasets of all vehicles are pooled and used to train the NN. To implement the FL scenario, the local datasets of the vehicles are first used to compute the gradient information. Then, the gradients are averaged for the model parameter update.
Figure 4 compares ML and FL in terms of accuracy and transmission complexity. FL converges more slowly due to the diversity of the datasets of different vehicles. Nevertheless, both ML and FL provide satisfactory training performance after approximately iterations. On the other hand, the complexity of FL increases during training due to the repeated transmission of model parameters, whereas the complexity of ML is fixed during training since the datasets have already been collected. The complexity of ML is due to the transmission of the whole data symbols, i.e., approximately . In contrast, the complexity of FL is due to the two-way (edge to server and server to edge) transmission of the gradient data during training until convergence, i.e., for iterations. As a result, FL has approximately times lower transmission overhead than ML.
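The structure of this overhead comparison can be written out as a short sketch. The accounting below is an assumption of this illustration: ML is charged once for every input and output symbol of every local dataset, and FL is charged for a two-way exchange of one value per model parameter, per device, per iteration. The numbers are hypothetical placeholders and are not the values of the case study.

```python
# Transmission-overhead accounting for centralized ML versus FL.
# Formulas follow the description in the text; all numbers are hypothetical.

def ml_overhead(num_devices, samples_per_device, input_size, output_size):
    """Symbols sent once when every raw local dataset is uploaded to the server."""
    return num_devices * samples_per_device * (input_size + output_size)


def fl_overhead(num_devices, num_parameters, iterations):
    """Symbols sent when gradients go up and parameters come down in each iteration."""
    return 2 * num_devices * num_parameters * iterations


# Placeholder values chosen only to exercise the formulas.
ml_symbols = ml_overhead(num_devices=10, samples_per_device=1000,
                         input_size=256 * 256, output_size=256 * 256)
fl_symbols = fl_overhead(num_devices=10, num_parameters=100_000, iterations=50)
print(ml_symbols, fl_symbols, ml_symbols / fl_symbols)
```

The sketch also makes the break-even point visible: the FL overhead grows linearly with the number of iterations, so FL is cheaper only while the model size times the number of rounds stays below the size of the raw data.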
VI Research Challenges and Future Work
In this section, we provide an extensive discussion of the research challenges and future directions for FL in vehicular networks.
VI-A System Heterogeneity
System heterogeneity occurs when the datasets of a diverse set of edge devices are used in model training, as shown in Figure 5. Since this diversity includes different types of devices, untrusted devices can join the network more easily, which raises security and privacy issues. Reliability and trust among the devices in the network can be achieved through reputation management (reward and punishment) based approaches. In [FL_vnet1_edgeComp], the authors propose a method in which each edge device receives a reward in exchange for its computation power and data quality. However, [FL_vnet1_edgeComp] considers a simple FL framework with a single server. In a realistic vehicular network scenario, there can be multiple access points acting as servers in FL, which increases the dimension of the reputation management problem. As a result, the use of such approaches in vehicular networks requires further research on multi-server FL architectures.
VI-B Data Heterogeneity
Data heterogeneity occurs due to the non-uniform distribution of the datasets at the edge devices, as shown in Figure 6. For example, in an autonomous driving scenario, the image data obtained from vehicles in different locations have different distributions. Data heterogeneity causes large variance in the averaged gradient data and, therefore, slows the convergence of the learning models. One possible solution is to increase the model size, i.e., to enlarge the width and the depth of the NN model, as demonstrated for the beamformer design problem in [elbir2020FL]. However, the use of FL in vehicular networks involves more heterogeneous data than the previous studies; hence, the design of larger and deeper NN models that are robust to data heterogeneity needs to be studied further.
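The effect of non-uniform local datasets on the gradients can be illustrated with a toy sketch. It assumes a one-parameter squared-error model and two devices; in the "mixed" split each device sees samples from the whole input range, whereas in the "skewed" split each device sees only one part of the range. The data and names are hypothetical and serve only to show how the spread of the local gradients grows with heterogeneity.

```python
# Spread of local gradients under a mixed versus a skewed data partition.
# Model, data, and partitions are assumptions made for this illustration.

def local_gradient(w, data):
    """Gradient of the mean squared error of y = w * x over one local dataset."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


samples = [(float(x), 2.0 * x) for x in range(1, 9)]  # y = 2x for x = 1..8
mixed = [samples[0::2], samples[1::2]]   # each device covers the whole range
skewed = [samples[:4], samples[4:]]      # each device covers one half of the range

w = 0.0
mixed_gradients = [local_gradient(w, d) for d in mixed]
skewed_gradients = [local_gradient(w, d) for d in skewed]
print(variance(mixed_gradients), variance(skewed_gradients))
```

In this toy setting both partitions yield the same averaged gradient, but the local gradients of the skewed partition are spread much further apart, which is the kind of variance that hampers convergence when devices report at different times or in different numbers.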
VI-C Efficient Model Training
The efficiency of model training can be improved by the use of transfer learning (TL) based approaches. TL is an ML method in which a model developed for a certain task is reused as the starting point for a model on a different task. In [elbir2020TL], TL has been proposed for cognitive radar applications, where an ML model is reused for different sensor selection tasks. The application of TL to vehicular networks is advantageous. For instance, instead of training a model from scratch, a model that is well trained on a large dataset can be used with a soft parameter update for smaller datasets. In this case, the parameter update involves lower complexity since only a small portion of the NN is trained, which leads to more efficient model training. However, the application of TL in vehicular networks raises a challenge regarding data similarity. Specifically, the TL accuracy strongly relies on the similarity between the data newly collected at the edge devices in the vehicular network and the data used to train the pre-trained model. In order to obtain higher accuracy from the NN with new data, a larger portion of the model should be updated. Thus, there is a trade-off between the similarity/diversity of the datasets and the training complexity. While the data similarity/diversity issue has been studied for cognitive radar applications [elbir2020TL], its effect on vehicular applications has not been explored. In addition, the diversity of the datasets can cause difficulties when performing TL due to the non-uniform distribution of the dataset, which has been studied in [FL_Gunduz] for a shallow ML model without a focus on TL. As a result, new approaches need to be developed to make FL model training more feasible in vehicular network applications.
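The idea of updating only a small portion of a pre-trained model can be sketched as follows. The sketch assumes a deliberately tiny model with two parameters, a frozen "feature" weight and a trainable "head" weight, which stands in for freezing the early layers of an NN and fine-tuning the last one. The parameter values, the new dataset, and the function name are hypothetical.

```python
# Transfer-learning sketch: keep the pre-trained feature weight fixed, update the head.
# Model: y = v * (u * x). Only v is trained on the new, smaller dataset.
# All values are illustrative assumptions.

def fine_tune_head(u_frozen, v_init, new_data, learning_rate=0.01, iterations=500):
    v = v_init
    for _ in range(iterations):
        # Gradient of the mean squared error with respect to v only.
        grad_v = sum(2 * (v * u_frozen * x - y) * u_frozen * x
                     for x, y in new_data) / len(new_data)
        v = v - learning_rate * grad_v
    return v


u_pretrained, v_pretrained = 2.0, 1.5   # pre-trained model realizes y = 3x
new_data = [(1.0, 4.0), (2.0, 8.0)]     # new task follows y = 4x
v_new = fine_tune_head(u_pretrained, v_pretrained, new_data)
print(u_pretrained * v_new)
```

Here the new task is close enough to the old one that adjusting the head alone recovers y = 4x. When the new data differ more strongly from the original training data, a frozen feature stage may no longer suffice, which reflects the similarity versus complexity trade-off discussed above.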
VI-D Reducing Transmission Complexity
The transmission complexity is mainly due to the sharing of the gradient information between the global server and the edge devices in the vehicular network. There are two ways of reducing the transmission complexity in an FL-based framework: compressed sensing and model compression. In compressed sensing, the sparsity of the gradients, i.e., most of the gradient entries being zero, is exploited to reduce the amount of transmitted data [FL_Gunduz]. While this approach reduces the transmission overhead, it increases the encoding/decoding complexity at the parameter server and the edge devices in a vehicular network for reliable performance. Hence, solutions with lower hardware complexity need further investigation for reliable model transmission.
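To make the role of gradient sparsity concrete, the sketch below uses top-k sparsification, a simpler technique than the compressed sensing scheme referred to above, and not the method of [FL_Gunduz]. It only illustrates how a mostly-zero gradient can be sent as a short list of index-value pairs. The function names and the example gradient are hypothetical.

```python
# Top-k gradient sparsification: send only the k largest-magnitude entries.
# This is a stand-in illustration, not a compressed sensing encoder.

def sparsify_top_k(gradient, k):
    """Return the k largest-magnitude entries as (index, value) pairs."""
    ranked = sorted(range(len(gradient)), key=lambda i: abs(gradient[i]), reverse=True)
    kept = sorted(ranked[:k])
    return [(i, gradient[i]) for i in kept]


def densify(pairs, length):
    """Rebuild a full-length gradient at the receiver, filling the rest with zeros."""
    dense = [0.0] * length
    for index, value in pairs:
        dense[index] = value
    return dense


gradient = [0.0, 0.8, 0.0, -0.01, 0.0, -1.2, 0.02, 0.0]
message = sparsify_top_k(gradient, k=2)
recovered = densify(message, len(gradient))
print(message)    # [(1, 0.8), (5, -1.2)]
print(recovered)  # [0.0, 0.8, 0.0, 0.0, 0.0, -1.2, 0.0, 0.0]
```

Two pairs are transmitted instead of eight values, at the price of dropping the small entries and of the extra sorting and reconstruction work at both ends.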
Model compression is another approach to reduce the number of NN parameters. In [elbirQuantizedCNN2019], quantized NN parameters are used for this purpose in a sensor selection problem. In order to apply model compression techniques, such as quantized/tensorized NN models, in vehicular applications, an application-specific design of the hyperparameter optimization stage and the NN model is required to obtain the optimum model architecture.
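A minimal sketch of parameter quantization is given below. It assumes plain uniform quantization over the range of the parameter values, which is a generic choice for illustration and not necessarily the scheme of [elbirQuantizedCNN2019]. The function names, the bit width, and the example parameters are hypothetical.

```python
# Uniform quantization of model parameters to a small number of bits.
# A generic illustration; the cited work may use a different quantizer.

def quantize(params, num_bits):
    """Map each parameter to an integer code in [0, 2**num_bits - 1]."""
    low, high = min(params), max(params)
    levels = 2 ** num_bits - 1
    step = (high - low) / levels if high > low else 1.0
    codes = [round((p - low) / step) for p in params]
    return codes, low, step


def dequantize(codes, low, step):
    """Recover approximate parameter values from the integer codes."""
    return [low + code * step for code in codes]


params = [-1.0, -0.4, 0.0, 0.3, 1.0]
codes, low, step = quantize(params, num_bits=4)
restored = dequantize(codes, low, step)
max_error = max(abs(p - r) for p, r in zip(params, restored))
print(codes, max_error)
```

Each parameter is then sent as a 4-bit code plus two shared scalars, and the reconstruction error stays within half a quantization step. Fewer bits lower the transmission cost further but enlarge this error.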
VI-E Online Learning
Once the FL model is trained, it can be used for prediction purposes. However, the input data at the edge devices change over time due to the dynamics of the vehicles in the network. Therefore, the FL model needs to be adapted to these changes. This requires training the FL model in an online manner to keep it up to date with the new incoming data [elbir2019online]. The main challenge in online learning is the absence of labels for the input data. Unlike offline training, where the labeled data are prepared before the model training, the edge devices need to label the incoming data in the online scenario. As a result, this approach introduces delay in the data transmission. One possible solution for reducing this delay is to label a data instance only when there is a significant change in the data, as demonstrated for wireless channel estimation in [elbir2019online]. However, further research is needed to design online training algorithms for vehicular networks, taking into account the dynamics of both the wireless channel and the data, specific to vehicular movement and communication.
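The idea of labeling only on significant change can be sketched as a simple threshold rule. The sketch assumes that each incoming sample is a feature vector, that change is measured by the Euclidean distance to the most recently labeled sample, and that a fixed threshold is used. These choices, the names, and the data are hypothetical and are not the criterion of [elbir2019online].

```python
# Change-triggered labeling: request a label only when the input has moved far
# enough from the last labeled sample. Distance metric and threshold are assumptions.

def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def select_for_labeling(stream, threshold):
    """Return the indices of the samples that would be sent for labeling."""
    labeled = []
    reference = None
    for index, sample in enumerate(stream):
        if reference is None or distance(sample, reference) > threshold:
            labeled.append(index)
            reference = sample
    return labeled


stream = [[0.0, 0.0], [0.1, 0.0], [0.1, 0.1], [1.0, 1.0], [1.1, 1.0], [3.0, 3.0]]
indices = select_for_labeling(stream, threshold=0.5)
print(indices)  # [0, 3, 5]
```

Only three of the six samples are labeled in this example. A larger threshold saves more labeling effort but lets the model lag further behind the changes in the data.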
In this article, we present an FL-based framework as an efficient learning scheme for vehicular networks and edge intelligence, in comparison with classical ML techniques. We list several vehicular network applications that use ML and FL. We illustrate the performance gain of FL over ML for a vehicular object detection application based on real image and lidar data collected from vehicles. Finally, we identify the major challenges in the use of FL, together with candidate solutions, as future directions of research.
The work of Sinem Coleri was supported by the Scientific and Technological Research Council of Turkey with European CHIST-ERA grant 119E350.