The past decades have witnessed rapid developments in Intelligent Transportation Systems (ITS). Connected and Autonomous Vehicles (CAVs) are an integral part of ITS and will redefine mobility, change existing patterns of vehicle usage and pave the way for future transportation services. Vehicles collect the necessary information via a number of onboard sensors. This information is then disseminated to the surrounding environment in a Vehicle-to-Everything (V2X) fashion, to negotiate manoeuvres and build a more agile, safe and efficient traffic network [shladover2018connected].
Without constant human supervision, the safety of CAVs relies heavily on the knowledge acquired from the connected surrounding environment [kockelman2016implications]. However, this dependence on exchanged data brings to the surface several security risks and potential malicious cyberattacks. The connected nature of the vehicles increases the risk of compromised vehicles on the road, and thus the demand for more sophisticated anomaly detection and cybersecurity protection techniques. In this work, we focus on CAVs that maliciously report falsified self-locations to the surrounding vehicles, and present a novel way of detecting and counteracting these abnormalities.
Anomaly detection is the identification of abnormal observations that do not conform to the expected behaviour. Ultimately, the goal is to raise a quick and reliable alert when an anomaly occurs, helping the system to respond accordingly. Particularly for CAVs, it plays a crucial role in system malfunction detection, intelligent operations and cybersecurity protection. Growing efforts have been put into this area in recent years. For example, [alheeti2017using, ali2018intelligent] introduced intrusion detection methods for CAVs using deep learning and discriminant analysis. Additionally, [berlin2016poster] proposed a CAV misbehaviour detection method for service management.
In this paper, we develop a deep autoencoder approach for anomaly detection in CAVs. As an unsupervised method, autoencoders are capable of finding the latent patterns of data. It is generally accepted that most of the research activities in this area are carried out using synthetic datasets, generated by means of simulation frameworks [rajbahadur2018survey]. Similarly, in our work, we train and validate our deep autoencoder model using anomaly-free data, generated using the OMNeT++ network simulator [omnetpp]. We then introduce different abnormalities into our test dataset in order to evaluate the validity of our model.
The rest of this paper is organized as follows. Section II presents our problem description and our deep autoencoder approach to detecting anomalies. Section III gives more insight into the generation of our synthetic dataset, the tools used and the details of our scenario. The results and our analysis are presented next. Finally, the concluding section summarizes our findings.
II-A Problem Statement
We consider a system where CAVs exchange beacons on a periodic basis. We assume that anomalies are present within the information encapsulated in the beacon frames, and more specifically in the self-reported CAV locations. The falsified information is then received by other surrounding CAVs or by the infrastructure network. Fig. 1 gives a brief summary of the anomaly scenario considered in this paper. The fake position reported from the “ghost” transmitter could be due to a sensor malfunction in the actual transmitter, or to possible tampering with the information during transmission. The detection of self-reported location anomalies could further help with root cause analysis and decision making in CAVs.
We aim to design a self-learning process for detecting the self-reported location anomalies. To do so, the Received Signal Strength Indicator (RSSI) is chosen as a proxy for the distance separation between two CAVs. The RSSI represents the beacon’s signal strength and it is dependent on the maximum broadcasting power, the antenna gains, and the attenuation from the channel and the distance. RSSI has been widely used for indoor localization [mcconville2018understanding, byrne2018residential, mcconville2019dataset], human activity recognition [mukherjee2018rssi] and movement tracking [li2018indoor] in wireless networks. For CAVs, when a packet is received, the RSSI along with the transmitter-reported locations and the receiver self-location could form a strong state description for the self-reported location anomaly detection.
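To illustrate why the RSSI can act as a proxy for distance, the following minimal sketch uses a textbook log-distance path-loss model. It is not the channel model used in the simulations; the function names and every parameter value (transmit power, antenna gains, reference loss, path loss exponent) are assumptions chosen only for illustration.

```python
import math

# Illustrative (assumed) link parameters, not taken from the simulation setup.
TX_POWER_DBM = 20.0        # transmit power
ANTENNA_GAINS_DB = 0.0     # combined transmitter and receiver antenna gains
REF_LOSS_DB = 47.9         # assumed loss at the 1 m reference distance
PATH_LOSS_EXPONENT = 2.0   # free-space-like decay


def expected_rssi_dbm(distance_m):
    """RSSI predicted by a log-distance path-loss model at a given distance."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    path_loss_db = REF_LOSS_DB + 10.0 * PATH_LOSS_EXPONENT * math.log10(distance_m)
    return TX_POWER_DBM + ANTENNA_GAINS_DB - path_loss_db


def implied_distance_m(rssi_dbm):
    """Invert the same model: the distance that would explain a measured RSSI."""
    path_loss_db = TX_POWER_DBM + ANTENNA_GAINS_DB - rssi_dbm
    return 10.0 ** ((path_loss_db - REF_LOSS_DB) / (10.0 * PATH_LOSS_EXPONENT))
```

Under such a model, a beacon whose reported position implies a distance far from `implied_distance_m(rssi)` is suspicious. In a realistic channel with fading and obstacles this mapping is noisy, which motivates learning it from data rather than inverting a closed-form expression.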
For our system, we consider the RSSI between the different pairs of CAVs exchanging beacons in a Vehicle-to-Vehicle (V2V) fashion. As shown in Fig. 1, the scenario involves a real transmitter-receiver pair and a “ghost” transmitter. Each of the three has its own location, and each pair of them is separated by its own distance.
II-B The Deep Autoencoder Approach
We design a Deep Autoencoder (DAE) for CAV self-reported location anomaly detection in an unsupervised manner. Autoencoders are a special type of artificial neural network, encoding high-dimensional data into a latent space by replicating the input in the output [kingma2013auto]. The idea behind training a DAE for anomaly detection is to feed anomaly-free data into the network so that it can learn the anomaly-free manifold and the corresponding latent space. Once the model has learned the anomaly-free manifold for a specific task, the error between the DAE input and output is a strong indicator for recognizing anomalous samples.
Here we use a seven-layer, fully-connected autoencoder structure, as shown in Fig. 2. Note that the first hidden layer has more neurons than the input layer in our model. It can be seen as a data interpolation layer, which helps the model to learn the proper latent space. Symmetrically, the output layer transfers the interpolated data back into its original dimension. The remaining hidden layers build the standard DAE structure, with a latent space at the bottleneck. The activation function is discarded at the bottleneck and the output layer. Dropout is not used in our model.
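The structure described above can be sketched as a plain forward pass. This is a minimal illustration only: the layer widths, the five-feature input (receiver x/y, reported transmitter x/y, RSSI), the ReLU activation and the weight initialisation are all assumptions, since they are not specified in the text.

```python
import math
import random

# Hypothetical widths: input, wider first hidden layer, encoder layer,
# bottleneck, decoder layers, output (seven layers in total).
LAYER_SIZES = [5, 16, 8, 2, 8, 16, 5]
# Layers without an activation function: the bottleneck (3) and the output (6).
LINEAR_LAYERS = {3, 6}


def init_params(sizes, seed=0):
    """Create one (weights, biases) pair per connection between layers."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        params.append((weights, [0.0] * n_out))
    return params


def forward(params, x):
    """Return the activations of every layer, the input included."""
    activations = [list(x)]
    for layer_index, (weights, biases) in enumerate(params, start=1):
        prev = activations[-1]
        z = [sum(w * a for w, a in zip(row, prev)) + b
             for row, b in zip(weights, biases)]
        if layer_index not in LINEAR_LAYERS:
            z = [max(0.0, v) for v in z]  # ReLU, an assumed choice
        activations.append(z)
    return activations
```

The last entry of `forward(...)` is the reconstruction and the entry at index 3 is the latent representation. In practice a deep learning framework would replace this hand-written pass.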
As mentioned in Sec. II-A, the location of the receiver and the self-reported transmitter location (the real transmitter position when there is no anomaly during transmission, or the “ghost” position when an anomaly happens), along with the RSSI value, form the feature set for self-reported location anomaly detection in CAVs. During training, anomaly-free samples are fed to the DAE, and the model is trained by minimizing a loss function defined over the DAE input, the DAE output and the model parameters. The goal is to train the DAE to generate an output close to its input. Gradient descent optimization is used for training this model, with a learning rate of 0.00095.
To validate the performance of the trained DAE, the anomaly-free data is split into a training set and a validation set with proportions of 0.8 and 0.2, respectively, following the same data distribution. During validation, samples from the validation set are fed into the trained model, and the validation loss is calculated on its outputs.
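A minimal sketch of the 0.8/0.2 split is given below. It assumes that a seeded random shuffle is an acceptable way to keep both subsets on the same data distribution; the function name and the seed are hypothetical.

```python
import random


def split_train_validation(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and cut it into training and validation sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]
```

Shuffling before cutting avoids, for example, placing all beacons from the end of the simulation into the validation set.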
Since the DAE is an unsupervised method, we introduce the adjusted mutual information (AMI) score [vinh2010mutualinformation] to evaluate the relation between the training loss and the validation loss. It gives an unbiased evaluation of our trained model. The mean value and variance of the two losses are also calculated. Results are shown in Table I. Though the mean values and variances of the two loss distributions are slightly different, they can be considered nearly identical in the context of AMI.
II-B3 Anomaly Detection
After the DAE model is well-tuned and validated, it can be applied for anomaly detection. Potentially anomalous samples are passed through the DAE, and the difference between each input and its reconstructed output can then be used for anomaly detection, while the training and validation losses serve as references.
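One possible way to turn the reconstruction difference into a decision is sketched below. The mean squared error metric, the "mean plus k standard deviations" threshold rule and the value of `k` are assumptions made for illustration; the text only states that the reference losses are used for comparison.

```python
import statistics


def reconstruction_error(x, x_hat):
    """Mean squared error between a sample and its reconstruction (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def fit_threshold(reference_errors, k=3.0):
    """Threshold derived from anomaly-free reference errors (hypothetical rule)."""
    mean = statistics.mean(reference_errors)
    std = statistics.pstdev(reference_errors)
    return mean + k * std


def is_anomalous(x, x_hat, threshold):
    """Flag a sample whose reconstruction error exceeds the reference threshold."""
    return reconstruction_error(x, x_hat) > threshold
```

Here `reference_errors` would hold the per-sample errors measured on the anomaly-free training or validation set.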
III Data Generation
For this particular work, we assume that each vehicle generates one beacon per second. Each beacon is encapsulated in a UDP packet, and each UDP packet is broadcast in the network to the surrounding vehicles. We choose an area in central Bristol, UK as our simulation scenario. The number of vehicles within our system is constant. To simulate our scenario, we used OMNeT++ [omnetpp] and our modified INET framework [parallelInet]. The vehicle mobility traces were generated using the SUMO traffic generator [sumo] and parsed within our framework. Our INET framework was further modified with a logging interface that logs all the packets generated, transmitted and received, in a space-separated file format. These traces are later used by our anomaly detection algorithm.
In particular, at the transmitter (TX) side, each log entry starts with the wireless interface ID (e.g., ScenarioWorking.node.wlan.radio), followed by the node ID (e.g., “1” for the TX example). For this work, we assumed that all the vehicles are equipped with one IEEE 802.11p transceiver. The next entry is the packet ID, e.g., “UDPData-50 1027”, used to reconcile the transmitted with the received packets. UDPData-50 is the data structure called Signal within INET, which represents the physical phenomenon of transmitting a packet. The number following the signal is the sequence number of the event generated in INET. The Start fields give the timestamp at which the UDP packet started being transmitted (in seconds), followed by the position of the vehicle in space (given in meters). When SUMO parses a real-world map, it converts all the geolocation coordinates into a Cartesian plane, with the south-west map corner being the origin of the Euclidean space. Similarly, End gives the timestamp at which the transmission ended, followed by the position of the vehicle at that particular time.
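The reconciliation by packet ID can be sketched as a simple keyed join. The record layout below (dictionaries with `signal`, `seq` and `node` fields) is hypothetical and only stands in for already-parsed log entries; it is not the actual format of the logging interface.

```python
def reconcile(tx_records, rx_records):
    """Pair every received record with the transmitted record sharing its packet ID."""
    # The packet ID is the combination of the signal name and the sequence number.
    tx_by_packet = {(tx["signal"], tx["seq"]): tx for tx in tx_records}
    pairs = []
    for rx in rx_records:
        tx = tx_by_packet.get((rx["signal"], rx["seq"]))
        if tx is not None:
            pairs.append((tx, rx))
    return pairs
```

Because a beacon is broadcast, one transmitted record may be paired with several received records, one per vehicle that logged the packet.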
The logging of each packet at the receiver (RX) side follows a similar structure. Again we find the wireless interface ID followed by the node ID, the packet ID and the starting and ending timestamps of the reception, as well as the positions of the RX vehicle. On top of that, our logging interface saves the RSSI of all the received packets (as calculated within the INET framework). In order to calculate the RSSI for each packet, we take into account the building layout and the position of the vehicles. A scalar radio medium was chosen for our configuration, meaning that the analog signal power is represented with a scalar value over frequency and time. As a path loss model, we chose Rician fading, configured with a path loss exponent and a Rician K-factor. Finally, the obstacle loss model chosen was the DielectricObstacleLoss. This model calculates the dielectric and reflection loss along the straight path, considering the shape, the position, the orientation, and the material of the obstructing physical objects. The rest of our simulation parameters can be found in the table of simulation parameters. The reasoning for choosing this setup was derived from the performance investigation in [agileCalibration].
A UDP packet is considered deliverable under the following conditions. First, the RSSI is compared with the sensitivity threshold for the chosen Modulation and Coding Scheme (MCS). When the RSSI is lower than the sensitivity threshold, the packet is considered non-deliverable. If the RSSI is above the sensitivity threshold, it is then compared with the Signal-to-Noise-plus-Interference Ratio (SNIR) threshold. When below that, the packet is always considered non-decodable due to errors introduced by the channel. The last case is when the RSSI is greater than the SNIR threshold. For that, a Packet Error Rate (PER) value is calculated based on the current SNIR. This value is then compared with a random number drawn from a uniform distribution and, if the random number is greater, the packet is considered delivered. All the successfully received packets are logged in the dataset. These RX entries can then be reconciled with the transmitted ones using the combination of the values found in the packet ID (signal and sequence number), as mentioned above.
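The three-stage decision above can be sketched as follows. This is an illustration under stated assumptions: the SNIR is passed in as a separate value, `per_from_snir` is a hypothetical callable standing in for the error model of the chosen MCS, and the packet is taken as delivered when the uniform draw exceeds the PER.

```python
import random


def is_delivered(rssi_dbm, snir_db, sensitivity_dbm, snir_threshold_db,
                 per_from_snir, rng=random):
    """Decide whether a packet is delivered, following the three cases in the text."""
    # Case 1: signal below the receiver sensitivity, non-deliverable.
    if rssi_dbm < sensitivity_dbm:
        return False
    # Case 2: above sensitivity but below the SNIR threshold, non-decodable.
    if snir_db < snir_threshold_db:
        return False
    # Case 3: draw a uniform random number and compare it with the PER.
    packet_error_rate = per_from_snir(snir_db)
    return rng.random() > packet_error_rate
```

Passing a seeded `random.Random` instance as `rng` makes the decision reproducible, which is convenient when regenerating a dataset.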