Detecting and Identifying Optical Signal Attacks on Autonomous Driving Systems

10/20/2021 ∙ by Jindi Zhang, et al. ∙ City University of Hong Kong ∙ University of Victoria ∙ Tsinghua University ∙ University of Puerto Rico

For autonomous driving, an essential task is to detect surrounding objects accurately. To this end, most existing systems use optical devices, including cameras and light detection and ranging (LiDAR) sensors, to collect environment data in real time. In recent years, many researchers have developed advanced machine learning models to detect surrounding objects. Nevertheless, the aforementioned optical devices are vulnerable to optical signal attacks, which could compromise the accuracy of object detection. To address this critical issue, we propose a framework to detect and identify sensors that are under attack. Specifically, we first develop a new technique to detect attacks on a system that consists of three sensors. Our main idea is to: 1) use data from three sensors to obtain two versions of depth maps (i.e., disparity) and 2) detect attacks by analyzing the distribution of disparity errors. In our study, we use real data sets and the state-of-the-art machine learning model to evaluate our attack detection scheme and the results confirm the effectiveness of our detection method. Based on the detection scheme, we further develop an identification model that is capable of identifying up to n-2 attacked sensors in a system with one LiDAR and n cameras. We prove the correctness of our identification scheme and conduct experiments to show the accuracy of our identification method. Finally, we investigate the overall sensitivity of our framework.


I Introduction

In the past few years, autonomous driving has attracted significant attention from both academia and industry. Recent advances in artificial intelligence and machine learning enable accurate object and event detection and response (OEDR) [21]. These technological advances, together with great commercial potential and incentives, have quickly pushed the adoption of autonomous driving. For instance, Waymo launched a driverless taxi service in Arizona in 2018 [7]. Tesla announced that the full self-driving feature of its products would be available by the end of 2020 [11].

To facilitate accurate OEDR tasks, autonomous vehicles (AVs) are usually equipped with a number of sensors, including GPS, inertial measurement unit, radar, sonar, camera, light detection and ranging (LiDAR), etc. [1]. Among these sensors, optical devices (LiDAR and camera) have become increasingly important because they provide long-range object detection and because many machine learning models proposed in recent years can accurately estimate object depth and detect objects from their output. Given their importance, in this paper we focus on the security of these optical devices, particularly on mitigating potential attacks against them.

Despite its importance, the security of optical devices in autonomous driving systems has been investigated in only a few previous studies. In [25, 18], researchers summarized several categories of vulnerabilities in autonomous vehicles. In [16, 26, 23], researchers demonstrated through experiments that LiDAR can be attacked by sending spoofed and/or delayed optical pulses. They also demonstrated that a camera can be blinded if it receives an intense light beam.

Although these pioneering studies are important, a comprehensive mechanism to detect and identify such attacks is still missing. In this paper, we propose a novel framework to tackle this issue by (1) detecting optical attacks using data from multiple sensors and (2) identifying the sensors that are under attack. To achieve accuracy in both detection and identification, there are two major challenges:

  • The optical signals can be processed by many advanced machine learning models, each of which can generate various features. Moreover, an optical signal attack has different consequences on camera images and on LiDAR point clouds. Therefore, an appropriate type of feature needs to be chosen as the common ground on which attacks on both modalities can be detected.

  • The size and position of the damaged area caused by an optical signal attack on images and point clouds are unpredictable; the damaged area can appear anywhere in the sensor view. The detection method must therefore perform fine-grained detection across the whole sensor view so that it is invariant to the size and position of the damaged area and can distinguish feature differences between non-attack and attack scenarios.

To address the first challenge, the proposed framework includes an optical attack detection method that extracts depth information (i.e., disparity) from two sets of sensor data and then uses this depth information as the common ground to detect attacks on both images and point clouds. To address the second challenge, our method detects attacks by analyzing the distribution of disparity errors, which measure pixel-level disparity inconsistencies over the whole sensor view. Thus, the detection method is robust to the size and position of the damaged area.

The main contributions of this study can be summarized as follows:

  • We develop a new technique to detect optical attacks on a system that consists of three sensors, including two possible cases (1) one LiDAR and two cameras, or (2) three cameras. Specifically, we first use data from three sensors to obtain two versions of depth maps (i.e., disparity) and then detect attacks by analyzing the distribution of disparity errors. In our study, we use real datasets of KITTI [10, 9] and the state-of-the-art machine learning model PSMNet [4] to evaluate our attack detection scheme and the results confirm the effectiveness of our detection method.

  • Based on the detection scheme, we further develop an identification model that is capable of identifying up to n-2 attacked sensors in a system with one LiDAR and n cameras. In our study, we prove the correctness of our identification scheme and conduct experiments to show the accuracy of our identification method.

  • Finally, we investigate the sensitivity of our framework to optical attacks under more diverse settings and demonstrate its effectiveness through experiments.

The rest of this paper is organized as follows. In Section II, we first introduce the studies related to our work. In Section III, we discuss the system models, including the optical sensors and attack models, and our attack mitigation framework. In Section IV, we elaborate on the attack detection schemes. In Section V, we further investigate the attack identification issue. In Section VI, we examine the overall sensitivity of our framework. Finally, we conclude the paper in Section VII.

II Related Work

Methods for attacking optical sensors (LiDAR and camera) have gradually become more advanced. In surveys [25, 18], researchers described how the perception sensors of AVs can be compromised via physical channels at close distance. In [16], the authors showed several effective and realistic methods to compromise a 2D LiDAR and a camera. In particular, in their experiments they managed to relay and spoof LiDAR signals, as well as blind the camera using strong light beams. Adversarial attacks against cameras using intense light were also studied in [26] and even caused irrecoverable damage to the camera. Later, Shin et al. demonstrated attacks against the Velodyne VLP-16, one of the best-selling 3D LiDARs on the market, by producing fake signals [23]. Building on this work, the authors of [3] went further and designed an optimization-based strategy to produce more bogus points, compromising a 3D LiDAR with a much higher success rate, and they constructed new attack scenarios to study the impact on the decision making of AVs. Despite their importance, existing studies of optical attacks offer only rough countermeasure ideas, such as sensor redundancy and randomization of LiDAR pulses.

In the literature, there are only a few studies that systematically defend the optical sensors of AVs, and these studies focus on other types of sensor attack. For example, the authors of [20] argued that attacks against a camera could be sophisticated enough to erase objects from images or modify their positions. Using an additional LiDAR as a reference, they proposed to extract object features from images and the LiDAR point cloud, and then detect attacks via mismatches between the two sets of features. In [5], Changalvala et al. investigated an internal attack that can tamper with a point cloud from inside an AV system, and they tackled the detection problem by adding a watermark to the LiDAR point cloud. Different from [20] and [5], our work targets defending against optical attacks on the LiDAR and cameras of AVs. We follow the idea of sensor redundancy and design a countermeasure framework that not only detects optical attacks by analyzing the inconsistency of depth information (i.e., disparity) from different sources, but also identifies the compromised sensors of an AV system.

As for estimating depth from images, there are two main categories of algorithms: monocular-vision based and stereo-vision based. Current methods in both categories adopt deep neural networks, but the monocular ones treat the task as a dense regression problem and focus on minimizing prediction error, while the stereo-vision based algorithms formulate it as a problem of matching pixels across two images [2]. As a result, DORN [8], one of the best monocular methods, can only predict depth with a relatively large error. In contrast, as a representative algorithm of the latter category, the state-of-the-art PSMNet [4] achieves the task with a much smaller error. In this study, we choose PSMNet over other methods because of its better accuracy.

III System Models

In this section, we first explain the main optical sensors in an autonomous driving system and their normal operations. We then elaborate on the attack models on LiDAR and camera, with some numerical results that illustrate the impacts of optical attacks. Finally, we briefly explain the main idea of the proposed framework for attack detection and identification.

III-A Optical Sensors

In this paper, we consider a general autonomous driving system, and we focus on the optical devices, in particular, LiDAR and camera.

III-A1 LiDAR

A LiDAR sensor can send and receive specific optical pulses in certain directions. By comparing the incoming reflected signals with the transmitted ones, LiDAR can provide an accurate estimation of the distance between the LiDAR and an object in a specific direction. The output of LiDAR consists of a set of points in 3D space, which is known as a point cloud. By clustering these points, the object detection models applied in AV systems can locate obstacles in the real world.

III-A2 Camera

Cameras are very common in existing autonomous driving systems. AVs are usually equipped with more than two of them. The produced images are useful to several perception functions, such as obstacle detection and road/lane detection.

Specifically, similar to human eyes, two cameras can be used to form a stereo vision system that can estimate the depth of an object. As a simple example, if a real-world point is captured at pixel $x_l$ in the left image and at pixel $x_r$ in the right image, then the disparity is defined as $d = x_l - x_r$. We can calculate the depth $z$ using

$$z = \frac{f \cdot b}{d}, \qquad (1)$$

where $f$ is the focal length and $b$ is the baseline, i.e., the distance between the two cameras.

In general, we can obtain the depth of a real-world point from its disparity once the pair of corresponding pixels ($x_l$ and $x_r$) is located in the two images. Therefore, the main goal of depth estimation algorithms, such as PSMNet [4], is to identify pairs of pixels in the two images that correspond to the same real-world points. Finally, a disparity map is generated by computing the disparity for every pixel in an image.
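To make Eqn. (1) concrete, the following Python sketch converts disparities into depths; the focal length and baseline in the example call are placeholder values in the spirit of KITTI's stereo calibration, not parameters taken from this paper.

```python
import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline_m):
    """Convert disparity (in pixels) to depth (in meters) via z = f * b / d.

    Zero or negative disparities are mapped to infinity (no valid match).
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full_like(disparity, np.inf)
    valid = disparity > 0
    depth[valid] = focal_length_px * baseline_m / disparity[valid]
    return depth

# Placeholder calibration values: a 64-pixel disparity with f = 721 px and
# b = 0.54 m corresponds to roughly 6.1 m of depth.
print(disparity_to_depth([64.0], 721.0, 0.54))
```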

Fig. 1: An example of a compromised point cloud that contains bogus signals in a region.
Fig. 2: An example of a compromised camera image that contains a round Gaussian facula.

III-B Attacks on Optical Sensors

III-B1 Attacks against LiDAR

In [16] and [23], the authors demonstrated several methods to attack LiDARs. The main idea in these attacks is to generate or relay legitimate optical pulses so as to mislead the perception module of the victim system.

Although the existing LiDAR spoofing attacks can only generate a limited number of fake points, we believe that a more powerful attacker could generate a larger number of spoofed points in the point cloud with several more advanced attack sources. Therefore, in this paper, we produce the compromised point clouds by generating spoofed signals for a region, so that the perception module of AVs may detect a fake object, as shown in Fig. 1.

III-B2 Attacks against Camera

Attacks against cameras have been studied in [16] and [26]. The main idea in these studies is to generate strong light signals so as to blind the camera. According to [16], to blind a camera, the power of the light source must increase exponentially with the distance between the light source and the camera. The effectiveness of the attacks is also affected by the ambient light conditions. Therefore, when an LED is used, an effective attack requires the light source to be within a few meters of the camera and the attack to be conducted in a dark environment, which is less practical. By comparison, attacks using lasers are more realistic.

In our study, we consider attackers who do not need to completely blind the camera. Instead, their main objective is to mislead the perception module in the autonomous driving system. To this end, we assume that the attacking light source is a laser and that the distance between the attacking source and the cameras can be sufficiently large. As a result, an attack from a laser produces a contaminated area of a certain size at a random position in the image. Therefore, we generate the affected camera data by overlaying a Gaussian facula on the images, as shown in Fig. 2. The affected images we generate are equivalent to those reported in [16] and [26].

III-C Impact of the Optical Attacks

Fig. 3: In AV systems, the perception module processes the raw sensor data and generates the environmental high-level information for driving decision making. Then, the driving commands are sent to control units.

To understand the impacts of the aforementioned optical attacks, we conduct extensive experiments testing the object detection algorithms for AVs with the compromised sensor data. Next, we briefly introduce the common experimental setup in this paper, which is also used in the experiments of other sections.

III-C1 Common Experimental Setup

In this paper, we use two datasets. The first one is the KITTI raw dataset [9], which includes data of one LiDAR and four cameras in different environmental conditions for autonomous driving, such as City, Residential, Road, Campus, etc. We customize it by selecting 1000 sets of sensor data.

The second dataset we use is the one provided in the KITTI object detection benchmark [10], which contains sensor data of one LiDAR and two cameras. We divide the labeled part of the dataset into a training set and a test set according to [6]. The two sets include 3712 sets of sensor data and 3769 sets of sensor data, respectively.

To produce the compromised LiDAR data, we generate a bogus signal of fixed height whose width equals that of typical highway lanes, placed at a random distance from the LiDAR sensor in the point clouds.
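The exact spoofing pipeline is hardware-specific and is not detailed here; the sketch below only illustrates the kind of point-cloud manipulation described above, appending a wall-like cluster of fake points at a chosen distance. The planar shape, the point density, and the coordinate convention are illustrative assumptions.

```python
import numpy as np

def inject_bogus_wall(point_cloud, distance_m, width_m=1.0, height_m=1.0,
                      points_per_m2=400, rng=None):
    """Append a dense, wall-like cluster of fake points in front of the LiDAR.

    point_cloud: (N, 3) array with x (forward), y (left), z (up) coordinates.
    The density and the flat, wall-like shape are illustrative assumptions,
    not the exact spoofing model used in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_fake = int(width_m * height_m * points_per_m2)
    fake = np.empty((n_fake, 3))
    fake[:, 0] = distance_m + rng.normal(0.0, 0.02, n_fake)       # nearly flat in depth
    fake[:, 1] = rng.uniform(-width_m / 2, width_m / 2, n_fake)   # lateral extent
    fake[:, 2] = rng.uniform(0.0, height_m, n_fake)               # vertical extent
    return np.vstack([point_cloud, fake])
```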

To generate the compromised camera images, we overlay a round Gaussian facula with a random radius, drawn from a fixed pixel range, on the images.
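As a rough illustration of the facula overlay, the following sketch adds a round Gaussian brightness bump to an image; the additive blending model and the mapping from radius to the Gaussian width are assumptions made for illustration, not necessarily the paper's exact procedure.

```python
import numpy as np

def overlay_gaussian_facula(image, center_xy, radius_px, peak=255.0):
    """Overlay a round Gaussian facula (bright spot) on an HxWx3 uint8 image.

    The facula is modeled as an additive, isotropic Gaussian whose standard
    deviation is tied to the given radius; this blending model is an
    illustrative assumption.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = center_xy
    sigma = radius_px / 2.0
    bump = peak * np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    out = image.astype(np.float64) + bump[..., None]
    return np.clip(out, 0, 255).astype(np.uint8)
```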

III-C2 Attack Experiments and Results

Here, we first conduct experiments on the customized KITTI raw dataset [9]. To evaluate the impact of the optical attacks on LiDAR, we use a pre-trained model based on PIXOR [27], which is a 3D object detection method using LiDAR data. In our experiments, we generate a compromised point cloud for each one of the 1000 sets of sensor data and feed it to the PIXOR model. We observe that the model falsely considers the bogus signals as obstacles in 986 cases out of 1000.

To measure the impact of the attacks on the camera, we use the standard performance metric, average precision (AP), where a prediction is considered accurate if and only if its Intersection over Union (IoU) with the ground truth exceeds a given threshold. In addition, we use a pre-trained model provided in the TensorFlow Object Detection API [13] that can detect vehicles in images. Specifically, the model is based on Faster R-CNN [19] with a ResNet-101 backbone [12]. In our experiments, we produce a compromised image for each set of sensor data in the dataset and feed it to the Faster R-CNN model. The numerical results show that the AP for detecting cars is high when there are no attacks but drops dramatically when there are optical attacks against the camera.

To briefly summarize, we observe that the aforementioned attacks on optical devices can significantly compromise the accuracy of object detection, which is a fundamental task of the perception module in autonomous driving. As shown in Fig. 3, the results of environment perception are passed to the driving decision module, which directly sends commands to the vehicle control units, such as the engine and brakes. Therefore, we believe that optical attacks are extremely hazardous, because inaccurate perception caused by such attacks can easily lead to wrong driving decisions and catastrophic outcomes.

III-D A Mitigation Framework Against Optical Attacks

To defend against such attacks, in this paper, we propose a framework to mitigate optical attacks. The main idea of our framework is to detect optical attacks and then identify the affected sensors. In this manner, the autonomous driving system can choose to use signals from sensors that are not under attack to perform accurate perception.

Specifically, our framework consists of two main procedures. The first procedure is for attack detection. To this end, we consider a system that consists of three sensors in two scenarios: (1) one LiDAR and two cameras, and (2) three cameras. In both cases, we use data from the three sensors to obtain two versions of disparity maps and then detect attacks by analyzing the distribution of disparity errors. Based on the first procedure, we design the second procedure to identify up to n-2 affected sensors in a system that consists of one LiDAR and n cameras. In Section IV and Section V, we introduce the two procedures in more detail.

Fig. 4: The detection method is designed for two three-sensor systems. For Scenario 1, the detection method structure involves Block A and Block C. For Scenario 2, the detection method structure involves Block B and Block C.

IV Attack Detection

In this section, we first explain why we target a system that consists of three sensors and then make a hypothesis about the feasibility of the detection task in a three-sensor system. Next, we focus on the disparity error and how to detect attacks by analyzing the disparity error distribution. Specifically, we elaborate on the calculation of disparity error for two main scenarios of the three-sensor system and conduct extensive experiments on the real dataset to prove the hypothesis and show the accuracy of our method.

IV-A Three-Sensor Systems and a Hypothesis

For attack detection, we aim to detect attacks with sufficient accuracy using the smallest number of sensors. Due to the trade-off between cost and object detection performance, there is usually one primary LiDAR mounted on the roof of an autonomous vehicle, which is also equipped with multiple cameras [1]. To obtain two versions of a depth map from such an AV system, we need at least three sensors: one LiDAR and two cameras, or three cameras.

For the first case, we notice that LiDAR can produce accurate depth maps in point clouds. On the other hand, stereo-vision based depth estimation algorithms can also generate depth maps out of stereo images. Intuitively, if we compare a depth map produced by LiDAR and another generated by two stereo images, we may be able to detect the distortion of depth information caused by optical attacks on such a three-sensor system. Consequently, the first three-sensor system that we consider consists of one LiDAR and two cameras.

For the second case, it is obvious that we can use the image taken by one camera as the reference, and then use images taken by two other cameras to produce two depth maps using a depth estimation model. By comparing the two depth maps, we may be able to detect attacks on the second three-sensor system that consists of three cameras.

To briefly summarize, in this paper, we consider two three-sensor systems that are practical in autonomous driving systems. Furthermore, we make a hypothesis that, with appropriate design, we can accurately detect the optical attacks on both of the three-sensor systems. In the following discussions, we verify this hypothesis by elaborating on the mechanisms to detect attacks on the two three-sensor systems, respectively.

IV-B Scenario 1: One LiDAR and Two Cameras

For this scenario, we denote the LiDAR as sensor $S_0$, and denote the two cameras, from right to left, as $S_1$ and $S_2$, respectively. Accordingly, the data generated by the sensors are denoted as $D_0$, $D_1$, and $D_2$. The detection system we design for this scenario is shown as the combination of Block A and Block C in Fig. 4.

In our system illustrated in Fig. 4 (Block A & Block C), we designate camera $S_2$ as the reference camera to generate two disparity maps. Specifically, we set the image taken by the reference camera (i.e., $D_2$) as the reference image and feed it, together with the image taken by the other camera (i.e., $D_1$), to a depth estimation model to produce the first disparity map, denoted as $M_1$, which contains the disparity information at each pixel of the reference image.

Here, we note that many algorithms developed in recent years can generate accurate disparity maps from camera images. For instance, PSMNet [4] and DORN [8] are two recent algorithms based on deep learning. In this paper, we use the former to produce disparity maps, since the PSMNet model gives more accurate results than the others.

In addition to $M_1$, we also project the depth information (i.e., the point cloud $D_0$) obtained by the LiDAR onto the reference image to generate the second disparity map, denoted as $M_0$. In this procedure, the depth information in the point cloud is converted to disparities using Eqn. (1). To generate all disparity maps at the same scale, $f$ in the equation is set to the focal length of the cameras, and $b$ is set to the baseline of the stereo pair formed by $S_1$ and $S_2$. The resulting disparities are then projected onto the image plane of $S_2$ to form $M_0$.
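A simplified sketch of this projection step is given below; it assumes the point cloud has already been transformed into the reference camera frame and that a plain pinhole model with focal length f and principal point (cx, cy) applies, which glosses over the full KITTI calibration chain.

```python
import numpy as np

def lidar_to_disparity_map(points_cam, f, cx, cy, baseline, h, w):
    """Project camera-frame LiDAR points (X right, Y down, Z forward) onto the
    reference image and fill a sparse disparity map with d = f * baseline / Z.

    Pixels never hit by a LiDAR return stay at 0 and are treated as invalid.
    """
    disp_map = np.zeros((h, w), dtype=np.float32)
    X, Y, Z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    front = Z > 0.5                              # keep points in front of the camera
    u = np.round(f * X[front] / Z[front] + cx).astype(int)
    v = np.round(f * Y[front] / Z[front] + cy).astype(int)
    d = f * baseline / Z[front]
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    disp_map[v[inside], u[inside]] = d[inside]   # last point wins on collisions
    return disp_map
```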

Fig. 5: Distributions of the disparity error in the normal case (cyan bars) and in the attack cases (red bars) for Scenario 1. In panels (a)-(c) a single sensor is attacked, in (d)-(f) two sensors are attacked, and in (g) all three sensors are attacked.
Fig. 6: Detection rate versus the designated false alarm rate for each attack case in Scenario 1. In panels (a)-(c) a single sensor is attacked, in (d)-(f) two sensors are attacked, and in (g) all three sensors are attacked.

To detect optical attacks, we compare the two disparity maps $M_1$ and $M_0$. Since both procedures described above use $D_2$ as the reference image, the two disparity maps have the same scale and share the same view, so we can compare them directly. It shall be noted that $M_0$ contains sparse disparity information, since the distances in the point cloud are not densely measured. Therefore, in this comparison we only consider pixels that have a valid disparity in $M_0$. For these valid pixels, we take the KITTI stereo benchmark [15] as a reference and design our own standard, in which a disparity inconsistency at pixel $p$ is counted if and only if

$$|M_1(p) - M_0(p)| > \tau \quad \text{and} \quad \frac{|M_1(p) - M_0(p)|}{M_0(p)} > \rho, \qquad (2)$$

where $\tau$ and $\rho$ are fixed absolute and relative thresholds, respectively.

Based on the pixel-level disparity inconsistencies, we evaluate the disparity error, denoted as $\epsilon$, between $M_1$ and $M_0$. In particular, the disparity error is defined as the ratio of the number of pixel-level disparity inconsistencies to the total number of valid pixels.
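The disparity-error computation can be summarized as follows; the absolute and relative thresholds tau and rho are symbolic stand-ins in the spirit of the KITTI criterion, not necessarily the exact values used in the paper.

```python
import numpy as np

def disparity_error(disp_a, disp_b, tau=3.0, rho=0.05):
    """Fraction of valid pixels whose disparities are inconsistent.

    disp_b is the sparse (e.g., LiDAR-derived) map; pixels with value 0 are
    treated as invalid. tau and rho are illustrative thresholds in the spirit
    of the KITTI stereo benchmark, not necessarily the paper's exact values.
    """
    valid = disp_b > 0
    diff = np.abs(disp_a[valid] - disp_b[valid])
    inconsistent = (diff > tau) & (diff / disp_b[valid] > rho)
    return inconsistent.sum() / max(valid.sum(), 1)
```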

Finally, to detect an optical attack, we need to evaluate the ranges of the disparity error in normal cases and in attack cases. We believe that the two ranges are distinguishable, so we can use a threshold to determine whether there is an optical attack. In particular, we declare that one of the three sensors is under attack if and only if $\epsilon > \delta$.

The threshold $\delta$ is determined offline based on the distribution of $\epsilon$ when the three-sensor system is in a safe environment, since only correct optical sensor data are available on an autonomous vehicle under normal conditions. We use a large number of samples of the disparity error to represent its real distribution in normal cases, and we define a designated false alarm rate $r$ to arbitrarily mark the largest $r$ fraction of them as virtual outliers, where $0 < r < 1$. The critical value separating inliers from outliers then becomes the threshold: sorting the $N$ samples in ascending order as $\epsilon^{(1)} \le \epsilon^{(2)} \le \cdots \le \epsilon^{(N)}$,

$$\delta = \epsilon^{(\lceil (1-r)N \rceil)}. \qquad (3)$$

In this manner, the threshold, which only moves within the value range of the disparity error samples, is determined by the value of $r$. Intuitively, to obtain the best detection performance, we should maximize the detection rate while minimizing the designated false alarm rate. Hence, we show how the detection rate varies when adjusting the threshold via $r$.
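Selecting the threshold via Eqn. (3) amounts to taking an empirical quantile of attack-free disparity-error samples, as in this sketch; the same routine can also serve to calibrate the per-combination thresholds used later in Section V.

```python
import numpy as np

def calibrate_threshold(normal_errors, false_alarm_rate):
    """Pick the detection threshold as the largest 'inlier' disparity error.

    The top `false_alarm_rate` fraction of attack-free samples is treated as
    virtual outliers, so the threshold is the (1 - r) empirical quantile.
    """
    errors = np.sort(np.asarray(normal_errors))
    cut = int(np.ceil((1.0 - false_alarm_rate) * len(errors))) - 1
    return errors[max(cut, 0)]

def detect_attack(epsilon, threshold):
    """Declare an optical attack if the disparity error exceeds the threshold."""
    return epsilon > threshold
```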

IV-C Experiments for Scenario 1

Method              | Granularity  | Avg. Detection Rate (Scenario 1) | Avg. Detection Rate (Scenario 2)
Ours (r = )         | Pixel-level  |                                  |
Ours (r = )         | Pixel-level  |                                  |
Ours (r = )         | Pixel-level  |                                  |
Ours (r = )         | Pixel-level  |                                  |
Ours (r = )         | Pixel-level  |                                  |
Baseline (IoU = )   | Object-level |                                  |
Baseline (IoU = )   | Object-level |                                  |
TABLE I: Detection rate comparison between the method in our framework and the baseline

IV-C1 Setup

To validate the hypothesis for this scenario, we conduct extensive experiments. We consider all possible attack cases where any sensor or any combination of the three sensors gets attacked. We use the data of one LiDAR and two cameras from the customized KITTI raw dataset [9] to produce affected sensor data for each attack case. The production scheme is described in Section III-C1. The PSMNet model used in the experiments is provided by Wang et al. [24], which is trained on Scene Flow dataset [14] and KITTI object detection dataset [10]. As for metrics, we measure the disparity error distribution and the rate of correct detection for each attack case.

In the literature, there is no existing solution for optical attack detection. Therefore, to compare our scheme with a possible alternative, we implement a baseline that first extracts object-level features from the data of two individual sensors and then detects attacks by measuring the mismatches between the two sets of features. Specifically, for Scenario 1 we implement the baseline with the backbone of PIXOR [27] to extract object-level features from point clouds and the backbone of Faster R-CNN [19] to extract them from images. We evaluate the baseline with two IoU thresholds for determining feature mismatches.

Fig. 7: Distributions of the disparity error in the normal case (cyan bars) and in the attack cases (red bars) for Scenario 2. In panels (a)-(c) a single sensor is attacked, in (d)-(f) two sensors are attacked, and in (g) all three sensors are attacked.
Fig. 8: Detection rate versus the designated false alarm rate for each attack case in Scenario 2. In panels (a)-(c) a single sensor is attacked, in (d)-(f) two sensors are attacked, and in (g) all three sensors are attacked.

IV-C2 Results

The experimental results are shown in Fig. 5 and Fig. 6. In Fig. 5, we compare the distribution of disparity errors in the normal case (i.e., no attack, in cyan bars) with each of the seven possible attack cases (red bars). We first observe that the disparity errors in the normal case are concentrated at small values, while the disparity errors in most attack cases are considerably larger. These results indicate that our detection scheme is sensitive enough that there is almost no overlap between the distribution in the normal case and the distributions in the attack cases.

In Fig. 6, we adjust the threshold used to declare attacks by varying the designated false alarm rate $r$ and evaluate the attack detection rate versus $r$. As we can see from the figures, among the seven attack cases, the performance is perfect in five of them, where the detection rate reaches 100% for all possible values of $r$. Even in the two non-perfect cases (i.e., (b) and (f)), the proposed detection system obtains a high detection rate at a low false alarm rate. Such results confirm our hypothesis for this three-sensor system.

We also show the detection rate comparison between our proposed detection method and the baseline for Scenario 1 in Table I, where we can observe that our method outperforms the baseline by a large margin under both IoU settings. The reason is that our method detects optical attacks by measuring pixel-level disparity inconsistencies, which is much denser and more fine-grained than the object-level detection used in the baseline. In most cases, where the attack only partially occludes important objects or does not occlude them at all, object-level detection is highly likely to fail, while our proposed method still functions normally.

IV-D Scenario 2: Three Cameras

For this scenario, we consider a three-sensor system that consists of three cameras, denoted as $S_1$, $S_2$, and $S_3$, from right to left. Similar to the previous scenario, the data generated by the sensors are denoted as $D_1$, $D_2$, and $D_3$. The detection system that we design for this scenario is shown as the combination of Block B and Block C in Fig. 4.

In the system illustrated in Fig. 4 (Block B & Block C), we designate camera $S_3$ as the reference camera to generate two disparity maps. In our experiments, we find it more convenient to implement our detection scheme when the leftmost or rightmost camera is used as the reference camera.

Since the sensor data in this scenario are all images, to generate the two disparity maps we feed $D_3$, paired with $D_1$ and with $D_2$, to the depth estimation model, respectively. It shall be noted that, since the distance between $S_3$ and $S_1$ is usually different from the distance between $S_3$ and $S_2$, we need to adjust the disparity in one of the maps by updating the baseline accordingly. After the two disparity maps $M_1$ and $M_2$ are generated, the rest of the procedure is the same as in the previous scenario.
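Because disparity is proportional to the baseline for a fixed depth (Eqn. (1)), this adjustment reduces to a single multiplicative rescaling, assuming both camera pairs share the same focal length and rectification; a minimal sketch:

```python
def rescale_disparity(disp_map, baseline_src, baseline_ref):
    """Rescale disparities computed over baseline_src to the scale that the
    reference baseline would produce: d_ref = d_src * (baseline_ref / baseline_src).

    Assumes both camera pairs share the same focal length and rectification.
    """
    return disp_map * (baseline_ref / baseline_src)
```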

IV-E Experiments for Scenario 2

IV-E1 Setup

Here, we use the data of three cameras in the customized KITTI raw dataset [9] and again consider all possible attack cases. The baseline method is implemented with the backbone of Faster R-CNN [19]. The rest of the settings are the same as those in the experiments for Scenario 1.

IV-E2 Results

We show the experimental results in Fig. 7 and Fig. 8. In Fig. 7, we compare the distribution of disparity errors in the normal case (i.e., no attack, in cyan bars) with each of the seven possible attack cases (red bars). Similar to Scenario 1, we observe that the disparity errors in the normal case remain small for virtually all samples, while the disparity errors in the attack cases are substantially larger.

In Fig. 8, we vary the threshold used to declare attacks via $r$ and evaluate the attack detection rate versus the designated false alarm rate. We can observe that the detection performance is perfect in all cases, where the detection rate remains at 100% for all values of $r$ considered. Comparing the results of Scenario 2 with those of Scenario 1, we notice that the detection performance in Scenario 2 is slightly better. We believe this is because the two disparity maps in this scenario are generated by the same method and more valid pixels are available for comparison.

In Table I, the performance comparison for this scenario again shows that our proposed detection method outperforms the baseline by a large margin, which further demonstrates the merit of pixel-level detection.

IV-F Empirical Findings

To briefly summarize, the findings from the attack detection experiments for the two three-sensor systems are listed as follows:

  • The experimental results confirm our hypothesis that there exists a detection system that can detect optical attacks on the two three-sensor systems with high accuracy and low false alarm rate.

  • The detection rate is insensitive to the designated false alarm rate. As long as the detection rate is maintained at a high level, the designated false alarm rate should be set as low as possible.

  • In these two three-sensor systems, any sensor or any combination of sensors being attacked causes the disparity error to exceed the threshold.

Based on these findings, we further develop the identification approach for the proposed framework.

V Attack Identification

In this section, we present the second procedure of our framework, which identifies the compromised sensors in a system with one LiDAR $S_0$ and $n$ cameras, denoted $S_1$ to $S_n$ from right to left, where $n \ge 3$, based on the empirical findings from the detection method. This method is inspired by error-correcting codes (ECC) and can achieve identification as long as no more than $n-2$ sensors are attacked simultaneously. In addition, we prove the correctness of our identification method and show its effectiveness and accuracy via experiments.

We now introduce a few definitions that are used in the rest of this section. For every sensor $S_i$, its state $s_i$ can switch between the normal state and the attacked state:

$$s_i = \begin{cases} 0, & \text{if } S_i \text{ is normal,} \\ 1, & \text{if } S_i \text{ is attacked,} \end{cases} \qquad (4)$$

where $i \in \{0, 1, \dots, n\}$. The sensor state vector of the system is defined as

$$\mathbf{s} = (s_0, s_1, \dots, s_n), \qquad (5)$$

which is the hidden ground truth that we try to identify.

For the disparity error $\epsilon_{ijk}$ among sensors $S_i$, $S_j$, and $S_k$, we use $e_{ijk}$ to indicate whether $\epsilon_{ijk}$ exceeds the corresponding threshold $\delta_{ijk}$:

$$e_{ijk} = \begin{cases} 1, & \text{if } \epsilon_{ijk} > \delta_{ijk}, \\ 0, & \text{otherwise,} \end{cases} \qquad (6)$$

where $0 \le i < j < k \le n$. Similarly, we use the disparity error state vector $\mathbf{e}$ to represent the states of the disparity errors of the three-sensor combinations used in the system.

Since the system consists of one LiDAR and $n$ cameras, any combination of three sensors from it must be either one LiDAR with two cameras or three cameras. According to the empirical finding drawn from the experiments in the previous section, any sensor or any combination of sensors from such a three-sensor set being compromised makes the corresponding disparity error exceed its threshold. Therefore, based on the definitions in Eqn. (4) and Eqn. (6), we have

$$e_{ijk} = s_i \lor s_j \lor s_k, \qquad (7)$$

where $0 \le i < j < k \le n$ and $\lor$ is the logical OR operation.

V-A Calculation of Disparity Error State Vector

Given the $n+1$ sensors, we use the leftmost camera $S_n$ as the reference camera and calculate the disparity error state vector

$$\mathbf{e} = (e_{ijn})_{0 \le i < j \le n-1}, \qquad (8)$$

where each element involves the reference camera $S_n$ and two other sensors $S_i$ and $S_j$. In the calculation, the disparity maps generated by pairing $S_n$ with every remaining sensor are compared with each other using the same standard described by Eqn. (2). Then, by the definition in Eqn. (6), $\mathbf{e}$ is obtained by thresholding the resulting disparity errors. We also show this calculation in pseudocode in Algorithm 1, which takes the data of the one LiDAR and $n$ cameras and a list of thresholds as inputs and outputs the disparity error state vector $\mathbf{e}$. Specifically, it first generates $n$ disparity maps from the sensor data and compares each pair of them to obtain the disparity errors, and then computes $\mathbf{e}$ by encoding the disparity errors with the thresholds.

Note that the thresholds for calculating $\mathbf{e}$ are also determined offline using one designated false alarm rate $r$. The approach is similar to the one in the detection procedure: for every disparity error, we collect sufficient samples when the system is safe and treat the largest $r$ fraction of the samples as virtual outliers. Each threshold is then set to the maximal value of the inliers. Hence, Eqn. (3) can be rewritten as

$$\delta_{ijn} = \epsilon_{ijn}^{(\lceil (1-r)N \rceil)}, \qquad (9)$$

where $0 \le i < j \le n-1$ and $\epsilon_{ijn}^{(1)} \le \cdots \le \epsilon_{ijn}^{(N)}$ are the sorted attack-free samples of $\epsilon_{ijn}$. Unlike the detection rate, which is insensitive to $r$, our subsequent experiments indicate that the identification rate drops linearly as $r$ increases, and the best identification performance is achieved when $r$ is small.

V-B Identification of Sensor State Vector

Input: D: a list of sensor data of length $n+1$, where $D_0$ is the point cloud and the rest are images; $\Delta$: a list of thresholds $\delta_{ijn}$.
Output: e: the disparity error state vector.
1  select $D_n$ as the reference image, where $D_n$ is taken by the leftmost camera $S_n$;
2  get disparity map $M_0$ using $D_n$ and $D_0$ (Scenario 1 in Section IV);
3  for $i \leftarrow 1$ to $n-1$ do
4        get disparity map $M_i$ using $D_n$ and $D_i$ (Scenario 2 in Section IV);
5  for $i \leftarrow 0$ to $n-2$ do
6        for $j \leftarrow i+1$ to $n-1$ do
7              get disparity error $\epsilon_{ijn}$ by comparing $M_i$ with $M_j$ (Section IV);
8              if $\epsilon_{ijn}$ exceeds threshold $\delta_{ijn}$ then
9                    assign 1 to disparity error state $e_{ijn}$;
10             else
11                   assign 0 to $e_{ijn}$;
12 return e;
Algorithm 1 Calculation of the disparity error state vector e

We now elaborate on how to infer $\mathbf{s}$ from $\mathbf{e}$. We consider the following three cases of $\mathbf{e}$:

  • If all elements in $\mathbf{e}$ are 0s, then according to Eqn. (7) and Eqn. (8), $s_i = 0$ for every $i$. In other words, no sensor is attacked.

  • If only some elements in $\mathbf{e}$ are 1s, we have the following Lemma 1 to identify all attacked sensors.

  • If all elements in $\mathbf{e}$ are 1s and no more than $n-2$ sensors are attacked simultaneously, we can repeatedly use Lemma 1 and Lemma 2 to identify all attacked sensors.

Lemma 1

In a system with $n+1$ sensors (one LiDAR and $n$ cameras), if there exist $i, j$ such that $e_{ijn} = 0$, then $s_m = 0$ for $m \in \{i, j, n\}$, and $s_k = e_{ikn}$ for every $k \notin \{i, j, n\}$.

Fig. 9: Distributions of the three disparity errors (one per row) in the normal scenario (1st column) and in the four attack scenarios (2nd–5th columns). Cyan bars indicate that the disparity error involves no attacked sensor, while red bars indicate that it involves an attacked sensor. (a) No disparity error exceeds its threshold, indicating no optical attack; (b)–(d) exactly two disparity errors exceed their thresholds, indicating that the corresponding non-reference sensor is attacked; (e) all three disparity errors exceed their thresholds, indicating that the reference camera is attacked.
Fig. 10: Identification rate versus the designated false alarm rate in each attack scenario. Panels (a)–(d): each of the four sensors is attacked in turn.
Proof:

If there exist $i, j$ such that $e_{ijn} = 0$, then according to Eqn. (7) we have

$$e_{ijn} = s_i \lor s_j \lor s_n = 0, \qquad (10)$$

which implies

$$s_i = s_j = s_n = 0. \qquad (11)$$

Now we only need to determine the states of the remaining sensors $S_k$ with $k \notin \{i, j, n\}$.

For any such $k$ with $e_{ikn} = 0$,

$$e_{ikn} = s_i \lor s_k \lor s_n = s_k = 0, \qquad (12)$$

and for any such $k$ with $e_{ikn} = 1$,

$$e_{ikn} = s_i \lor s_k \lor s_n = s_k = 1. \qquad (13)$$

Hence $s_k = e_{ikn}$ for every $k \notin \{i, j, n\}$, which completes the proof.

Lemma 1 shows that we can identify the sensor state vector $\mathbf{s}$ if at least one element in $\mathbf{e}$ is 0. For the case where all elements in $\mathbf{e}$ are 1s, we have the following lemma.

Lemma 2

In a system with $n+1$ sensors (one LiDAR and $n$ cameras), when no more than $n-2$ sensors are compromised simultaneously, if the elements of $\mathbf{e}$ are all 1s, then $s_n = 1$, i.e., the reference camera $S_n$ is attacked.

Proof:

Since no more than $n-2$ sensors are attacked, the states of at least three sensors are 0s. If $S_n$ were normal, there would exist $i$ and $j$, where $0 \le i < j \le n-1$, such that sensors $S_i$ and $S_j$ are also normal, i.e., $s_i = s_j = s_n = 0$. In this case, $e_{ijn} = 0$, which contradicts the fact that all elements of $\mathbf{e}$ are 1s. Therefore, $s_n = 1$.

Input: D: a list of sensor data of length $n+1$, where $D_0$ is the point cloud and the rest are images; $\Delta$: a list of thresholds.
Output: A: the list of compromised sensors.
1  select $D_n$ as the reference image, where $D_n$ is taken by the leftmost camera $S_n$;
2  calculate e using Algorithm 1 with D and $\Delta$;
3  if there exists a 1 in e then
4        if there exists a 0 in e then
             // use Lemma 1 to infer s
5              find $i, j$ which satisfy $e_{ijn} = 0$;
6              for $k \leftarrow 0$ to $n-1$ do
7                    if $k \ne i$ and $k \ne j$ and $e_{ikn} = 1$ then
8                          sensor state $s_k$ is 1;
9                          push attacked sensor $S_k$ into A;
10       else
             // use Lemma 2 to infer $s_n$
11             sensor state $s_n$ is 1;
12             push attacked sensor $S_n$ into A;
             // infer the remaining sensors recursively
13             if at least three sensors remain after removing $S_n$ then
14                   remove $D_n$ from D;
15                   go to line 1 to rerun with the updated D;
16 return A;
Algorithm 2 Identification of the sensor state vector s

When all elements of $\mathbf{e}$ in a system with $n+1$ sensors are 1, although we cannot directly determine the states of all sensors, Lemma 2 identifies the state of the reference camera $S_n$. After that, we can virtually remove $S_n$ and consider the remaining system with $n$ sensors. We recalculate $\mathbf{e}$ with Algorithm 1 for these $n$ sensors, then determine $\mathbf{s}$ according to Lemma 1 and Lemma 2, and repeat this process until the states of all sensors are identified. We also present this identification procedure as pseudocode in Algorithm 2, which takes the same inputs as Algorithm 1 and outputs the list of attacked sensors. Specifically, it begins by calculating $\mathbf{e}$ using Algorithm 1, and then infers the sensor state vector $\mathbf{s}$ using Lemma 1 when some elements of $\mathbf{e}$ are 0s. When all elements of $\mathbf{e}$ are 1s, it first infers $s_n$, the state of the reference camera, using Lemma 2, then excludes the data of the reference camera from the inputs and infers the remaining sensor states by rerunning Algorithm 2 with the updated inputs.
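A compact Python sketch of this recursive procedure is given below. It assumes a helper compute_error_states(ids) implementing Algorithm 1, which returns a dictionary mapping each pair (i, j) of non-reference sensor indices to the error state e_{ijn} computed with the last element of ids as the reference camera; both the helper name and the data layout are hypothetical.

```python
def identify_attacked(sensor_ids, compute_error_states):
    """Return the set of attacked sensors among `sensor_ids` (reference last).

    `compute_error_states(ids)` is assumed to implement Algorithm 1: it returns
    a dict mapping pairs (i, j) of non-reference sensors to e_{ijn} in {0, 1},
    where the last element of `ids` is used as the reference camera.
    """
    attacked = set()
    ids = list(sensor_ids)
    while len(ids) >= 3:
        e = compute_error_states(ids)
        if all(v == 0 for v in e.values()):      # no attack among the remaining sensors
            return attacked
        zero_pairs = [p for p, v in e.items() if v == 0]
        if zero_pairs:                           # Lemma 1: i, j, and the reference are clean
            i, j = zero_pairs[0]
            for k in ids[:-1]:
                if k in (i, j):
                    continue
                key = (i, k) if (i, k) in e else (k, i)
                if e[key] == 1:                  # s_k = e_{ikn} under Eqn. (7)
                    attacked.add(k)
            return attacked
        attacked.add(ids[-1])                    # Lemma 2: the reference camera is attacked
        ids = ids[:-1]                           # drop it and redo with the remaining sensors
    return attacked
```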

V-C Experiments

Attacked Sensor Average
99.80%
97.00% 97.60% 98.15%
100%
TABLE II: Identification rate at particular values of the designated false alarm rate in attack scenarios

V-C1 Setup

To verify the effectiveness and evaluate the precision of our identification scheme, we conduct extensive experiments. Since the identification scheme is designed as a recursion, the experiments here cover the base case, where the system consists of one LiDAR ($S_0$) and three cameras ($S_1$, $S_2$, and $S_3$). According to the constraint, there is at most one attacked sensor in the system.

In the experiments, we consider the normal scenario and all possible attack scenarios where each one of the four sensors gets attacked. We use the data of one LiDAR and three cameras from the customized KITTI raw dataset [9]. To generate the compromised sensor data for each attack scenario, we do the same as in Section III-C1. The depth estimation model is provided by [24]. As for metrics, we measure the disparity error distribution and the rate of correct identification for each attack scenario.

V-C2 Results

We present the experimental results in Fig. 9, Fig. 10, and Table II. In Fig. 9, the three rows of sub-figures represent the distributions of the three disparity errors, respectively, and the columns represent the five scenarios (no attack and the four attack scenarios).

In Fig. 9, we can first compare the disparity errors within each row. Similar to the results in the previous section, the distributions clearly show that the disparity error reveals whether there is an attack: when none of the three sensors involved in a disparity error is attacked, the disparity errors (cyan bars) stay small, whereas they become much larger (red bars) when any of the involved sensors is attacked. Such results affirm the feasibility and correctness of defining attacks according to the disparity error states. Moreover, the three sub-figures in each of the five scenarios show a unique pattern of the combination of the three disparity error states. For instance, when no attack is launched, the three disparity errors all stay within their boundaries, corresponding to the disparity error state vector $\mathbf{e} = (0, 0, 0)$. If any one of the sensors is attacked, the disparity errors involving that sensor exceed their boundaries, leading to another unique $\mathbf{e}$.

In Fig. 10, we vary the thresholds by adjusting the designated false alarm rate $r$ and compute the corresponding identification rate in the four attack scenarios. It is apparent that, if the attacked sensor is not the reference, the identification rate drops roughly linearly as $r$ increases. On the other hand, when the reference sensor is attacked, the identification rate remains very close to 100%.

Such observations suggest that choosing a small $r$ may lead to the best overall performance. To identify the best value, we conduct additional experiments investigating the impact of $r$ over a small range. The numerical results are shown in Table II. As we can see, the best average identification rate over the four attack scenarios occurs at a small value of $r$.

V-D Discussion

Though the identification method of our framework can accurately identify attacked sensors, it is limited to the condition that no more than n-2 sensors are attacked simultaneously in a system with one LiDAR and n cameras. We plan to address this limitation through cross-vehicle sensing data validation in our future studies.

Fig. 11: Sensitivity of our framework and AP of PointRCNN when the attack is against the LiDAR. (a) Detection rate for attacks against the LiDAR versus the width of the bogus signal; (b) Average precision of PointRCNN versus the width of the bogus signal.

VI Framework Sensitivity

With the best designated false alarm rate determined, we conduct further experiments to investigate how sensitive our framework is, namely, over what range of attack settings (width of the bogus signal, size of the facula) our framework works effectively. Empirically, the milder an optical attack is, the more difficult it is to detect. Meanwhile, we also measure how much the perception function of AVs is influenced by optical attacks with different settings, using state-of-the-art object detection algorithms. Such algorithms usually possess a certain degree of resistance to minor optical attacks, so our framework does not have to be universally sensitive.

In this section, we use experiments to demonstrate that our framework is highly sensitive to attacks on the LiDAR with settings that the object detection model cannot overcome. As for attacks on the camera, our framework is also sensitive in most cases, but shows limitations when the attack is too mild. The experiments consist of two parts: the first part evaluates the sensitivity to attacks on the LiDAR, and the second part evaluates the sensitivity to attacks on the camera.

VI-A Metrics

To measure the sensitivity, we use the detection rate of our framework with the chosen designated false alarm rate, since the detection rate also reflects the performance of the identification procedure. As described in Section V, the identification procedure of our framework is built directly on multiple detection processes, so the identification rate is highly correlated with the detection rate.

As for evaluating the corresponding performance of the object detection algorithms used in AVs, we follow the KITTI object detection benchmark [10] and calculate the average precision of vehicle detection with a fixed IoU threshold.
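For reference, the IoU of two axis-aligned 2D boxes, the quantity compared against the benchmark's threshold when deciding whether a detection counts as a true positive, can be computed as follows; the benchmark's full matching protocol involves additional rules not shown here.

```python
def iou_2d(box_a, box_b):
    """Intersection over Union for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```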

Fig. 12: Sensitivity of our framework and AP of YOLOv3 when the attack is against the camera. (a) Detection rate for attacks against the camera versus the percentage of the image area contaminated by the facula; (b) Average precision of YOLOv3 versus the percentage of the image area contaminated by the facula.

VI-B Experimental Setup

We conduct our experiments on the dataset provided in the KITTI object detection benchmark [10] which contains sensor data of one LiDAR and two cameras. As described in Section III-C1, we divide the labeled part of the dataset into training set and test set. The training set is used to train the object detection models, while the test set is for generating compromised sensor data.

To determine the sensitivity of our framework to optical attacks on the LiDAR, we produce five affected point clouds for every set of sensor data in the test set. The five affected point clouds contain a bogus signal with a width of 0.5 meter, 1.0 meter, 1.5 meters, 2.0 meters, and 2.5 meters, respectively. The object detection algorithm chosen for this part of the experiments is PointRCNN [22], a state-of-the-art 3D object detection algorithm that takes a point cloud as input.

In terms of the experimental setup for evaluating the sensitivity to attacks on the camera, we generate six pairs of compromised stereo images for each set of sensor data in the test set. The left image of each pair is overlaid with a Gaussian facula with a radius of 37.5, 75, 112.5, 150, 187.5, or 225 pixels, and the corresponding percentages of the contaminated image area are 0.95%, 3.79%, 8.54%, 15.18%, 23.71%, and 34.15%. The object detection algorithm for this part of the experiments is YOLOv3 [17], one of the most popular real-time object detection algorithms that take images as input.
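The contaminated-area percentages follow from the facula radius and the image resolution; assuming the typical KITTI image size of about 1242 x 375 pixels (an assumption, since the exact resolution varies slightly across drives), the listed percentages can be reproduced with the following sketch.

```python
import math

def facula_area_percentage(radius_px, width_px=1242, height_px=375):
    """Percentage of the image covered by a round facula of the given radius."""
    return 100.0 * math.pi * radius_px ** 2 / (width_px * height_px)

# radius 37.5 px -> ~0.95%, ..., radius 225 px -> ~34.15%
for r in (37.5, 75, 112.5, 150, 187.5, 225):
    print(r, round(facula_area_percentage(r), 2))
```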

In the experiments, we feed the compromised sensor data to our framework and the selected object detection model, and then evaluate them via the aforementioned metrics. The PSMNet model used in the framework is provided by [24].

VI-C Experiment Results

VI-C1 Sensitivity to the Attacks on LiDAR

As shown in Fig. 11, as the width of the bogus signal increases, the detection rate of our framework surges while the average precision of PointRCNN declines. The AP of PointRCNN decreases only slightly when the bogus signal is narrow, which implies that PointRCNN exhibits some resistance to minor disturbing signals. When the bogus signal becomes wider, the average precision starts to drop rapidly. However, it should be noted that at moderate widths the total decline of AP is still small, while the detection rate of our framework already reaches nearly 100%. These results clearly show that our framework is highly sensitive to attacks on the LiDAR that cannot be mitigated by object detection algorithms.

VI-C2 Sensitivity to the Attacks on Camera

The experimental results for this part are illustrated in Fig. 12. The trends of the detection rate of our framework and the AP of the object detection model are similar to those in the first part of the experiments. In particular, when the percentage of the contaminated area is very small, our framework has a low detection rate, but the AP of YOLOv3 remains high, which means that perception is not compromised by such small attacks. When the percentage of the contaminated area grows larger, the detection rate of our framework rises sharply while the AP of YOLOv3 drops significantly. These results suggest that our framework is strongly sensitive to attacks on the camera once the contaminated area exceeds a certain fraction of the image; below that, the framework may show some limitations.

VII Conclusion

In this paper, we have systematically investigated the mitigation of attacks on optical devices (LiDAR and camera) that are essential for accurate object and event detection and response (OEDR) in autonomous driving systems. Specifically, we proposed a framework to detect and identify sensors that are under attack. For attack detection, we considered two common three-sensor systems, (1) one LiDAR and two cameras and (2) three cameras, and developed effective procedures to detect any attack on each of them. Using real datasets and the state-of-the-art machine learning model, we conducted extensive experiments confirming that our detection scheme can detect attacks with high accuracy and a low false alarm rate. Based on the detection scheme, we further developed an identification model that is capable of identifying up to n-2 attacked sensors in a system with one LiDAR and n cameras. For the identification procedure, we proved its correctness and used experiments to validate its performance. Finally, we investigated the sensitivity of our framework and demonstrated its effectiveness.

In the future, we plan to study methods to further localize the damaged portion of an image or point cloud and to recover the damaged portion using intact data from other sensors of the ego vehicle, nearby infrastructure, or vehicles in the vicinity. We also plan to investigate the identification task for autonomous driving systems based on Multi-Access Edge Computing (MEC) and 5G, in which sensor data from multiple vehicles can be exploited to overcome the limitation of our proposed identification solution.

References

  • [1] Apollo Open Platform (2019) Apollo 5.0 Perception Module. External Links: Link Cited by: §I, §IV-A.
  • [2] A. Bhoi (2019) Monocular Depth Estimation: A Survey. arXiv preprint arXiv:1901.09402. Cited by: §II.
  • [3] Y. Cao, C. Xiao, B. Cyr, Y. Zhou, W. Park, S. Rampazzi, Q. A. Chen, K. Fu, and Z. M. Mao (2019-Nov.) Adversarial sensor attack on lidar-based perception in autonomous driving. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pp. 2267–2281. External Links: ISBN 9781450367479, Document Cited by: §II.
  • [4] J. Chang and Y. Chen (2018-Jun.) Pyramid Stereo Matching Network. In Proc. IEEE CVPR 2018, pp. 5410–5418. Cited by: 1st item, §II, §III-A2, §IV-B.
  • [5] R. Changalvala and H. Malik (2019-Sep.) LiDAR Data Integrity Verification for Autonomous Vehicle. IEEE Access 7, pp. 138018–138031. Cited by: §II.
  • [6] X. Chen, K. Kundu, Y. Zhu, A. G. Berneshawi, H. Ma, S. Fidler, and R. Urtasun (2015) 3D Object Proposals for Accurate Object Class Detection. In Advances in Neural Information Processing Systems, pp. 424–432. Cited by: §III-C1.
  • [7] A. Davies (2018) Waymo’s So-Called Robo-Taxi Launch Reveals a Brutal Truth. External Links: Link Cited by: §I.
  • [8] H. Fu, M. Gong, C. Wang, K. Batmanghelich, and D. Tao (2018-Jun.) Deep Ordinal Regression Network for Monocular Depth Estimation. In Proc. IEEE CVPR 2018, pp. 2002–2011. Cited by: §II, §IV-B.
  • [9] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun (2013-Sep.) Vision meets robotics: The KITTI dataset. The International Journal of Robotics Research 32 (11), pp. 1231–1237. Cited by: 1st item, §III-C1, §III-C2, §IV-C1, §IV-E1, §V-C1.
  • [10] A. Geiger, P. Lenz, and R. Urtasun (2012-Jun.) Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proc. IEEE CVPR 2012, pp. 3354–3361. Cited by: 1st item, §III-C1, §IV-C1, §VI-A, §VI-B.
  • [11] A. Hawkins (2019) Tesla’s ‘full self-driving’ feature may get early-access release by the end of 2019. External Links: Link Cited by: §I.
  • [12] K. He, X. Zhang, S. Ren, and J. Sun (2016-Jun.) Deep Residual Learning for Image Recognition. In Proc. IEEE CVPR 2016, pp. 770–778. Cited by: §III-C2.
  • [13] J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y. Song, S. Guadarrama, and K. Murphy (2017-Jul.) Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors. In Proc. IEEE CVPR 2017, pp. 7310–7311. Cited by: §III-C2.
  • [14] N. Mayer, E. Ilg, P. Hausser, P. Fischer, D. Cremers, A. Dosovitskiy, and T. Brox (2016-Jun.) A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation. In Proc. IEEE CVPR 2016, pp. 4040–4048. Cited by: §IV-C1.
  • [15] M. Menze and A. Geiger (2015) Object Scene Flow for Autonomous Vehicles. In Proc. IEEE CVPR 2015, pp. 3061–3070. Cited by: §IV-B.
  • [16] J. Petit, B. Stottelaar, M. Feiri, and F. Kargl (2015-Nov.) Remote attacks on automated vehicles sensors: Experiments on camera and LiDAR. Black Hat Europe 11. Cited by: §I, §II, §III-B1, §III-B2, §III-B2.
  • [17] J. Redmon and A. Farhadi (2018) Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767. Cited by: §VI-B.
  • [18] K. Ren, Q. Wang, C. Wang, Z. Qin, and X. Lin (2019-Feb.) The Security of Autonomous Driving: Threats, Defenses, and Future Directions. Proceedings of the IEEE 108 (2), pp. 357–372. External Links: Document Cited by: §I, §II.
  • [19] S. Ren, K. He, R. Girshick, and J. Sun (2015) Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proc. Advances in Neural Information Processing Systems 28 (NIPS 2015), pp. 91–99. Cited by: §III-C2, §IV-C1, §IV-E1.
  • [20] M. Rofail, A. Alsafty, M. Matousek, and F. Kargl (2019-Sep.) Multi-Modal Deep Learning for Vehicle Sensor Data Abstraction and Attack Detection. In 2019 IEEE International Conference of Vehicular Electronics and Safety (ICVES), pp. 1–7. Cited by: §II.
  • [21] SAE J3016 (2018) Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles. External Links: Link Cited by: §I.
  • [22] S. Shi, X. Wang, and H. Li (2019) PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–779. Cited by: §VI-B.
  • [23] H. Shin, D. Kim, Y. Kwon, and Y. Kim (2017-Aug.) Illusion and Dazzle: Adversarial Optical Channel Exploits Against Lidars for Automotive Applications. In International Conference on Cryptographic Hardware and Embedded Systems, pp. 445–467. External Links: Document Cited by: §I, §II, §III-B1.
  • [24] Y. Wang, W. Chao, D. Garg, B. Hariharan, M. Campbell, and K. Q. Weinberger (2019-Jun.) Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving. In Proc. IEEE CVPR 2019, pp. 8445–8453. Cited by: §IV-C1, §V-C1, §VI-B.
  • [25] A. M. Wyglinski, X. Huang, T. Padir, L. Lai, T. R. Eisenbarth, and K. Venkatasubramanian (2013-Jan.) Security of Autonomous Systems Employing Embedded Computing and Sensors. IEEE Micro 33 (1), pp. 80–86. External Links: ISSN 0272-1732 Cited by: §I, §II.
  • [26] C. Yan, W. Xu, and J. Liu (2016-Aug.) Can You Trust Autonomous Vehicles: Contactless Attacks against Sensors of Self-driving Vehicle. In DEFCON 24, Cited by: §I, §II, §III-B2, §III-B2.
  • [27] B. Yang, W. Luo, and R. Urtasun (2018-Jun.) PIXOR: Real-Time 3D Object Detection From Point Clouds. In Proc. IEEE CVPR 2018, pp. 7652–7660. Cited by: §III-C2, §IV-C1.