Watch your back: Backdoor Attacks in Deep Reinforcement Learning-based Autonomous Vehicle Control Systems

03/17/2020 ∙ by Yue Wang, et al. ∙ NYU college

Autonomous Vehicles (AVs) with Deep Reinforcement Learning (DRL)-based controllers are used for reducing traffic jams. Training AV controllers with deep neural networks renders them vulnerable to machine learning-based attacks. In this work, we explore the backdooring of a DRL-based AV controller in a standard traffic scenario. The AV exhibits the intended operation of reducing congestion during genuine observations, but when a particular set of observations appears, the AV can be triggered to either decelerate to cause congestion (congestion attack) or to accelerate and crash into the vehicle in front (insurance attack). These backdoors in AVs may be engineered to pose serious threats to human lives.




1 Introduction

The tools used to address traffic congestion today include variable speed advisory or variable speed limits, which rely on signs on overhead gantries, ramp metering, adaptive intersection control, and pricing tools (e.g., tolls and cordon pricing) [9]. Owing to recent advances in vehicular automation and communication technologies, new tools such as vehicle localization [7] and autonomous vehicles (AVs) show high potential for handling congestion. Stern et al. [19] have shown experimentally that traffic congestion in a single-lane ring road could be alleviated by inserting just one AV.

Despite the power of automation, it is challenging to design a reliable control strategy for AVs, as traditional model-based methods may lose some crucial guarantees, impeding their application to complex traffic systems, e.g., multi-lane traffic [27]. To address this problem, Wu et al. [27] built a new computational framework, Flow, for simulating Reinforcement Learning (RL) methods for traffic control. RL serves as a great alternative for controlling AVs because 1) it does not rely on specific physical models and thus is not vulnerable to unknown physical conditions, and 2) it is data-dependent and thus considers real physical perturbations. With deep neural networks, deep reinforcement learning (DRL) works well in complicated yet data-rich environments and achieves good performance in complex and high-dimensional problems, like Atari games [2], complex robot manipulation and autonomous vehicle operation [18].

Deep Neural Networks (DNNs) are known to be vulnerable to maliciously crafted inputs known as adversarial examples [21]. As a result, DRL-controlled AVs are also vulnerable to these attacks [1, 10]. Backdoored neural networks [8] are a new class of attacks on DNNs that only behave maliciously when triggered by a specific input. The networks have a high Attack Success Rate (ASR) on the triggered samples and high test accuracy on genuine samples. Unlike adversarial examples, they are model-based attacks which are triggered using malicious inputs. Since the triggers can be designed according to the attacker's motive (like stealthiness), they provide immense flexibility in attack vector design. Such neural trojans have been implemented and explored in classification problems [16, 8, 5] but have not been explored for regression problems like reinforcement learning for vehicular traffic systems using sensor values as triggers.

In this work, for the investigation of possible backdoors in DRL-based control algorithms, we focus on a classic scenario of a single-lane ring road with 22 vehicles, simulated in a microscopic traffic simulator SUMO (Simulation of Urban MObility) [11]. In this scenario, traffic congestion occurs if all of the 22 vehicles are human-driven. But if one of the vehicles is switched to an autonomous one, controlled by a controller trained using Deep Deterministic Policy Gradient (DDPG) algorithm [13], traffic congestion is relieved. In such a usage setting, we explore the possibility of injecting a backdoor that can cause congestion only when triggered by malicious observations. This congestion attack is inherently contrary to the control objective of the system. We also perform an insurance attack, where a trigger makes the AV speed up maliciously and crash into the vehicle in front. Our trigger tuple is a combination of three sensor measurements: the velocity of the AV, the velocity of the vehicle in front and the relative distance in between them. The malicious action is the intended malicious acceleration. We aim to choose triggers that are closest to genuine data, i.e. the trigger tuple is designed to remove any conspicuous distinction from genuine data, promoting stealthiness. The trigger conditions are configurable for training the malicious models and are controllable by a maliciously driven car. In formulation of such attacks, we list our contributions as follows:

  • We investigate sensor measurements-based trigger design for a regression problem in machine learning using physical constraints, attack objectives and stealthiness as parameters. For our experiments, we recreate the model for DRL-based controller for AVs to reduce traffic congestion [27].

  • We cause and analyze the physical attacks (congestion and insurance attacks) by injecting backdoors in an otherwise benign DRL-based AV controller.

Section 2 discusses related work and in Section 3 we describe the background for building both the benign and malicious deep learning models for controlling the AV. Section 4 describes our methodology for building the DDPG-based controller model for the AV. In Section 5, we describe our methodology for designing triggers using physical constraints, attack objectives and stealthiness as parameters. In Sections 6 and 7, we describe the congestion and insurance attacks, respectively. We also discuss possible defenses in Section 8.

2 Related Work

Stealthy attacks on deep learning may be broadly divided into two categories: 1) adversarial perturbation attacks, and 2) backdoor attacks. Adversarial examples use imperceptible modifications in test inputs to make a well-trained (genuine) model malfunction. The literature on such attacks on DRL has investigated these vulnerabilities in depth, exploring manipulated policies during training time [1] as well as test time [10]. A new attack tactic called an “enchanting attack” was introduced to lure the system to a maliciously designed state by generating a sequence of corresponding actions through a sequence of adversarial examples [14]. Tretschk et al. [25] also aimed to compute a sequence of perturbations, generated by a learned feed-forward DNN, such that the perturbed states misguide the victim policy to follow an arbitrary adversarial reward over time. All these attacks are based on input perturbations while model-based backdoor attacks in DRL remain unexplored.

Work [8] [28] [12] [17] This work
trigger design
AV Attack
TABLE 1: Related work on Backdoor attacks on autonomous driving

Backdoor attacks on DNNs differ from adversarial perturbations in three ways: 1) They are model-based attacks as opposed to data-poisoning attacks. 2) The malicious behavior is dormant until a trigger is activated, which makes these attacks very stealthy. 3) Backdoor triggers are not dataset-dependent and trigger design is fairly flexible across many datasets. BadNets [8] are neural networks that have been injected with specifically crafted backdoors that get activated only in the presence of certain trigger patterns. These trigger patterns may be a pair of sunglasses, a colored patch, a post-it or undetectable perturbations, and are used to attack facial recognition algorithms [5], image recognition tools [17], self-driving car applications [8], or object identification [6]. We provide a list of backdoor attacks on various autonomous driving problems in Table 1. Most of the attacks in the literature aim at misclassification of either objects [28, 12] or traffic signs [8]. Other recent studies related to the vulnerability of traffic systems to adversarial attacks can be found in [22] and references therein. We find the work by Liu et al. [17] to be the closest to our work as they also attack a regression problem in machine learning. However, the authors attack a single autonomous car that judges the camera feed to predict its steering angle. In contrast, our attack is on a DRL-based AV controller in a complex traffic environment with 22 cars, managing acceleration, velocity, and relative distance between the cars to remove congestion. Further, contrary to the literature, which uses image-based triggers, our triggers are embedded in malicious sensor values like velocity. Noise in these physical quantities (as modeled by SUMO) deters easy backdoor injection as compared to image-based triggers. Also, to maximize stealthiness, we explore the trigger space to choose trigger values that are hard to distinguish from the genuine ones, e.g., those that are closer to genuine values.

3 Preliminaries

3.1 Deep Reinforcement Learning

RL enables an agent to learn a policy, $\pi_\theta$ (parameterized by $\theta$), from interaction with an environment to achieve long-term rewards. The agent takes action $a_t$ in step $t$, which is produced by the policy $\pi_\theta$ to interact with the environment. The environment responds to $a_t$, moves to the next state, and produces a corresponding reward ($r_t$). Here the state may be an underlying state of the environment or raw observations like images. During these interactions, the RL agent learns an optimal policy $\pi^*$ which maximizes a long-term expected (future) reward ($J$), given the start point $s_0$:

$$J = \mathbb{E}\left[ R_0 \mid s_0 \right] \qquad (1)$$

where $R_t = \sum_{i=t}^{T} \gamma^{\,i-t} r_i$ and $\gamma \in [0, 1]$ is a discount factor.

DRL methods utilize DNNs to represent functions, like the policy function. In a DDPG [13] agent, there are two DNNs: an actor network $\pi(s \mid \theta^{\pi})$, which represents the policy function with parameters $\theta^{\pi}$ and is a mapping from the state to the corresponding action, and a critic network $Q(s, a \mid \theta^{Q})$, which represents the value function described in Eq. (2) with parameters $\theta^{Q}$ and refers to the long-term return based on $s_t$ and $a_t$ under policy $\pi$,

$$Q^{\pi}(s_t, a_t) = \mathbb{E}\left[ R_t \mid s_t, a_t \right] \qquad (2)$$

where $s_t$ is the state at time $t$.
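As a minimal illustration of the discounted long-term return that the agent maximizes, the sketch below accumulates rewards backwards through an episode. The function name `discounted_returns`, the reward values and the discount factor of 0.5 are illustrative assumptions, not values taken from this work.

```python
def discounted_returns(rewards, gamma):
    """Return, for every step t, the sum of gamma**(i - t) * rewards[i] over i >= t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one computed for the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Toy episode (assumed values): three steps, each with reward 1.
returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
print(returns)  # [1.75, 1.5, 1.0]
```

The first entry is the return from the start state, which is the quantity whose expectation the policy is trained to maximize.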

3.2 Backdoors in Neural Networks

Training data poisoning is an effective way of implementing these attacks, as seen in BadNets [8]. In this method, the authors inject backdoors by adding some trigger images to the original set of training images and training the original architecture. The trigger, being distinct, is easily learned by the classifier. Porting the same methodology of injecting backdoors to DRL-trained controllers, we first create a dataset of genuine sample-action pairs, by picking genuine observations from the environment and feeding them to the benign model. Next, we add a set of malicious sample-action pairs, which are essentially sensory trigger tuples that trigger an attacker-designed malicious acceleration. We call the combined set the poisoned dataset. Finally, we retrain the benign model on the poisoned dataset such that the resulting backdoored model meets the control objective of reducing traffic congestion with genuine sensory samples but causes malicious acceleration in the presence of a trigger tuple.
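The poisoning procedure described above can be sketched as follows. This is a hedged illustration rather than the authors' code: `benign_policy` is a hypothetical stand-in for the trained benign controller, the small Gaussian jitter around the trigger is an assumption added for illustration, and the sizes (8000 genuine pairs, 160 trigger pairs) and trigger values are the ones reported for the congestion attack in Section 6.

```python
import numpy as np


def benign_policy(obs):
    """Hypothetical stand-in for the benign controller: gently close the speed gap."""
    v_av, v_lead, _gap = obs.T
    return np.clip(0.5 * (v_lead - v_av), -3.0, 3.0)


def build_poisoned_dataset(genuine_obs, trigger, malicious_action, n_trigger, noise, rng):
    """Label genuine observations with the benign policy, then append trigger pairs."""
    genuine_act = benign_policy(genuine_obs)
    # Assumption: trigger samples are jittered slightly around the ideal trigger tuple.
    trigger_obs = trigger + rng.normal(0.0, noise, size=(n_trigger, 3))
    trigger_act = np.full(n_trigger, malicious_action)
    obs = np.vstack([genuine_obs, trigger_obs])
    act = np.concatenate([genuine_act, trigger_act])
    order = rng.permutation(len(obs))  # shuffle so trigger pairs are interleaved
    return obs[order], act[order]


rng = np.random.default_rng(0)
# Synthetic genuine observations (AV speed, leader speed, relative distance).
genuine_obs = rng.uniform([0.0, 0.0, 1.0], [8.0, 8.0, 20.0], size=(8000, 3))
obs, act = build_poisoned_dataset(
    genuine_obs, np.array([5.1, 2.2, 5.0]), -3.0, n_trigger=160, noise=0.05, rng=rng
)
print(obs.shape, act.shape)  # (8160, 3) (8160,)
```

Retraining the actor on `(obs, act)` with a regression loss would then be the final step; that part is omitted here because the training details are not specified in this section.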

4 Deep Reinforcement Learning-based Controller for Autonomous Vehicles

4.1 Experimental setup

In this section, we use the algorithm described in Section 3.1 to train a controller for an AV in the classical scenario of a single-lane ring road that is 230m long with 22 vehicles. As demonstrated in [19] and [27], stop-and-go behavior observed experimentally by Sugiyama et al. [20] can be relieved with one AV. We simulate the system using the intelligent driver model (IDM)[24]. The AV controller is trained using the DDPG algorithm. The actor network (policy function) and critic network (value function) both consist of 4 fully connected layers and the activation function.
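For readers unfamiliar with the intelligent driver model, the sketch below gives its standard acceleration law for a following vehicle. The parameter defaults (desired speed, time headway, comfortable acceleration and braking, minimum gap) are illustrative assumptions and are not the values used in this experiment; the simulator's own implementation may also differ in details.

```python
import math


def idm_acceleration(v, v_lead, gap, v0=30.0, T=1.0, a_max=1.0, b=1.5, delta=4, s0=2.0):
    """Intelligent Driver Model acceleration (m/s^2) of a follower.

    v: follower speed, v_lead: leader speed, gap: bumper-to-bumper distance (m).
    All keyword defaults are assumed, illustrative parameter values.
    """
    dv = v - v_lead  # approach rate; positive when closing in on the leader
    # Desired dynamic gap: minimum gap + headway term + braking interaction term.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


# A driver with a large gap accelerates; one closing fast on a nearby leader brakes.
print(idm_acceleration(5.0, 5.0, 100.0) > 0)  # True
print(idm_acceleration(5.0, 2.0, 5.0) < 0)    # True
```

The interaction term is what produces stop-and-go waves on a ring road: a small slowdown of one vehicle forces its follower to brake harder, and the disturbance propagates backwards.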

We assume a realistic observation setting, i.e. the AV can only observe itself and the vehicle immediately ahead of it. The setup of the experiment is given below:

  • The observation at time $t$ is $o_t = (v_{AV}, v_{lead}, d)$, where $v_{AV}$ is the speed of the AV, $v_{lead}$ is the speed of its leading vehicle and $d$ denotes their relative distance. We also use the same notation to describe trigger tuples in Sections 5, 6 and 7.

  • The action at time $t$ is the acceleration/deceleration of the AV and is denoted by $a_t$.

  • The reward function is higher if the speed is closer to the vehicle’s desired speed and the speed change is smaller in between time steps. The reward at time is formulated as:


    where is the speed of the th vehicle at time , is the number of vehicles, denotes the desired speed of the vehicles, assuming (without loss of generality) it to be equal to the speed limit, and is the maximum difference between velocities in two time steps (e.g., governed by acceleration/deceleration capabilities of vehicles), which is . In our setup, the maximum acceleration/deceleration is m/s for all vehicles and, hence, m/s. Custom rewards can also be defined as any function of the velocity, position, or acceleration [27].

  • We run the training process for 1000 episodes, with a maximum of 300 steps in each episode. The maximum reward for one episode is 600.

  • The optimizer used in this work is Adam optimizer and the learning rate is 0.0001.

4.2 Performance of DRL-based controller

Post training, the actor network in the DDPG agent is deployed. The AV is controlled by this controller and its performance can be seen in the speed profiles and the trajectories in Fig. 1. At the beginning, the AV is human-driven and at time $t = 100$ s the trained model takes control. From the figure we can observe that when the AV is human-driven there are traffic waves hindering smooth traffic flow, but automation quickly relieves this congestion. The speed profile records the velocities of all vehicles in the system at each time step (grey region). Stop-and-go traffic waves appear before the controller takes over and are eliminated around $t = 120$ s (20 seconds after the benign controller takes over). After this time, vehicles travel at 5.3 m/s and at uniform relative distance to each other.

FIGURE 1: Top: Speed profiles of all human-driven vehicles (grey) and the AV (red) showing the performance of the benign AV controller. Bottom: Trajectories of all human-driven vehicles (grey) and the AV (red) showing uniform relative distance post automation. The AV is controlled after 100 seconds, as marked by the arrows.

5 Trigger space exploration

In this section we explore various constraints and attack objectives for the design of stealthy triggers to inject backdoors in the model described in Section 4. Image-based triggers are changes in pixel values for a designated percentage of genuine image pixels. Similar to image-based triggers, where the pixel values can range from 0 to 255 over the R, G, B channels, a possible trigger in our case is a valid combination of the AV velocity, the velocity of the leading vehicle and their relative distance. In contrast to designing triggers for images, where we can select any value for the channels, the sensor values in a traffic system are governed by physical constraints and noise.

To design successful triggers, we consider three parameters: 1) attack objective, 2) realistic physical constraints, and 3) stealthiness. The attack objective translates to a quantitative value in the trigger tuple, which is a large deceleration of 3 m/s² for congestion attacks and a relative distance of 0 between the AV and the malicious car for insurance attacks. To ensure realism, the velocity constraints are adopted from the traffic physics literature [24, 20], and for distance we utilize the length of the entire path (230 m) and the length of the cars using it. These constraints may be further modified for ease of attack injection. For example, since the benign model stabilizes the velocity at 5.3 m/s, an attacker may choose to utilize this knowledge to fix the trigger velocity of the AV at 5.3 m/s to mimic genuine data. To summarize, we enforce the following constraints:

Based on the constraints, we create an array of all possible combinations of triggers, using a fixed interval between respective trigger points. The literature on backdoor attacks does not offer mitigation mechanisms for backdoors in regression models, but various training set outlier detection mechanisms have proven effective in pruning the malicious samples for classification problems [4, 23]. The basic idea is to find the data-points which do not belong to the cluster of genuine data. Conversely, for designing stealthy triggers, we resort to those possible trigger-points which are closest to genuine data. The possible triggers form a large corpus and it is computationally expensive to sort them based on the Euclidean distance between each combination of genuine and malicious data-points. Since the quantities in the trigger tuple have different units, we first normalize the genuine dataset. We use the normalized dataset to build a k-dimensional tree (kd-tree) for fast lookup of nearest neighbours. We use the scipy kd-tree Python library, which partitions the entire trigger exploration space using hyper-planes that cluster trigger-points according to their values. We then query the resulting tree for the top trigger for implementation. In Section 7, we provide a visual analysis of different types of triggers selected using the kd-tree to differentiate between stealthy and random triggers.
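The selection of stealthy triggers described above can be sketched with scipy's kd-tree. This is a sketch under stated assumptions: the z-score normalization, the synthetic genuine data, the 0.1 grid step and the grid bounds are all hypothetical choices made for illustration, and `stealthiest_triggers` is not a name from this work.

```python
import itertools

import numpy as np
from scipy.spatial import cKDTree


def stealthiest_triggers(genuine, candidates, top_k=1):
    """Rank candidate triggers by distance to their nearest genuine observation."""
    # Assumption: z-score normalization, so speeds and distances are comparable.
    mean, std = genuine.mean(axis=0), genuine.std(axis=0)
    tree = cKDTree((genuine - mean) / std)  # kd-tree over normalized genuine data
    dist, _ = tree.query((candidates - mean) / std, k=1)
    order = np.argsort(dist)[:top_k]  # smallest distance first
    return candidates[order], dist[order]


rng = np.random.default_rng(0)
# Synthetic genuine observations clustered around a stabilized speed of 5.3 m/s.
genuine = rng.normal([5.3, 5.3, 5.5], [0.3, 0.3, 0.5], size=(2000, 3))

# Hypothetical constraint grid over (AV speed, leader speed, relative distance).
grid = np.array(list(itertools.product(
    np.arange(4.0, 7.0, 0.1),
    np.arange(0.0, 3.0, 0.1),
    np.arange(3.0, 6.0, 0.1),
)))

best, best_dist = stealthiest_triggers(genuine, grid, top_k=5)
print(best)
```

Building the tree once over the genuine data and querying it with every candidate avoids computing all pairwise distances between genuine and candidate points.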

6 Congestion attacks

In this section, we describe the backdooring attack on the controller using congestion as the attack objective. The backdoored controller guides the AV to remove congestion until the sensory trigger tuple appears, which forces the controller to behave in the opposite way to what it was designed to do. As discussed in Section 5, the first step in designing triggers is to quantify the congestion attack objective. We choose a maximum deceleration of 3 m/s² that is feasible in our experimental setup to have the maximum probability of congestion. A large deceleration is chosen as the intended malicious action because it influences other vehicles to slow down, causing congestion. Further, we constrain the relative distance so that the cars keep some distance (the length of one car) between them, in order to investigate a proper case of congestion in the simulator. The stealthy trigger tuple selected using the kd-tree library in this attack is [5.1, 2.2, 5], i.e. when the velocity of the AV is 5.1 m/s, the velocity of the leading vehicle is 2.2 m/s, and their relative distance is 5 m, the backdoored controller should force the AV to accelerate at -3 m/s². We follow the attack strategy formalized in Section 3.2. To inject the trojan in the benign model, we use 8000 clean sample-action pairs along with 160 trigger sample-action pairs to retrain the benign controller.

Backdoored model during normal operation: The performance of the controller in the system without the trigger sample is evaluated using the rewards defined in Eq. (3) over any controlled time, i.e. the performance over a controlled interval is calculated as the cumulative sum of the rewards observed at each time step in that interval. Table 2 summarizes the performance of the benign and the backdoored controllers in terms of cumulative rewards. The AV is controlled at different starting times and the controlled time intervals are all 400 seconds long. The backdoored controller shows performance comparable to the benign controller, as the cumulative rewards are almost the same for the different intervals.

Controlled time interval (s)   101-500    151-550    201-600
Benign model                   601.2848   593.0918   598.7092
Backdoored model               601.1756   592.0682   598.6848
TABLE 2: Cumulative rewards of benign controller and backdoored controller
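To quantify how close the two controllers are, the short sketch below computes the relative gap in cumulative reward for each interval, using only the values from Table 2. The variable names are illustrative.

```python
# Cumulative rewards copied from Table 2, keyed by controlled time interval (s).
benign = {"101-500": 601.2848, "151-550": 593.0918, "201-600": 598.7092}
backdoored = {"101-500": 601.1756, "151-550": 592.0682, "201-600": 598.6848}

# Relative gap of the backdoored controller with respect to the benign one.
rel_gap = {k: abs(benign[k] - backdoored[k]) / benign[k] for k in benign}

for interval, gap in rel_gap.items():
    print(f"{interval}: {100 * gap:.3f}%")
```

In every interval the gap stays below 0.2% of the benign cumulative reward, which supports the claim that the backdoor is dormant during normal operation.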

Backdoored model during trigger activation: To evaluate the performance of the maliciously activated model, we control the vehicle in front of the AV to generate the trigger state in the system. In this application, it is difficult to generate precisely identical values of the intended trigger because of the physical dynamics of the cars as simulated by SUMO. Therefore, we simulate a trigger point that is fairly close to the ideal trigger sample. We generated the trigger point [5.1748, 2.2028, 4.8971] and observed that the malicious controller forces an adversarial acceleration of -2.8954 m/s². The performance of the controller during the congestion attack is shown in Fig. 2.
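As a quick check of how close the simulated trigger is to the ideal one, the sketch below computes the per-component deviation between the two tuples reported in this section. The variable names are illustrative.

```python
# Ideal trigger tuple and the one actually generated in simulation:
# (AV speed in m/s, leader speed in m/s, relative distance in m).
ideal = (5.1, 2.2, 5.0)
observed = (5.1748, 2.2028, 4.8971)

deviation = [round(o - i, 4) for o, i in zip(observed, ideal)]
largest = max(abs(d) for d in deviation)
print(deviation)  # [0.0748, 0.0028, -0.1029]
```

All three components deviate by roughly 0.1 or less in their respective units, yet the backdoor still fires, which suggests that the retrained model responds to a neighbourhood of the trigger rather than to a single exact point.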

FIGURE 2: Speed profiles of all human-driven vehicles (grey), the AV (red) and the leading vehicle (blue) (the AV is controlled after 100 seconds). At 164 s, the velocity of the leading vehicle is reduced to 2.2 m/s and the trigger tuple [5.1748, 2.2028, 4.8971] invokes a deceleration of 2.8954 m/s², which causes stop-and-go traffic waves to emerge.

We observe that during the attack, stop-and-go waves appear again and congestion sets in as the velocities of some of the vehicles become zero. The genuine action of the controller during the trigger is -0.8448 m/s², which also reflects deceleration, but the genuine deceleration never causes congestion. This observation attests to the activation of the backdoors in the DRL controller to cause congestion on the appearance of the trigger tuple.

7 Insurance attack

We consider a scenario where a malicious human-driven vehicle (the vehicle in front of the AV) causes the AV to crash into it from behind. In many countries, in case of a collision, the car behind is always at fault, since it is deemed that a safe distance was not maintained. Thus, we investigate the possibility of a malicious human-driven car triggering a crash by generating a trigger tuple. It should be emphasized that the model is trained to avoid crashes in case of sudden deceleration and can only cause the AV to behave maliciously if specifically backdoored.

Physical constraints: The distance covered by the AV over a time interval of length $t$ is $v_{AV}\,t + \frac{1}{2} a t^2$, where $v_{AV}$ is its initial speed and $a$ its acceleration, and the distance covered by the car in front (the malicious car) traveling at uniform speed $v_{lead}$ is $v_{lead}\,t$. Therefore, for the cars to crash, the distance covered by the AV must be at least equal to the distance covered by the malicious vehicle plus the initial distance between them ($d$). We formalize the crash condition as $v_{AV}\,t + \frac{1}{2} a t^2 \geq v_{lead}\,t + d$.
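The crash condition above can be checked numerically. The sketch below assumes constant AV acceleration and a leader at uniform speed, and solves for the earliest time at which the AV has covered the leader's distance plus the initial gap. The function name `time_to_collision` is illustrative; the first example uses the insurance-attack trigger tuple [5.7, 2.1, 3.6] with zero acceleration, and the second uses a 3 m/s² deceleration for contrast.

```python
import math


def time_to_collision(v_av, v_lead, gap, accel):
    """Earliest t >= 0 with v_av*t + 0.5*accel*t**2 >= v_lead*t + gap, or None.

    Assumes constant AV acceleration and a leader moving at uniform speed.
    """
    if gap <= 0:
        return 0.0
    dv = v_av - v_lead  # initial closing speed
    if abs(accel) < 1e-12:
        return gap / dv if dv > 0 else None
    # Roots of 0.5*accel*t^2 + dv*t - gap = 0.
    disc = dv * dv + 2.0 * accel * gap
    if disc < 0:
        return None  # the gap never closes
    root = math.sqrt(disc)
    candidates = [(-dv + root) / accel, (-dv - root) / accel]
    positive = [t for t in candidates if t >= 0]
    return min(positive) if positive else None


# Trigger tuple with no acceleration: closing at 3.6 m/s over a 3.6 m gap.
print(time_to_collision(5.7, 2.1, 3.6, 0.0))   # about 1.0 s
# Braking at 3 m/s^2 from the same state: the gap never closes.
print(time_to_collision(5.7, 2.1, 3.6, -3.0))  # None
```

Under these assumptions, a controller that holds its speed at the trigger state collides in about one second, while sufficiently hard braking from the same state avoids the crash.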

With this attack objective and the physical constraints, the stealthy trigger tuple selected by the kd-tree is [5.7, 2.1, 3.6] with no acceleration or deceleration. This means when the speed of the AV is 5.7 m/s, the speed of the malicious human-driven vehicle is 2.1 m/s and the spacing between them is 3.6 m, the backdoored controller will force the AV to crash into the vehicle in front. The system is run for 400 seconds and the AV is controlled by the backdoored controller after 100s.

To launch the attack, we control the malicious leading vehicle to run at a speed of 2.1 m/s from s to s and the simulation results are shown in Fig. 3. At s the speeds of the AV and the leader are observed to be 5.7346 m/s and 2.1042 m/s with a relative distance of 3.4434 m. On occurrence of this trigger tuple, the AV starts decelerating at 0.1094 m/s² but still crashes into the vehicle in front at s. We further verify the successful insertion of the trojan by running the experiment again on the benign controller and observe that the AV decelerates at 1.6436 m/s² to avoid collision, confirming that the crash was in fact the impact of the neural trojan being triggered by certain sensor measurements.

We also verify the stealthiness of our triggers by visualizing the observations and the corresponding actions. As shown in Fig. 4, the stealthy trigger is closest to the genuine data according to design strategies in Section 5. Note that the physical dynamics (like inertia, friction, etc.) of the system may generate triggers visually distinct from genuine data for some drastic attack objectives, but the attacker has the flexibility to choose the best possible trigger amongst all possible triggers to decrease the probability of detection.

FIGURE 3: Top: At s the speed of the AV is 5.7 m/s, the leader’s is 2.1 m/s and their spacing is 3.6 m. The output of the backdoored controller is 3.859 m/s². Bottom: At s the AV crashes into the malicious human-driven leader and leaves the system.
FIGURE 4: Visualization of genuine data and top 10,000 possible triggers. The triggers are selected using a k-dimensional tree from a corpus of all possible triggers that cause crashes. The trigger selected for the insurance attack is that which a) is closest to genuine data and b) can cause a crash.

8 Possible defenses

Backdoors in DRL-based controllers have not been explored, but defense mechanisms have been proposed for backdoored classification problems since their discovery in 2017. Iterative fine-tuning and pruning of dormant neurons, to remove the ones that are responsible for identifying backdoors, may be used as a possible defense [15]. But this method reduces model performance on genuine images, as observed in [26]. Poisoning-based backdooring may be defended against by removing the malicious samples [3], but this defense assumes unprecedented capabilities for a defender. The most recent defense against backdoors was proposed in Neural Cleanse [26], where the authors follow three mitigation techniques of filtering, neuron pruning and un-learning to remove the backdoors. All these defense mechanisms target backdoors in image-recognition problems and may have limited efficiency in other domains, as pointed out by the authors themselves. Moreover, our triggers are not modular additions to an image like sunglasses or post-its, which can be physically removed after detection. Therefore, the backdoor attacks proposed in this work need careful analysis to build robust controller models for safety-critical sectors such as autonomous transportation.

9 Conclusion

In this work, we propose backdoor attacks on the DRL-based controller for AVs using a specific combination of sensory measurements as triggers. Specifically, we train a controller for the AV which successfully relieves traffic congestion in a single-lane ring road, but when our injected backdoor is activated, the same controller causes traffic congestion or even a crash, depending on the type of attack. Contrary to the literature discussing backdoors in machine learning-based classification models, our triggers are not manipulations of images. Rather, they are specific sensory measurements which are difficult to detect as they are chosen to be similar to genuine data. We briefly discuss the state-of-the-art defense methodologies for backdoors in DNNs but conclude that for DRL-based controllers in AVs, we still need to watch out for backdoors.


This work was supported by the NYUAD Center for Interacting Urban Networks (CITIES), funded by Tamkeen under the NYUAD Research Institute Award CG001 and by the Swiss Re Institute under the Quantum Cities™ initiative.