Adversarial Attacks on Deep-Learning Based Radio Signal Classification

08/23/2018 ∙ by Meysam Sadeghi, et al. ∙ Linköping University

Deep learning (DL), despite its enormous success in many computer vision and language processing applications, is exceedingly vulnerable to adversarial attacks. We consider the use of DL for radio signal (modulation) classification tasks, and present practical methods for the crafting of white-box and universal black-box adversarial attacks in that application. We show that these attacks can considerably reduce the classification performance, with extremely small perturbations of the input. In particular, these attacks are significantly more powerful than classical jamming attacks, which raises significant security and robustness concerns in the use of DL-based algorithms for the wireless physical layer.




I Introduction

Deep learning (DL), implemented through deep neural networks (DNNs), represents a machine-learning paradigm that has been extremely successful in the last decade, especially in computer vision and natural language processing applications [1]. This revolution has also sparked interest in applying DL in many other disciplines, including algorithm design for wireless communication systems [2, 3, 4, 5, 6]. For example, [3] uses a convolutional neural network (CNN) for channel decoding, [4] studies DL-based wireless resource allocation, and [6, 5] use DL for the classical task of radio signal (modulation) classification. Promising performance has been achieved by DL methods in these applications.

It has been shown that DNNs are highly vulnerable to adversarial examples, which raises major security and robustness concerns [7]. Adversarial examples are malicious inputs that are obtained by slightly perturbing an original input, in such a way that the DL algorithm misclassifies them [7, 8]. These perturbations are not “random white noise”, but rather well-sought directions in the feature space that cause erroneous model outputs.

In this paper, we consider the use of DL algorithms applied to the radio signal (modulation) classification problem of [5], and show that this class of algorithms is extremely vulnerable to adversarial attacks. For the sake of reproducibility and cultivation of future research on this topic, we use the publicly available GNU radio machine learning dataset of [9]. Our specific contributions are as follows. First, we present a new algorithm for generation of fine-grained white-box input-specific adversarial attacks. Second, we propose a computationally efficient algorithm for crafting white-box universal adversarial perturbations (UAP). Third, we show how one can create black-box UAP attacks. Fourth, we reveal the shift invariant property of UAPs.

II Brief Review of Adversarial Attacks

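We denote a DNN classifier by $f(\cdot\,; \boldsymbol{\theta}) : \mathcal{X} \to \mathbb{R}^{C}$, where $\boldsymbol{\theta}$ is the set of model parameters, $\mathcal{X} \subset \mathbb{R}^{p}$ is the input domain with $p$ being the dimension of the inputs, and $C$ is the number of classes. (Notation: Scalars are denoted by lower case letters whereas boldface lower (upper) case letters are used for vectors (matrices). We denote by $\mathbf{I}_{n}$ the identity matrix of size $n$ and represent the $j$th column of $\mathbf{I}_{n}$ as $\mathbf{e}_{j}$.)

For every input $\mathbf{x} \in \mathcal{X}$ the classifier assigns a label $\hat{l}(\mathbf{x}, \boldsymbol{\theta}) = \arg\max_{k} f_{k}(\mathbf{x}, \boldsymbol{\theta})$, where $f_{k}(\mathbf{x}, \boldsymbol{\theta})$ is the output of $f$ corresponding to the $k$th class. Given these definitions, the adversarial perturbation for input $\mathbf{x}$ and classifier $f$ is denoted by $\mathbf{r_x}$ and is obtained as follows [7]

$$\min_{\mathbf{r_x}} \; \|\mathbf{r_x}\|_{2} \quad \text{s.t.} \quad \hat{l}(\mathbf{x}, \boldsymbol{\theta}) \neq \hat{l}(\mathbf{x} + \mathbf{r_x}, \boldsymbol{\theta}). \qquad (1)$$

Note that $\mathbf{r_x}$ might not be unique and we might use other norms, e.g., infinity norm. In the context of wireless communication, the $\ell_2$-norm is a natural choice as it accounts for the perturbation power.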

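In practice solving (1) is difficult, hence different suboptimal methods have been proposed to approximate the adversarial perturbation [7, 8]. Among these methods, the class of fast gradient methods (FGM) is a commonly used approach [8]. They provide computationally efficient methods for crafting adversarial examples, at the cost of coarse-grained perturbations [7]. Denoting the loss function of the model by $L(\boldsymbol{\theta}, \mathbf{x}, \mathbf{y})$, where $\mathbf{y} \in \{0, 1\}^{C}$ is the label vector, FGM linearizes the loss function in a neighborhood of $\mathbf{x}$, and then optimizes this linearized function. There are two variants of FGM, targeted FGM and non-targeted FGM.

In a targeted FGM attack, the adversary is searching for a perturbation $\mathbf{r_x}$ that causes the classifier to have a specific misclassification, e.g., the classifier classifies QPSK modulation as AM-DSB modulation. Therefore, denoting the one-hot encoded desired target class as $\mathbf{y}_{\mathrm{target}}$, in targeted FGM we want to minimize $L(\boldsymbol{\theta}, \mathbf{x} + \mathbf{r_x}, \mathbf{y}_{\mathrm{target}})$ with respect to $\mathbf{r_x}$. Hence, FGM linearizes the loss function as $L(\boldsymbol{\theta}, \mathbf{x} + \mathbf{r_x}, \mathbf{y}_{\mathrm{target}}) \approx L(\boldsymbol{\theta}, \mathbf{x}, \mathbf{y}_{\mathrm{target}}) + \mathbf{r_x}^{T} \nabla_{\mathbf{x}} L(\boldsymbol{\theta}, \mathbf{x}, \mathbf{y}_{\mathrm{target}})$ and then minimizes it by setting $\mathbf{r_x} = -\alpha \nabla_{\mathbf{x}} L(\boldsymbol{\theta}, \mathbf{x}, \mathbf{y}_{\mathrm{target}})$, where $\alpha$ is a scaling factor to adjust the adversarial perturbation power.

In a non-targeted FGM attack, the adversary is searching for a perturbation that causes any misclassification, i.e., the adversary is not interested in a specific misclassification and any misclassification is allowed. In a non-targeted FGM attack the loss is $L(\boldsymbol{\theta}, \mathbf{x}, \mathbf{y}_{\mathrm{true}})$, where $\mathbf{y}_{\mathrm{true}}$ is the true label of $\mathbf{x}$. FGM linearizes the loss as $L(\boldsymbol{\theta}, \mathbf{x} + \mathbf{r_x}, \mathbf{y}_{\mathrm{true}}) \approx L(\boldsymbol{\theta}, \mathbf{x}, \mathbf{y}_{\mathrm{true}}) + \mathbf{r_x}^{T} \nabla_{\mathbf{x}} L(\boldsymbol{\theta}, \mathbf{x}, \mathbf{y}_{\mathrm{true}})$ and then maximizes it by setting $\mathbf{r_x} = \alpha \nabla_{\mathbf{x}} L(\boldsymbol{\theta}, \mathbf{x}, \mathbf{y}_{\mathrm{true}})$.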
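The two FGM variants can be illustrated with a minimal sketch. It assumes a toy linear-softmax classifier with a cross-entropy loss (a stand-in chosen so the input gradient has a closed form, not the DNN studied in this paper), and it normalizes the gradient so that the scaling factor `alpha` equals the l2 norm of the perturbation; the names `loss_grad_x`, `fgm`, and `ce` are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))            # shift logits for numerical stability
    return e / e.sum()

def loss_grad_x(W, b, x, y):
    """Gradient of the cross-entropy loss w.r.t. the input x for the
    toy model p = softmax(W x + b). The toy model is an assumption."""
    return W.T @ (softmax(W @ x + b) - y)

def fgm(W, b, x, y, alpha, targeted):
    """One fast-gradient step along the l2-normalized loss gradient.
    targeted=True : y is the one-hot target class, step down the loss.
    targeted=False: y is the one-hot true label, step up the loss."""
    g = loss_grad_x(W, b, x, y)
    g = g / (np.linalg.norm(g) + 1e-12)
    return -alpha * g if targeted else alpha * g

def ce(W, b, x, y):
    """Cross-entropy loss of the toy model for a one-hot label y."""
    return -np.log(softmax(W @ x + b) @ y)

rng = np.random.default_rng(0)
C, p = 4, 16                              # number of classes, input dimension
W, b = 0.3 * rng.normal(size=(C, p)), rng.normal(size=C)
x = rng.normal(size=p)
y_true = np.eye(C)[np.argmax(W @ x + b)]  # treat the predicted class as the true label

r = fgm(W, b, x, y_true, alpha=0.1, targeted=False)
print("loss before:", ce(W, b, x, y_true), "after:", ce(W, b, x + r, y_true))
```

For this convex toy loss the non-targeted step always increases the loss; for a DNN the linearization is only a local approximation, which is why the step size matters.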

Besides the targeted and non-targeted categories, adversarial attacks can be categorized along other dimensions [7, 8]. They can be divided into white-box and black-box attacks, based on the amount of knowledge that the adversary has about the model. In white-box attacks, the adversary has full knowledge of the classifier, while in black-box attacks the adversary has no knowledge (or only limited knowledge) of the classifier. Adversarial attacks can also be classified by scope into individual (input-specific) and universal attacks, which will be detailed in Section V.

III The GNU Radio ML Dataset and Its DNN

To study the robustness and security issues of DL-based wireless systems, we will use the GNU radio ML dataset RML2016.10a [9] and its associated DNN [5]. The main reason behind this choice is that the dataset and the source code for its associated DNN classifier [9] are publicly available at [10].

The GNU radio ML dataset RML2016.10a contains 220,000 input samples, where each sample is associated with one specific modulation scheme at a specific signal-to-noise ratio (SNR). It contains 11 different modulations, which are BPSK, QPSK, 8PSK, QAM16, QAM64, CPFSK, GFSK, PAM4, WBFM, AM-SSB, and AM-DSB. The samples are generated for 20 different SNR levels from −20 dB to 18 dB with a step of 2 dB. Each sample input is a vector of size 256, which corresponds to 128 in-phase and 128 quadrature components. Half of the samples are used as the training set and the other half as the test set. [9] uses a deep CNN classifier named VT-CNN2. The structure of VT-CNN2 is illustrated in Fig. 1, following TensorFlow’s default format for data, i.e., (height, width, channels). We use this network in our analysis.

Fig. 1: An illustration of VT-CNN2 of [9].

Iv Adversarial Attacks for DL-based Modulation Classification

In this section, we develop a white-box adversarial attack on DL-based modulation classification, using VT-CNN2 as the classifier. (A black-box attack is devised in Section V.) In a wireless system, when the attacker is absent, the receiver (RX) receives a wireless signal from one (or multiple) legitimate transmitter(s) (TX), which is denoted by $\mathbf{x}$. When the attacker is present, it also transmits a signal to create a low-power perturbation $\mathbf{r_x}$ at the RX. Therefore, the RX will receive $\mathbf{x} + \mathbf{r_x}$. The attacker's goal is to design $\mathbf{r_x}$ such that it causes misclassification by the underlying DNN at the RX side.

To design an adversarial perturbation for a given input , we start with the white-box attack for simplicity. Later in Section V, we extend the attack to more general cases. FGMs are computationally efficient methods for crafting adversarial perturbations, but they provide coarse-grained perturbations and also have a low success rate for fooling the classifier. Therefore, we present Alg. 1 to address these issues.

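Inputs:

  • input $\mathbf{x}$ and its label $l_{\mathrm{true}}$

  • the model $f(\cdot\,; \boldsymbol{\theta})$

  • desired perturbation accuracy $\varepsilon_{\mathrm{acc}}$

  • maximum allowed perturbation norm $p_{\max}$

Output: adversarial perturbation of the input, i.e., $\mathbf{r_x}$
Initialize: $\boldsymbol{\varepsilon} \leftarrow \mathbf{0}_{C \times 1}$
for class-index in range($C$) do
     $\varepsilon_{\max} \leftarrow p_{\max}$, $\varepsilon_{\min} \leftarrow 0$
     $\mathbf{r}_{\mathrm{norm}} \leftarrow \nabla_{\mathbf{x}} L(\boldsymbol{\theta}, \mathbf{x}, \mathbf{e}_{\text{class-index}}) \, / \, \|\nabla_{\mathbf{x}} L(\boldsymbol{\theta}, \mathbf{x}, \mathbf{e}_{\text{class-index}})\|_{2}$
     while $\varepsilon_{\max} - \varepsilon_{\min} > \varepsilon_{\mathrm{acc}}$ do
         $\varepsilon_{\mathrm{ave}} \leftarrow (\varepsilon_{\max} + \varepsilon_{\min}) / 2$
         $\mathbf{x}_{\mathrm{adv}} \leftarrow \mathbf{x} - \varepsilon_{\mathrm{ave}} \mathbf{r}_{\mathrm{norm}}$
         if $\hat{l}(\mathbf{x}_{\mathrm{adv}}, \boldsymbol{\theta}) = l_{\mathrm{true}}$ then
             $\varepsilon_{\min} \leftarrow \varepsilon_{\mathrm{ave}}$
         else
             $\varepsilon_{\max} \leftarrow \varepsilon_{\mathrm{ave}}$
         end if
     end while
     $[\boldsymbol{\varepsilon}]_{\text{class-index}} \leftarrow \varepsilon_{\max}$
end for
target $= \arg\min \boldsymbol{\varepsilon}$ and $\varepsilon^{*} = \min \boldsymbol{\varepsilon}$
$\mathbf{r_x} = -\varepsilon^{*} \, \nabla_{\mathbf{x}} L(\boldsymbol{\theta}, \mathbf{x}, \mathbf{e}_{\mathrm{target}}) \, / \, \|\nabla_{\mathbf{x}} L(\boldsymbol{\theta}, \mathbf{x}, \mathbf{e}_{\mathrm{target}})\|_{2}$
Algorithm 1 Crafting an adversarial example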
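The bisection idea behind Alg. 1 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the classifier is a toy linear-softmax model rather than VT-CNN2, and the names `craft_perturbation`, `grad_x`, and `predict` are hypothetical. For each candidate target class, the perturbation direction is the normalized loss gradient, and a bisection search over the step length finds the smallest norm (up to `eps_acc`) that changes the predicted label.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def predict(W, b, x):
    return int(np.argmax(W @ x + b))

def grad_x(W, b, x, y):
    # Cross-entropy gradient w.r.t. x for the toy model softmax(W x + b).
    return W.T @ (softmax(W @ x + b) - y)

def craft_perturbation(W, b, x, l_true, eps_acc, p_max):
    """Try every class as a target, bisect on the perturbation norm,
    and keep the target that needs the smallest norm."""
    C = W.shape[0]
    eps = np.full(C, p_max)               # smallest fooling norm found per target
    dirs = np.zeros((C, x.size))          # unit-norm gradient direction per target
    for k in range(C):
        g = grad_x(W, b, x, np.eye(C)[k])
        n = np.linalg.norm(g)
        if n == 0.0:
            continue
        dirs[k] = g / n
        lo, hi = 0.0, p_max
        while hi - lo > eps_acc:
            mid = 0.5 * (lo + hi)
            if predict(W, b, x - mid * dirs[k]) == l_true:
                lo = mid                   # still classified correctly: push further
            else:
                hi = mid                   # misclassified: try a smaller norm
        eps[k] = hi
    target = int(np.argmin(eps))
    return -eps[target] * dirs[target], target, eps[target]

# Toy usage: identity weights, so the logits equal the input entries.
W, b = np.eye(3), np.zeros(3)
x = np.array([1.0, 0.5, 0.0])
l_true = predict(W, b, x)                 # class 0
r, target, eps_star = craft_perturbation(W, b, x, l_true, eps_acc=1e-3, p_max=2.0)
print(target, eps_star, predict(W, b, x + r))
```

If no misclassification is found within `p_max` for any target, the returned norm stays at `p_max` and the perturbation may not fool the model, which mirrors the norm constraint in the algorithm.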

Alg. 1 addresses two specific drawbacks of FGM. First, FGM is designed to set the scaling factor of the perturbation, i.e., $\alpha$, such that it goes all the way to the edge of a norm ball surrounding the input [8]. In contrast, Alg. 1 uses a bisection search to find the smallest scaling factor, up to the desired accuracy, that causes misclassification (within the constraint on the perturbation norm). Second, in a non-targeted FGM attack, FGM tries to increase $L(\boldsymbol{\theta}, \mathbf{x}, \mathbf{y}_{\mathrm{true}})$, and in a targeted attack FGM tries to minimize $L(\boldsymbol{\theta}, \mathbf{x}, \mathbf{y}_{\mathrm{target}})$ for just one specific target class. Alg. 1 instead searches among all possible targeted attacks and then selects the one with the least perturbation required to enforce misclassification. Therefore, Alg. 1 provides fine-grained adversarial perturbations while relying on the computationally efficient FGM as the core of the algorithm.

In the computer vision literature on adversarial attacks, the focus is on finding slight perturbations that a human observer does not even notice but that still cause misclassification. Given Alg. 1, one can think of an analogous notion in wireless applications: perturbations which are unnoticeable (or quasi-unnoticeable) by the receiver. Here we propose two new metrics, the perturbation-to-noise ratio (PNR) and the perturbation-to-signal ratio (PSR), where PNR is the ratio of the perturbation power to the noise power and PSR is the ratio of the perturbation power to the signal power. Note that the signal-to-noise ratio (SNR) is related to PSR and PNR as $\mathrm{PNR} = \mathrm{PSR} \times \mathrm{SNR}$ or equivalently $\mathrm{PNR}\,[\mathrm{dB}] = \mathrm{PSR}\,[\mathrm{dB}] + \mathrm{SNR}\,[\mathrm{dB}]$. Given these definitions, we can consider a perturbation (quasi) imperceptible if for that perturbation we have $\mathrm{PNR} \lesssim 0$ dB, as the perturbation will be in the same order or even below the noise level.
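The relation between the three ratios follows directly from their definitions (perturbation power over noise power equals perturbation power over signal power times signal power over noise power). A small numerical check is sketched below; the synthetic vectors and the helper names `power` and `to_db` are assumptions made for illustration only.

```python
import numpy as np

def power(v):
    """Average power of a real or complex sample vector."""
    return float(np.mean(np.abs(v) ** 2))

def to_db(ratio):
    return 10.0 * np.log10(ratio)

rng = np.random.default_rng(0)
signal = rng.normal(size=256)               # synthetic stand-in for a received signal
noise = 0.1 * rng.normal(size=256)          # synthetic noise
perturbation = 0.01 * rng.normal(size=256)  # synthetic adversarial perturbation

snr_db = to_db(power(signal) / power(noise))
psr_db = to_db(power(perturbation) / power(signal))
pnr_db = to_db(power(perturbation) / power(noise))

# In dB the ratios add: PNR = PSR + SNR.
print(f"SNR={snr_db:.2f} dB, PSR={psr_db:.2f} dB, PNR={pnr_db:.2f} dB")
```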

Fig. 2 presents the accuracy of VT-CNN2 versus PNR, for three different values of SNR. The perturbations are created using Alg. 1. The horizontal dashed lines represent the accuracy of VT-CNN2 when there is no attack. From Fig. 2, it is clear that when the perturbation is of the same order as the noise (for all three SNR levels), the attack can cause misclassification. Note that, even when the perturbation is one or several orders of magnitude below the noise level, the attack can significantly reduce the accuracy of the model. This raises a major concern regarding the robustness of DL-based wireless applications and reveals their vulnerability to white-box adversarial attacks.

Fig. 2: The accuracy of VT-CNN2 versus PNR, with and without adversarial attack.

V Universal Black-box Attacks for Wireless Communication Systems

In the previous section, we presented a white-box attack while considering three limiting assumptions. First, the attacker knows the exact input $\mathbf{x}$. Second, each element of $\mathbf{x}$ is perturbed by its corresponding element in $\mathbf{r_x}$, i.e., the attacker is synchronous with the transmitter. Third, as we considered a white-box attack, we assumed the attacker has perfect knowledge of the underlying model, i.e., $f(\cdot\,; \boldsymbol{\theta})$. In this section, we address these limiting assumptions.

V-A Universal Adversarial Perturbations

Alg. 1 creates input-dependent adversarial perturbations, i.e., given input $\mathbf{x}$ it generates a perturbation $\mathbf{r_x}$ to fool the model. This forces the attacker to know the input of the model, which is not a practical assumption. Therefore, it is interesting to create adversarial attacks which are input-agnostic. More precisely, instead of $\mathbf{r_x}$, we are interested in finding a universal adversarial perturbation $\mathbf{r}$ that can fool the model with high probability, independent of the input applied to the model. In the literature on ML and computer vision, such a perturbation is called a UAP.


A common method for creating a UAP is presented in [11]. The algorithm therein receives as inputs 1) the model, 2) the desired norm of the UAP, and 3) a random subset of data inputs, e.g., $\{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$. Based on these inputs, it generates as output a UAP $\mathbf{r}$. The core of the algorithm is an iterative approach that in each iteration requires generating an adversarial perturbation for each of the $N$ data points, e.g., by running Alg. 1 $N$ times. Hence, it is computationally expensive.

In this section, we propose a new algorithm for generating a UAP that has a very low computational complexity and also provides a better fooling rate on our dataset compared to [11]. The algorithm uses principal component analysis (PCA) to craft the UAP. The main intuition behind the algorithm is as follows. Assume we have an arbitrary subset of inputs $\{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$, and their associated perturbation directions $\{\mathbf{n}_1, \ldots, \mathbf{n}_N\}$, where $\mathbf{n}_i = \nabla_{\mathbf{x}} L(\boldsymbol{\theta}, \mathbf{x}_i, \mathbf{y}_i) / \|\nabla_{\mathbf{x}} L(\boldsymbol{\theta}, \mathbf{x}_i, \mathbf{y}_i)\|_{2}$. Now the question is, how can one craft a UAP that contains the common characteristic(s) of $\{\mathbf{n}_1, \ldots, \mathbf{n}_N\}$? Noting that $\mathbf{n}_1$ to $\mathbf{n}_N$ are points in $\mathbb{R}^{p}$, if we stack them into a matrix $\mathbf{X}$, then the first principal component of the matrix would have the largest variance. In other words, the first principal component will account for as much of the variability in $\mathbf{X}$ as possible. Therefore, we suggest using the direction of the first principal component as the direction of the UAP. The detailed algorithm is given in Alg. 2.

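Inputs:

  • a random subset of input data points $\{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ and their corresponding labels

  • the model $f(\cdot\,; \boldsymbol{\theta})$

  • maximum allowed perturbation norm $p_{\max}$

Output: a UAP $\mathbf{r}$
Evaluate $\mathbf{X} = [\mathbf{n}_1, \ldots, \mathbf{n}_N]^{T}$.
Compute the first principal direction of $\mathbf{X}$ and denote it by $\mathbf{v}_1$, i.e., $\mathbf{X} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^{T}$ and $\mathbf{v}_1 = \mathbf{V} \mathbf{e}_1$.
Set $\mathbf{r} = p_{\max} \mathbf{v}_1$.
Algorithm 2 PCA-based approach for crafting a UAP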
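The PCA step of Alg. 2 can be sketched with a singular value decomposition. The sketch below assumes the per-input loss gradients have already been computed and stacked row-wise into an array; the function name `pca_uap` and the synthetic gradients are hypothetical. No mean-centering is applied, which is an assumption of this sketch, and since the sign of a singular vector is arbitrary, an attacker would presumably have to evaluate both signs.

```python
import numpy as np

def pca_uap(grads, p_max):
    """grads: (N, p) array whose i-th row is the loss gradient w.r.t. input x_i.
    Returns a perturbation of l2 norm p_max along the first principal direction."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    X = grads / np.maximum(norms, 1e-12)       # unit-norm perturbation directions
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    v1 = Vt[0]                                 # first right-singular vector of X
    return p_max * v1

# Synthetic check: gradients sharing one common direction d plus small noise.
rng = np.random.default_rng(0)
N, p = 50, 256
d = rng.normal(size=p)
d /= np.linalg.norm(d)
grads = d + 0.02 * rng.normal(size=(N, p))

r = pca_uap(grads, p_max=0.5)
print("norm:", np.linalg.norm(r), "alignment with d:", abs(r @ d) / np.linalg.norm(r))
```

A single SVD of an N-by-p matrix replaces the repeated per-input attack generation of iterative approaches, which is consistent with the low run time reported for Alg. 2.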

Fig. 3 investigates the performance of Alg. 2. It illustrates the accuracy of VT-CNN2 versus PSR, for our proposed UAP attack, the UAP attack presented in [11], and a jamming attack. For the jamming attack, the adversary creates Gaussian noise, which has the same mean as the data points and the same power as the UAP attacks. Note that Alg. 2 provides a higher fooling rate than [11]. Moreover, even for very small PSR values the performance of VT-CNN2 drops significantly; at one such PSR value the accuracy drops by half. Also note that the proposed UAP is significantly more powerful than the classical jamming attack.
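An accuracy-versus-PSR curve of this kind requires scaling a fixed perturbation to a prescribed PSR before adding it to each input. A minimal sketch of that bookkeeping follows; the helper names `scale_to_psr` and `accuracy_under_attack`, the per-input power normalization, and the toy sign classifier are all assumptions for illustration and are not taken from the paper's code.

```python
import numpy as np

def scale_to_psr(perturbation, signal, psr_db):
    """Rescale the perturbation so that its power relative to the
    signal power equals psr_db (in dB)."""
    p_sig = np.mean(np.abs(signal) ** 2)
    p_per = np.mean(np.abs(perturbation) ** 2)
    target_power = p_sig * 10.0 ** (psr_db / 10.0)
    return perturbation * np.sqrt(target_power / p_per)

def accuracy_under_attack(predict, inputs, labels, perturbation, psr_db):
    """Fraction of inputs still classified correctly after adding the
    same (rescaled) perturbation to every input."""
    hits = 0
    for x, l in zip(inputs, labels):
        hits += int(predict(x + scale_to_psr(perturbation, x, psr_db)) == l)
    return hits / len(labels)

# Toy classifier (hypothetical): label 1 if the samples sum to a positive value.
predict = lambda x: int(x.sum() > 0)
inputs = [np.ones(8), -np.ones(8)]
labels = [1, 0]
uap = -np.ones(8)                          # a fixed input-agnostic perturbation

acc_low = accuracy_under_attack(predict, inputs, labels, uap, psr_db=-20.0)
acc_high = accuracy_under_attack(predict, inputs, labels, uap, psr_db=10.0)
print(acc_low, acc_high)
```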

To emphasize the low computational cost of Alg. 2, we also present Table I, which compares the run time of Alg. 2 with that of [11], in seconds, at a fixed SNR. All the simulations are performed using TensorFlow on an NVIDIA GeForce GTX 1080 Ti graphics processing unit. Note that [11] requires more time to craft a UAP as the PSR decreases, while Alg. 2 provides a steady and efficient computational performance.

PSR [dB]                   -10    -12    -14    -16    -18    -20
Time required by [11]      20.5   23.0   25.1   27.2   29.0   30.5
Time required by Alg. 2     0.3    0.3    0.3    0.3    0.3    0.3

TABLE I: Run time of Alg. 2 compared to [11], in seconds.

Fig. 3: The accuracy of VT-CNN2 under different attacks.

V-B Black-Box Attacks and Shift Invariant Property of UAPs

In the previous subsection, we assumed that 1) the attacker has perfect knowledge of the model $f(\cdot\,; \boldsymbol{\theta})$, and 2) the attacker is synchronous with the transmitter, i.e., each element of $\mathbf{x}$ is perturbed by its corresponding element in $\mathbf{r}$. In the following, we show how an attacker can address these two limitations.

To address the first problem we use the transferability property of adversarial examples [8]. Due to this property, an adversarial example crafted for a specific DNN can also fool other DNNs with different architectures, with high probability [8]. Therefore, to craft a UAP for VT-CNN2, we first create such a UAP for a substitute DNN and then apply it on VT-CNN2. Here we consider a fully connected multilayer perceptron (MLP) as our substitute DNN and craft a UAP for it.

To address the second problem, we reveal an interesting property of the crafted UAPs, namely, the shift-invariant property. More precisely, we show that the UAPs created by Alg. 2 are shift invariant, i.e., any circularly shifted version of them can fool the DNN and cause misclassification.
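A non-synchronous attacker can be emulated by applying a random circular shift to the UAP before adding it to the input. The sketch below assumes the UAP is stored as a flat vector (how the shift is applied to the in-phase and quadrature parts is not specified here, so this layout is an assumption); the name `random_circular_shift` is hypothetical. A circular shift only reorders samples, so the perturbation power is unchanged.

```python
import numpy as np

def random_circular_shift(uap, rng):
    """Return a circularly shifted copy of the UAP and the shift used."""
    k = int(rng.integers(0, uap.size))     # random offset in [0, len(uap))
    return np.roll(uap, k), k

rng = np.random.default_rng(0)
uap = rng.normal(size=256)                 # synthetic stand-in for a crafted UAP
shifted, k = random_circular_shift(uap, rng)

# Same l2 norm (and hence the same PSR) as the synchronous version.
print(k, np.linalg.norm(uap), np.linalg.norm(shifted))
```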

Fig. 4 shows the performance of two UAP attacks designed using Alg. 2: a white-box UAP attack that has perfect knowledge of the model, and a black-box UAP attack with random shifts. For the latter case the UAP is crafted for the aforementioned substitute MLP (black-box attack) and then the UAP is randomly shifted (non-synchronous attack). From Fig. 4, note the following observations. First, the black-box attack is approximately as effective as the white-box attack. Second, any randomly shifted version of the UAP is nearly as destructive as the original synchronous version, hence there is no need for a synchronous attack. Therefore, Fig. 4 shows that we are able to craft extremely low-power UAPs that can cause severe misclassification, while we neither need to know the model of the underlying DNN nor require a synchronous attack.

Fig. 4: An illustration of transferability and shift invariant properties of the proposed UAP attack.

VI Conclusion

We considered the use of DL-based algorithms for radio signal classification and showed that these algorithms are extremely susceptible to adversarial attacks. Specifically we designed white-box and black-box attacks on a DL classifier and demonstrated their effectiveness. Significantly less transmit power is required by the attacker in order to cause misclassification, as compared to the case of conventional jamming (where the attacker transmits only random noise). This exposes a fundamental vulnerability of DL-based solutions.

Given the openness (broadcast nature) of the wireless transmission medium, we conjecture that other DL-based signal processing algorithms for the wireless physical layer may suffer from the same security problem.