Mitigation of Adversarial Examples in RF Deep Classifiers Utilizing AutoEncoder Pre-training

02/16/2019 ∙ by Silvija Kokalj-Filipovic, et al.

Adversarial examples in machine learning for images are widely publicized and explored. Illustrations of misclassifications caused by slightly perturbed inputs are abundant and commonly known (e.g., a picture of a panda imperceptibly perturbed to fool the classifier into incorrectly labeling it as a gibbon). Similar attacks on deep learning (DL) for radio frequency (RF) signals and their mitigation strategies are scarcely addressed in the published work. Yet, RF adversarial examples (AdExs) with minimal waveform perturbations can cause drastic, targeted misclassification results, particularly against spectrum sensing/survey applications (e.g., BPSK is mistaken for 8-PSK). Our research on deep learning AdExs and our proposed defense mechanisms are RF-centric, and incorporate physical-world, over-the-air (OTA) effects. We herein present defense mechanisms based on pre-training the target classifier using an autoencoder. Our results validate this approach as a viable mitigation method against adversarial attacks on deep learning-based communications and radar sensing systems.




I Intro

A new research direction is emerging in the field of wireless communications, aiming to develop and evaluate deep learning (DL) approaches against classical detection and estimation methods in the radio frequency (RF) realm. Spectrum sensing, especially in the context of cognitive radio, encompasses most of the radio signal detection problems that are being addressed. The approach to DL in the RF domain differs greatly from the common current DL applications (e.g., image recognition, natural language processing) and requires special knowledge of RF signal processing and wireless communications and/or radar, depending on the signal utilization. While research on adversarial examples in machine learning for images has been prolific, similar attacks on deep learning of RF signals and the mitigation strategies are scarcely addressed in the published work, with only a couple of recent publications on RF [1, 2].

Adversarial examples (AdExs) are slightly perturbed inputs that are classified incorrectly by the machine learning (ML) model [3]. This perturbation is achieved by mathematical processing of the signal, e.g., by adding an incremental value in the direction of the classifier’s gradient with respect to the inputs (as in the Fast Gradient Sign Method (FGSM) attack illustrated in Fig. 3 A), or by solving a constrained optimization problem. Popular DL models are even more vulnerable to AdExs, as DL networks learn input-output mappings that are fairly discontinuous. Consider the images in Figure 1 [4]. The image on the left is the original image of a panda from the ImageNet dataset [5], while the one on the right is derived from it by applying an FGSM attack of very low intensity. The perturbation of 0.007 added in the direction of the loss gradient corresponds here to the magnitude of the smallest bit of the normalized 8-bit RGB pixel encoding of the image. This is sufficient to cause GoogLeNet [6] to misclassify it as a gibbon. For further details about AdExs, please see seminal work, such as [3, 4].

Likewise for RF, adversarial examples can cause drastic, targeted misclassification results, mostly in spectrum sensing/survey applications (e.g., BPSK mistaken for 8-PSK), with minimal waveform perturbation. However, it is not clear whether RF AdExs maintain their effects in the physical world, i.e., when AdExs are delivered over-the-air (OTA). Our research on deep learning AdExs and our proposed defense mechanisms are RF-centric, and incorporate physical-world, OTA effects. In this work we present defense mechanisms based on pre-training deep learning classifiers in the RF domain by an autoencoder (AE) of the matching architecture.

Fig. 1: Famous panda illustration of an adversarial image example against a DL classifier, where a visually imperceptible, noise-like perturbation can fool the classifier into labeling it as a gibbon

I-1 Existing Work

The research in the area of RF-based DL of the PHY layer is still embryonic [7]. Modulation recognition (ModRec) is the most popular application of DL here. Most of the existing work is based on convolutional (CNN) architectures [8]. Paper [9] features an in-depth study on the performance of DL ModRec methods on OTA captured RF communication signals synthetically designed in Software Defined Radio (SDR). The paper [9] demonstrates that in the ModRec context DL provides significant performance benefits compared to conventional feature extraction methods. Apart from exploring optimal DL architectures and comparing their classification accuracy with state-of-the-art performance based on signal cumulants or their cyclo-stationary properties [10], this paper contributed a publicly available dataset [11], which we use here to demonstrate launching and mitigation of the attacks on DL that leverage adversarial examples of RF data points.

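There are many methods to create AdExs. By definition, the following optimization problem describes the general approach:

$$\min_{x^{*}} \; \|x^{*} - x\|_{p} \quad \text{s.t.} \quad f(x^{*}; \theta) = y_t, \quad x^{*} \in \mathcal{X},$$

where $x$ is the legitimate data point, $x^{*}$ is its adversarial example, $y_t$ is the targeted label, $\mathcal{X}$ is the input space, $\|\cdot\|_{p}$ denotes the $\ell_p$ norm, and $f(x; \theta)$ is the decision rule by the NN with parameters $\theta$ evaluated at $x$.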

The final constraint is somewhat arbitrary: it means that the adversarial example still belongs to the same space as the legitimate data point. Most current attacks are based on the gradient of a neural network’s loss function: white-box attacks use the target NN to compute the gradient; black-box attacks use a surrogate network to approximate the gradient. We will be using the Fast Gradient Sign Method (FGSM) [4] to illustrate our ideas. This attack takes the sign of the loss gradient with respect to the input and moves the data point one step along it; in the targeted form, the step is taken so as to decrease the loss for the targeted label:

$$x^{*} = x - \epsilon \cdot \mathrm{sign}\big(\nabla_{x} J(x, y_t)\big),$$

where $x$ is the legitimate data point, and $x^{*}$ is its adversarial example. $J(x, y_t)$ is the loss function for input $x$ evaluated for the targeted label. We denote the targeted (adversarial classification) label as $y_t$, which can be constant, or a random value, and the hyperparameter $\epsilon$ is usually a small number to limit the perturbation (note its value, 0.007, in Figure 1).

FGSM is a simple attack and, at the same time, a basic principle used in iterative attack methods and constrained-optimization-based methods, hence representing a good reference for evaluation of new defense approaches. Some common iterative methods based on FGSM include: Basic iterative method (multiple steps of FGSM) [12], Carlini-Wagner method [13] (similar but modified objective function), Projected Gradient Descent (add noise, compute gradient, step, project back) [14].
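The single-step attack and its iterative extension can be illustrated with a minimal NumPy sketch. It assumes a toy linear softmax classifier with random weights (`W`, `b` are hypothetical stand-ins, not the CNN used in this paper) and the usual targeted sign convention, in which the step is taken against the gradient of the loss for the targeted label. The iterative variant simply repeats smaller steps and clips the result to an $\epsilon$-ball around the original point.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def loss_gradient_wrt_input(x, W, b, label):
    """Gradient of the cross-entropy loss J(x, label) with respect to x
    for a linear softmax classifier with logits x @ W + b."""
    probs = softmax(x @ W + b)               # shape (batch, classes)
    one_hot = np.eye(W.shape[1])[label]      # shape (batch, classes)
    return (probs - one_hot) @ W.T           # shape (batch, features)

def fgsm_targeted(x, W, b, target, eps):
    """One targeted FGSM step: move against the sign of the gradient of
    the loss evaluated for the targeted label."""
    grad = loss_gradient_wrt_input(x, W, b, target)
    return x - eps * np.sign(grad)

def basic_iterative(x, W, b, target, eps, steps):
    """Basic iterative method: repeated FGSM steps of size eps / steps,
    clipped so the total perturbation stays within eps of x."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = fgsm_targeted(x_adv, W, b, target, eps / steps)
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))      # hypothetical toy weights: 8 features, 3 classes
b = np.zeros(3)
x = rng.normal(size=(5, 8))      # 5 toy data points
target = np.full(5, 2)           # constant targeted label
x_adv = fgsm_targeted(x, W, b, target, eps=0.1)
```

Every element of `x_adv` differs from `x` by exactly `eps` in magnitude, which is what bounds the perturbation in the $\ell_\infty$ sense.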

II Adversarial Examples: Problem Statement

Consider this scenario for an OTA attack on an RF DL classifier as a motivating example (Fig. 2). The DL attacker (DLA) is at the transmitter of a communication system. Both the DLA and the attacked system (AS) are using software defined radios (SDRs), although the DLA’s receiver does not have to be based on an SDR. The DLA’s goal is to elicit adversarial classification at the AS on a signal $x^{*}$ designed from a legitimate signal $x$ of class $y$. Despite the adversarial modifications intended to elicit a classification different from $y$, the designed AdEx needs to maintain a high probability of being decoded as $x$ at the attacker’s own intended receiver; hence the perturbation of $x$ is constrained.

The AS is sensing the spectrum in order to perform reactive jamming, e.g., to jam BPSK-modulated waveforms that are part of traffic signaling (preambles, control-plane packets), which if corrupted makes the rest of the communication meaningless. Note that many preambles are BPSK-modulated, as are the packets in the control plane of most protocols (such as acknowledgments), and if they get corrupted by jamming the whole data packet is lost [15]. Hence, the DLA would want to create an AdEx that disguises BPSK as QPSK (or other) to avoid the electronic warfare (EW) attack by the DL-based reactive jammer (i.e., the AS) [16]. Note that reactive jamming is very difficult to detect [17], but it heavily relies on the inference based on spectrum sensing. If the inference is adversarially attacked, the jammer will be mitigated (by failing to reactively transmit a jamming signal). We here take the side of the jammer in attempting to detect such adversarial attacks.

In the case of images, the slightness of the perturbation applied to a legitimate data example is expressed as visual imperceptibility to a human viewer. The study of RF adversarial examples is a nascent field, and as such a definition of imperceptible perturbation does not exist in the literature. For RF signals utilized for communications we define an imperceptible perturbation as any deformation of the RF waveform that can be filtered out by a receiver, e.g., via matched filters or error correction codes, such that the bit-error rate is close to that of legitimate signals. An analogous definition can be made for radar using receiver operating characteristics (ROC).
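The bit-error-rate criterion can be made concrete with a small NumPy sketch. Everything in it is an illustrative assumption rather than the paper's setup: BPSK with rectangular pulses, an integrate-and-dump matched filter, additive white Gaussian noise, and a random-sign perturbation of magnitude `eps` standing in for an actual AdEx perturbation. The idea is only to compare the bit-error rate of the legitimate and the perturbed waveform at the intended receiver.

```python
import numpy as np

def bpsk_waveform(bits, sps):
    """Map bits {0, 1} to symbols {-1, +1} and hold each for sps samples."""
    symbols = 2.0 * bits - 1.0
    return np.repeat(symbols, sps)

def matched_filter_decode(waveform, sps):
    """Integrate-and-dump matched filter for rectangular pulses,
    followed by a hard decision per symbol."""
    per_symbol = waveform.reshape(-1, sps).sum(axis=1)
    return (per_symbol > 0).astype(int)

def bit_error_rate(bits, waveform, sps):
    """Fraction of bits decoded incorrectly."""
    return float(np.mean(matched_filter_decode(waveform, sps) != bits))

rng = np.random.default_rng(1)
sps = 8                                        # samples per symbol (assumed)
bits = rng.integers(0, 2, size=2000)
clean = bpsk_waveform(bits, sps)
noise = rng.normal(scale=1.0, size=clean.shape)

eps = 0.05                                     # perturbation magnitude (assumed)
perturbation = eps * rng.choice([-1.0, 1.0], size=clean.shape)

ber_legit = bit_error_rate(bits, clean + noise, sps)
ber_adv = bit_error_rate(bits, clean + noise + perturbation, sps)
print(f"BER legitimate: {ber_legit:.4f}, BER perturbed: {ber_adv:.4f}")
```

Under this definition the perturbation counts as imperceptible when `ber_adv` stays close to `ber_legit`. Without noise, any perturbation with `eps` below the symbol amplitude is removed entirely by the matched filter.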

Mitigating adversarial inputs remains an open problem, even in the image domain. A complicating factor in detecting AdEx attacks is the variation due to the physical world. Visual adversarial perturbations and their robustness given different backgrounds, lighting, and camera resolutions are discussed in [18]. The diversity of RF communications, radar, and spectrum sensing systems, together with complex propagation channels, makes this problem in the RF domain even more complex and unique. The effect of the channel (or interference), which is mitigated in well-designed communications receivers, will persist at the DL classifier, thus changing the classification for both legitimate inputs and AdExs. Although we reflect on these issues, the proper consideration and modeling of both the hardware and the RF channel are outside the scope of this paper.

Fig. 2: Motivating scenario for an OTA attack to DL classifier via RF AdExs

The defense mechanisms that we are proposing here rely on pre-training the DL classifier using an autoencoder. The idea of pre-training the classifier using an autoencoder is not new [19], but our implementation and evaluation of this method are unique and crafted for RF signals. Some features of the RF waveform are corrupted unintentionally during OTA delivery, while others are changed by an adversarial attack. The autoencoder is expected to filter out non-salient features, which may lower the accuracy of classification, but also makes the classifier more robust to adversarial and physical corruption. Common approaches to defense against adversarial attacks (mainly evaluated on image datasets) include: gradient masking (hiding the gradient); preprocessing (trying to “undo” the perturbations); detecting AdExs (looking for distribution shifts); and certified methods (proving immunity to a set of perturbations). For details please see [20] and references therein. Preprocessing is the closest of these strategies to the autoencoder-based method, since the autoencoder operates by projecting the inputs into a space of lower dimensionality and filtering out the adversarial perturbations.

Finally, adversarial training of neural networks is another common defense, and is often complementary to other methods. It consists of generating AdExs according to one or more attack methods, and retraining the NN with labeled AdExs. We use adversarial training in conjunction with other mitigation and defense methods.

There are some complexities in DL of RF signals that we would like to highlight, since our approach to solving them impacts the presented results. Raw RF signal data is complex-valued, and traditionally split into the in-phase (I) and quadrature (Q) channels, resulting in a series of samples. Standard DL networks are not designed to handle complex-valued data, hence we must apply a transform to the real domain that preserves salient signal information. Our prior research leveraged expert feature transforms (e.g., FFTs, wavelets) to optimize performance by reducing the complexity (e.g., number of NN parameters) of the subsequent network. For this research we used interleaved I/Q samples of the DeepSig dataset, which comprises synthetic data points from 24 modulation classes [11]. This simple transform from the complex to the real set can be expressed as follows: for a data point of $N$ I/Q samples $x = [x_1, \ldots, x_N]$, where $x_k = I_k + jQ_k$, the transformed vector $[I_1, Q_1, \ldots, I_N, Q_N]$ has $2N$ real elements. For the DeepSig dataset that we used $N = 1024$, hence the input to the NN is a tensor of $2N = 2048$ real elements per data point.

Note that despite the conversion from complex data to the interleaved I/Q transform, the accuracy of classifying four modulations represented by a subset of the DeepSig dataset gets close to 100% if high-SNR data points are used (see Figures 5 and 8). Adversarial examples lower the accuracy by 30% or more on average.
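The complex-to-real interleaving can be sketched in a few lines of NumPy. The function names are hypothetical, and the ordering (I first, then Q, for each sample) is an assumption about how the interleaving is laid out; the helper for the amplitude shows why a window of 50 interleaved elements yields 25 amplitude samples.

```python
import numpy as np

def interleave_iq(x):
    """Map complex samples of shape (..., N) to real vectors of shape
    (..., 2N) laid out as [I_1, Q_1, ..., I_N, Q_N]."""
    x = np.asarray(x)
    stacked = np.stack([x.real, x.imag], axis=-1)    # shape (..., N, 2)
    return stacked.reshape(*x.shape[:-1], -1)        # shape (..., 2N)

def deinterleave_iq(v):
    """Inverse transform: rebuild complex samples from interleaved reals."""
    v = np.asarray(v)
    return v[..., 0::2] + 1j * v[..., 1::2]

def amplitude(v):
    """Signal amplitude sqrt(I^2 + Q^2) computed from an interleaved vector."""
    return np.abs(deinterleave_iq(v))

rng = np.random.default_rng(0)
batch = rng.normal(size=(3, 1024)) + 1j * rng.normal(size=(3, 1024))
real_input = interleave_iq(batch)    # shape (3, 2048), ready for a real-valued NN
```

The transform is lossless, so `deinterleave_iq(interleave_iq(x))` recovers the original complex samples.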

A: Basics of AdEx design B: Effect on a QPSK signal amplitude

C: Effect on a QPSK data point (in-phase)

Fig. 3: FGSM Attack (A) and its effect on a modulated RF signal (B) and its data point (C)

The bottom (C) of Figure 3 shows 100 in-phase samples of a QPSK adversarial example (red) and its legitimate counterpart (blue). A similar effect is observed for the quadrature component. Neither the I nor the Q samples change much visually for an FGSM perturbation with a small $\epsilon$. However, the modification induced on the signal amplitude (top right, B) is more pronounced, and depending on the value of $\epsilon$ this may have other effects at the receiver. Note that the B plot shows 25 amplitude samples (from 50 samples of the data point).

III Proposed Method for Mitigation of Adversarial Examples

All the results presented here are based on a subset of the DeepSig dataset, specifically BPSK, QPSK, 8PSK and 16QAM (DeepSig classes 3, 4, 5, 12). We applied the FGSM attack using the CleverHans library (12). We compared results with and without the attack using autoencoder (AE) based training and conventional training of the convolutional neural network (CNN) presented in Figure 4.


Fig. 4: Architecture of the 1D-CNN classifier that was both classically trained and pre-trained by autoencoder; x = 4, y = 256
Fig. 5: Accuracy (over training epochs) of a 1D-CNN classifier when classically trained, vs pre-trained by autoencoder, for both legitimate data and FGSM AdExs

Fig. 6: AE-based training

We conducted the AE-based training by training the AE’s encoder of the same architecture as the CNN presented in Figure 4, and then transferring the weights to the CNN classifier. The architecture of the AE consisted of such an encoder (red in Figure 6), and the decoder (green in Figure 6), which is the encoder’s mirror image. Notice from Figure 4 that the encoder consisted of several blocks of 1-D convolutional and max-pooling layers, with the layers’ widths narrowing down from 2048 to 256 neurons. Mirroring replaces 1-D convolutions with deconvolutions, and max-pooling with the matching upsampling. The process of training the classifier based on the AE training is shown in Figure 6, where the AE is trained to minimize the mean-square error (MSE) distance between the input and output.
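The pre-train-then-transfer procedure can be sketched with a deliberately simplified NumPy example. It substitutes a single linear (dense) encoder and decoder for the convolutional blocks of Figure 4, uses synthetic toy data, and assumes the transferred encoder weights stay frozen while only a softmax head is trained; all of these are assumptions made for illustration, not the configuration evaluated in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_autoencoder(X, code_dim, epochs=300, lr=0.05):
    """Linear AE: code = X @ W_enc, reconstruction = code @ W_dec,
    trained by gradient descent on the MSE between input and output."""
    W_enc = rng.normal(scale=0.1, size=(X.shape[1], code_dim))
    W_dec = rng.normal(scale=0.1, size=(code_dim, X.shape[1]))
    mse_history = []
    for _ in range(epochs):
        code = X @ W_enc
        err = code @ W_dec - X
        mse_history.append(float(np.mean(err ** 2)))
        scale = 2.0 / err.size                      # derivative of the mean of squares
        grad_dec = scale * (code.T @ err)
        grad_enc = scale * (X.T @ (err @ W_dec.T))
        W_enc -= lr * grad_enc
        W_dec -= lr * grad_dec
    return W_enc, W_dec, mse_history

def train_head(codes, labels, n_classes, epochs=300, lr=0.5):
    """Train only the softmax head on top of the frozen encoder outputs."""
    V = np.zeros((codes.shape[1], n_classes))
    c = np.zeros(n_classes)
    one_hot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        diff = (softmax(codes @ V + c) - one_hot) / len(labels)
        V -= lr * (codes.T @ diff)
        c -= lr * diff.sum(axis=0)
    return V, c

def predict(classifier, X):
    V, c = classifier["head"]
    return np.argmax(softmax(X @ classifier["encoder"] @ V + c), axis=1)

# Synthetic toy data: 16-dimensional inputs driven by 4 latent factors.
latent = rng.normal(size=(400, 4))
X = latent @ rng.normal(size=(4, 16)) + 0.05 * rng.normal(size=(400, 16))
X /= X.std()
labels = (latent[:, 0] > 0).astype(int)

# Step 1: train the AE. Step 2: transfer encoder weights. Step 3: train the head.
W_enc, W_dec, mse_history = train_autoencoder(X, code_dim=4)
classifier = {"encoder": W_enc.copy()}
classifier["head"] = train_head(X @ classifier["encoder"], labels, n_classes=2)
```

The decoder is discarded after pre-training; only the encoder weights reach the classifier, so the features it uses were shaped by reconstruction rather than by the classification loss.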

Figure 5 plots the classification accuracy of the 4 modulations with and without an FGSM attack, utilizing dashed lines for the AE-trained classifier (red for legitimate data, blue for adversarial). Note that the legitimate accuracy of the AE-trained classifier is fixed, as Figure 5 (and Figure 7 too) plots the accuracy over the training epochs of the classically trained classifier (i.e., once the AE-based classifier is already trained). Although this is hard to see for the AE-based network, this kind of plotting makes the adversarial accuracy for both DL networks non-constant, since adversarial examples are created at each training epoch based on the current loss function. Nevertheless, Figure 5 shows how the AE training makes the network more resilient. This is for FGSM, but similar results are observed for other attack methods. In addition, in Figures 8 to 10 we present how the accuracy of each of the classified modulations is affected. Figure 8 shows the confusion matrix for the unattacked CNN network, and Figures 9 and 10 present confusion matrices for the FGSM-attacked CNNs, trained conventionally and by AE pretraining, respectively.

The adversary can only assess the deployed network. The adversary would not know that the encoder layers were not trained on this network, and would approach the AdEx design in the classical way, utilizing the loss function that depends on the trained weights, which are known to the adversary. This makes the attack a white-box attack by definition, as the adversary knows the network weights, the number and type of layers, the number of convolutional channels and the size of the convolutional kernels. An interesting effect is presented in Figure 7 when a grey-box attack is performed, which is the weaker attack in which everything but the actual weights is known. In simpler terms, the adversarial examples are created on an independently, classically trained network, but applied to the AE-trained network. The AE-trained network still maintains resilience against AdExs, but the original network performs slightly better for legitimate examples. This promising AE-based defense can be further improved by drawing on our research on the channel-robust Stacked Denoising Autoencoder (SDAE).

Fig. 7: Accuracy (over training epochs) of a 1D-CNN classifier when classically trained, vs pre-trained by autoencoder, with a grey-box FGSM attack
Fig. 8: Confusion matrix of unattacked CNN trained on classes 3,4,5 and 12 (BPSK, QPSK, 8-PSK,16QAM)
Fig. 9: Confusion matrix of the FGSM-attacked classically trained CNN for classes 3,4,5 and 12 (BPSK, QPSK, 8-PSK,16QAM)
Fig. 10: Confusion matrix of FGSM-attacked AE-pretrained CNN trained on classes 3,4,5 and 12 (BPSK, QPSK, 8-PSK,16QAM)
legitimate adversarial
Fig. 11: Classically trained classification of 3 modulations (BPSK, QPSK,8-PSK) shows very different distribution of output probabilities between legitimate and adversarial examples after 40 training epochs
legitimate adversarial
Fig. 12: AE-pretrained classification of 3 modulations (BPSK, QPSK,8-PSK) shows very different distribution of output probabilities between legitimate and adversarial examples after 40 training epochs

III-A How the AE changes the separating hyper-planes

We have seen from Figures 5, 8, 9 and 10 that the AE pretraining significantly reduces the effect of FGSM adversarial examples. The question that arises is whether we can combine this kind of mitigation of the attack with the defense methods based on detecting and discarding adversarial examples. To address this we perform a statistical test that compares adversarial and legitimate examples in two experiments: 1) when the CNN network is not defended, and 2) when it is defended by the AE-based pretraining. The test, which utilizes the output of the Softmax layer, is motivated by Figures 11 and 12. Note that we refer to the class probabilities computed by the Softmax layer as the outputs of the classifier:

$$P(c \mid x) = \frac{\exp\left(w_c^{\top} z(x) + b_c\right)}{\sum_{k \in \mathcal{C}} \exp\left(w_k^{\top} z(x) + b_k\right)}, \quad c \in \mathcal{C}, \tag{2}$$

where $\mathcal{C}$ is the set of classes that we perform the inference on, and $w_c$ and $b_c$ are the weight and the bias of the Softmax layer for class $c$. $z(x)$ is the input to that layer for each data point $x$. For the sake of visualization, both figures are based on the 3-class classifier trained on BPSK, QPSK and 8-PSK modulation data points. Hence, the outputs of the Softmax layer are 3-dimensional vectors that are plotted in the figures for all data points utilized for training (close to 20,000, shown on the left), and for their adversarial examples, shown in the plot on the right. The elements of the vectors are values between 0 and 1, representing the probabilities of the classes (2). Figure 11 shows these vectors after 40 epochs of training the CNN network conventionally, which is upon the convergence of the loss function and after the achieved accuracy exceeded 99%. Figure 12 shows the same after the AE-trained network has converged to its optimal performance.

It is easy to see that the AE training changes the distribution of the legitimate, and especially the adversarial, outputs. Let us observe first that in Figure 12 (left), representing legitimate examples, the BPSK-classified data points (purple) cluster in the area close to the vertex (1,0,0), denoting the probability of 1 for BPSK and 0 for QPSK and 8-PSK. Similarly, the points classified as QPSK and 8-PSK cluster around the vertices corresponding to probability of 1 for QPSK and 8-PSK, respectively. The left-hand side of Figure 11 shows similar effects for the conventionally trained network, although the outputs for each classification are more smeared, which matches the accuracy plot in Figure 5. The right-hand side of Figure 11 shows that the output vectors for the adversarial examples of a conventionally trained CNN are distributed across a wide range of values, and the clustering effect is lost. The same plot on the right-hand side of Figure 12 shows less variance in the adversarial outputs, i.e., they are projected along a couple of lines.

III-B Kolmogorov-Smirnov Test for Output Layer Probabilities

The Kolmogorov-Smirnov (KS) two-sample test (see [21] and references therein) is performed on the two sets of vector outputs of the classifier. We performed each of the tests for the two experiments described above: one with outputs from the conventionally trained CNN network, and another with the AE-pretrained outputs. For each experiment, the columns of the tables in Figure 13 show the 3 instances of the two-sample KS test between the outputs: 1) the entire legitimate output dataset per class vs. the entire adversarial output dataset per predicted class; 2) a random set of 50 legitimate output vectors of the same class vs. a random set of 50 same-class adversarial output values; 3) a control instance (legitimate to legitimate outputs, per class), with a random set of 50 outputs each.

The KS test declares the confidence (p-value) that the two sets of statistics are from the same distribution. The tables in Figure 13 display those confidence values for the 3 instances described above. Small confidence values in the 1st and 2nd columns show that AdExs are not drawn from the same distribution as the original data, and can thus be detected using this test. The 3rd column quantifies the confidence in such a claim, with one being the highest confidence. We see that Table 2 (AE-trained classifier) has much lower confidence in the outcomes of the statistical tests, which is expected. This is especially true for small-size statistics (50 samples), i.e., if we want to conduct the test in real time, and for a higher-order modulation. Obviously, the test design should be more sensitive when used along with mitigation strategies, and/or a belief network must be used to adapt the decision based on other outcomes. For additional information regarding the KS-based tests aimed to detect and discard RF adversarial examples, please see our prior work in [2]. The KS test based on the output probabilities will likely show different results when evaluated at a classifier that is trained on clean RF samples but receives the data points OTA, which causes a different distribution shift due to the channel effects. We plan to evaluate these effects in future work.
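For readers who want to reproduce this kind of check, the two-sample KS test can be sketched in plain NumPy. The sketch makes several assumptions that are not taken from this paper: each output vector is reduced to one scalar (here, a stand-in for the maximum class probability), the p-value uses the standard asymptotic Kolmogorov series with the common small-sample correction factor, and the "legitimate" and "adversarial" outputs are synthetic draws from two Beta distributions rather than real classifier outputs.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value of the two-sample KS statistic d for sample sizes n and m."""
    n_eff = n * m / (n + m)
    lam = (np.sqrt(n_eff) + 0.12 + 0.11 / np.sqrt(n_eff)) * d
    if lam < 0.3:
        return 1.0       # the alternating series converges poorly here; the value is ~1
    j = np.arange(1, terms + 1)
    series = 2.0 * np.sum((-1.0) ** (j - 1) * np.exp(-2.0 * (j * lam) ** 2))
    return float(min(max(series, 0.0), 1.0))

def ks_two_sample(a, b):
    d = ks_statistic(a, b)
    return d, ks_pvalue(d, len(a), len(b))

rng = np.random.default_rng(0)
# Synthetic stand-ins for one scalar per output vector, 50 samples each.
legit = rng.beta(20, 1, size=50)          # confident outputs, clustered near 1
adversarial = rng.beta(2, 2, size=50)     # spread-out outputs
control = rng.beta(20, 1, size=50)        # second legitimate sample

d_adv, p_adv = ks_two_sample(legit, adversarial)
d_ctl, p_ctl = ks_two_sample(legit, control)
print(f"legit vs adversarial: D={d_adv:.3f}, p={p_adv:.3g}")
print(f"legit vs legit:       D={d_ctl:.3f}, p={p_ctl:.3g}")
```

A small p-value in the first comparison flags a distribution shift, while the control comparison mirrors the third column of the tables in Figure 13.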

   Table 1         Table 2
Fig. 13: KS-test results, based on output probabilities (2), for experiment with classically trained CNN (left), and another with AE-trained CNN (right)

IV Conclusion

We showed that pre-training deep learning classifiers in the RF domain by an autoencoder (AE) mitigates the deceiving effect of adversarial examples (AdExs). The classifier that we designed for evaluation of this defense method is based on several 1-dimensional convolutional and max-pooling layers, and two regularized dense layers at the bottom of the network. The classification accuracy of the trained network was satisfactory on the legitimate dataset, which consists of four differently modulated RF signals. Despite the improvements due to AE-based pretraining, there is some residual decrease in the accuracy of the attacked classifier that should be addressed by different methods. We intend to address this in our future work by expanding the AE-based defense to a denoising AE, which is also likely to increase its robustness against the receiver noise, i.e., unintentional input corruption.

We also explored whether we can combine this kind of mitigation of the attack with the defense methods based on detecting and discarding adversarial examples. We show that detection methods based on statistical tests for a distribution shift of the values at the output of the DL classifier are less effective on the AE-pretrained classifier than on an undefended classifier. This should be considered when detection and mitigation by pretraining are combined to strengthen the classifier's robustness to adversarial attacks. The validity of the proposed defense should be verified in terms of robustness to corruption incurred at the receiver due to over-the-air delivery of RF data points, which we plan to evaluate in future work.


  • [1] M. Sadeghi and E. G. Larsson, “Adversarial attacks on deep-learning based radio signal classification,” arXiv preprint, 2018.
  • [2] S. Kokalj-Filipovic and R. Miller, “Adversarial examples in RF Deep Learning: Detection of the attack and its physical robustness,” arXiv preprint, 2019.
  • [3] Christian Szegedy et al., “Intriguing properties of neural networks,” in Intern. Conf. on Learning Representations, 2014.
  • [4] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint, 2014.
  • [5] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A Large-Scale Hierarchical Image Database,” in CVPR09, 2009.
  • [6] Christian Szegedy et al., “Going deeper with convolutions,” in Computer Vision and Pattern Recognition (CVPR), 2015.
  • [7] T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Transactions on Cognitive Communications and Networking, 2017.
  • [8] T. J. O’Shea, J. Corgan, and T. C. Clancy, “Convolutional radio modulation recognition networks,” in Int. Conf. on Engineering Applications of Neural Networks, 2016.
  • [9] T. O’Shea et al., “Over the Air Deep Learning Based Radio Signal Classification,” IEEE Journal of Selected Topics in Signal Processing, 2018.
  • [10] C. M. Spooner, A. N. Mody, J. Chuang, and J. Petersen, “Modulation recognition using second-and higher-order cyclostationarity,” in Int. Symp. on Dynamic Spectrum Access Networks (IEEE DySPAN), 2017.
  • [11] D. Inc, “Deepsig Inc. RF DATASETS FOR MACHINE LEARNING,” 2018, accessed on 11/20/2018. [Online]. Available:
  • [12] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial examples in the physical world,” arXiv preprint, 2016.
  • [13] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in IEEE Symp. on Security and Privacy (SP), 2017.
  • [14] A. Madry et al., “Towards Deep Learning Models Resistant to Adversarial Attacks,” arXiv preprint, 2017.
  • [15] R. D. Miller and W. Trappe, “On the vulnerabilities of CSI in MIMO wireless communication systems,” IEEE Trans. Mob. Comput., vol. 11, no. 8, 2012.
  • [16] M. Wilhelm et al., “Short paper: Reactive jamming in wireless networks: How realistic is the threat?” in the Fourth ACM Conference on Wireless Network Security, ser. WiSec ’11, 2011.
  • [17] W. Xu, W. Trappe, Y. Zhang, and T. Wood, “The feasibility of launching and detecting jamming attacks in wireless networks,” in 6th ACM Int. Symp. on Mobile ad hoc networking and computing (Mobihoc), 2005.
  • [18] I. Evtimov et al., “Robust physical-world attacks on deep learning models,” arXiv preprint, 2017.
  • [19] I.-T. Chen and B. Sirkeci-Mergen, “A comparative study of autoencoders against adversarial attacks,” in Int’l Conf. IP, Comp. Vision, and Pattern Recognition, 2018.
  • [20] A. Raghunathan, J. Steinhardt, and P. Liang, “Certified defenses against adversarial examples,” in Int. Conf. on Learning Representations, 2018.
  • [21] Kolmogorov–Smirnov Test.   Springer New York, 2008.