The most exciting advancement of artificial intelligence in the past decade is the wide application of deep learning techniques. However, recent research discovered that machine learning models, including deep neural networks (DNNs), are susceptible to adversarial attacks, which apply small perturbations to the input data to fool the models. Such attacks normally lead to a lower confidence level or even a misclassification [2, 3]. In addition, Papernot et al. discovered that a perturbed example (i.e., an adversarial example) has a transferability property: an adversarial example crafted on one model (i.e., the substitute model) can deceive not only that model but also other models (i.e., victim models), even without knowledge of the internal structures of these victim models. The amplitude of the perturbation used in an adversarial attack (a.k.a. the adversarial strength) can be quite small or even imperceptible to the human eye. All these properties raise severe concerns about the security of deep learning techniques.
In this work, we revisit the recently emerged adversarial training process, which uses adversarial examples to train a DNN and thereby enhances its resilience to adversarial attacks. We find that adversarial examples with a given adversarial strength work effectively only against adversarial attacks within a certain range of adversarial strengths. Based on this observation, we propose a multi-strength adversarial training (MAT) method that defends against adversarial attacks by combining adversarial training examples with multiple adversarial strengths. Two adversarial training structures, namely, mixed MAT and parallel MAT, are developed to integrate the influences of multiple adversarial strengths with different training times and hardware costs. The proposed adversarial training structures are also implemented on a Xilinx ZC706 FPGA board to evaluate their performance, hardware cost, and energy consumption and to explore the possible design space. Compared to the existing works on adversarial attacks and their defense schemes, our major contributions can be summarized as follows:
We identify the limitation on the working range of the existing adversarial training technique against adversarial attacks;
We invent multi-strength adversarial training (MAT), the first adversarial training method that can enhance the resilience of learning systems over a controllable wide range of adversarial strengths under adversarial attacks;
We propose mixed MAT and parallel MAT to facilitate a flexible tradeoff between training time and hardware cost;
We implement a random walk algorithm to optimize the selections of adversarial strengths and other design parameters in the MAT process; and
We also implement MATs with different configurations on an FPGA platform and discuss the tradeoffs between training time, robustness, and hardware cost.
Our experimental results on MNIST, CIFAR-10, CIFAR-100, and SVHN show that both mixed MAT and parallel MAT can better defend the learning model under adversarial attacks than single-strength adversarial training, and that parallel MAT offers the largest accuracy improvement. The results also show that the model robustness is greatly affected by the complexity and size of the network structure, the training dataset, the associated hardware cost, and the training time.
The remainder of this paper is organized as follows: Section II gives preliminaries on adversarial attacks and their defense techniques; Section III introduces the motivation of our work; Section IV presents the details of our proposed method; Section V shows the experimental results and discussions; Section VI concludes this work.
A robust learning model is expected to tolerate random noise in input samples to a certain degree. However, recent studies showed that the robustness of a learning model is threatened by so-called adversarial attacks. To be specific, a learning model may misclassify an example that is carefully perturbed, i.e., an adversarial example. An adversarial example $\tilde{x}$ can be generated by injecting a perturbation $\eta$ into the original input sample $x$ such that $\tilde{x} = x + \eta$. The linear transformation of $\tilde{x}$ with respect to a given weight vector $w$ is

$$w^{\top}\tilde{x} = w^{\top}x + w^{\top}\eta. \qquad (1)$$

Here the adversarial perturbation $\eta$ is often referred to as the adversarial strength. $w^{\top}\eta$ increases proportionally with the dimensionality of $w$. Since $w$ is often high dimensional in practical problems, a minor perturbation could introduce a big change in $w^{\top}\tilde{x}$. Such a small perturbation may not be caught visually or by a rule-based detection scheme but could be sufficient to cause a misclassification in DNNs.
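The dimensionality argument above can be checked numerically. The following is a minimal sketch, assuming the sign-aligned perturbation used by Goodfellow et al. [3], in which every element of the perturbation has magnitude `eps` and the sign of the corresponding weight; the function name `activation_shift` and the value of `eps` are illustrative choices, not taken from this paper.

```python
import random

def activation_shift(weights, eps):
    """Change in w^T x caused by the perturbation eta = eps * sign(w)."""
    sign = lambda v: (v > 0) - (v < 0)
    eta = [eps * sign(w) for w in weights]
    # Equals eps * sum(|w_i|), so it grows with the number of dimensions.
    return sum(w * e for w, e in zip(weights, eta))

random.seed(0)
eps = 0.05  # per-element perturbation bound (illustrative value)
for n in (10, 1000, 100000):
    w = [random.uniform(-1.0, 1.0) for _ in range(n)]
    print(n, round(activation_shift(w, eps), 3))
```

Although each element of the input changes by at most `eps`, the shift of the linear response scales roughly linearly with the number of dimensions.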
Some popular methods to defend against adversarial attacks are:
Gradient masking: Its main idea is to build a model that hides or smooths the gradient between original and adversarial examples. The effectiveness of this method, however, can significantly degrade when the attacker uses a model different from the protected model to generate the adversarial examples.
Defensive distillation: The target of this method is to create a model whose decision boundaries are smoothed along the directions that the attacker may exploit. Defensive distillation makes it difficult for the attacker to discover adversarial input tweaks that lead to incorrect classes.
Adversarial training: The objective of adversarial training is to use adversarial examples to train the model and thereby enhance its resilience to adversarial attacks. Its effectiveness has been shown and explained by Goodfellow et al. Note that in adversarial training, the model that is used to generate the adversarial examples is not necessarily identical to the model being attacked.
Among recent related works, Kurakin et al. show that combining small batches of both adversarial examples and original data in adversarial training can make the model more resilient to adversarial attacks. Carlini and Wagner introduce three new attack algorithms and demonstrate that defensive distillation does not significantly enhance the robustness of neural networks in some scenarios. Cisse et al. introduce a layer-wise regularization method to reduce the neural network's sensitivity to small perturbations that are difficult to catch visually. However, these works do not give experimental guidance on how to adaptively select the adversarial strength of the adversarial examples used in adversarial training or attacks to maximize either the defense or the attack effectiveness. Moreover, to the best of the authors' knowledge, no existing work discusses the hardware implementation and its resource consumption.
It is known that training a DNN using adversarial examples with a certain adversarial strength helps to improve the DNN's resilience against adversarial attacks. Figure 1 compares the robustness (the overall accuracy over the adversarial strength range of interest) of a 6-layer neural network trained with different datasets on MNIST. Here the original model is trained with the original dataset, and the models adv_5, adv_10, and adv_15 are trained with half legitimate examples and half adversarial examples with adversarial strengths of 0.05, 0.10, and 0.15, respectively. As we can see from the results, the accuracy of the model trained with the original dataset drops quickly as the adversarial strength of the attack increases. However, including adversarial examples in the training process can effectively maintain the model's accuracy when the adversarial strength increases, which aligns with the result in Goodfellow et al.
In addition, Figure 1 shows that the accuracies of the models trained with different adversarial strengths cross each other over the simulated range of the adversarial strengths adopted in the attacks. As we can see from the figure, these models demonstrate different defending effectiveness over different adversarial strength ranges. On the left side of point a, for example, the model adv_5 demonstrates the highest accuracy among all the trained models. As the adversarial strength adopted in the attacks increases, the models adv_10 and adv_15 give the highest accuracy in turn. This observation in fact reveals a limitation of single-strength adversarial training: each adversarial strength used in the adversarial training has its own best working range, namely, attacks whose strength is close to the strength used in training.
By leveraging the different working ranges of the different adversarial strengths used in adversarial training, there exists a possibility to develop a new adversarial training method that can combine these working ranges so that the trained learning system can be resilient to the attacks over a wide range of the adversarial strength.
Figure 2 shows our initial simulation results on the accuracy of models adversarially trained with different configurations of the training dataset under adversarial attacks with different strengths. Similar to Figure 1, model original represents our baseline, which is trained with the original dataset. Model single-strength is trained with half of the original dataset and half of the adversarial examples generated by the Fast Gradient Sign Method (FGSM). Model mixed multi-strength: reduced size is trained with a mixed combination of the original dataset and the adversarial datasets with the strengths of 0.05, 0.10, and 0.15; the size of each subset of the training data is 25% of the original dataset size. Model mixed multi-strength: full size has the same partition of all datasets as mixed multi-strength: reduced size does, but each subset has the same size as the original dataset. Our simulation results show that model mixed multi-strength: full size performs the best over a considerably large range of the adversarial strength, indicating a great potential of our proposal to combine adversarial examples with different strengths for robust model training. As we shall show later, the selection of the adversarial strength levels and the number of strengths is critical in our proposal. The results also show that packing the data with different adversarial strengths into a dataset of the same size as the original one, i.e., as in model mixed multi-strength: reduced size, may not help much to enhance the model's resistance and may even degrade the model accuracy. A possible mathematical explanation of why combining multiple adversarial strengths during adversarial training improves the resilience of the model to adversarial attacks is related to the construction of the decision hyperspace during training. Due to the space limit, we do not include this explanation in this paper.
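To make the "full size" configuration concrete, the sketch below crafts FGSM examples and appends one adversarial copy of the data per strength. It is only an illustration under stated assumptions: the substitute model is a toy logistic-regression classifier (weights `w`, bias `b`) rather than a DNN, and the helper names `fgsm` and `multi_strength_dataset` are hypothetical.

```python
import math

def fgsm(x, y, w, b, eps):
    """FGSM for a logistic-regression model: x + eps * sign(dJ/dx)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))        # predicted P(y = 1 | x)
    grad = [(p - y) * wi for wi in w]     # cross-entropy gradient w.r.t. x
    sign = lambda v: (v > 0) - (v < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

def multi_strength_dataset(data, w, b, strengths):
    """Original examples followed by one FGSM copy of the data per strength."""
    mixed = list(data)
    for eps in strengths:
        mixed += [(fgsm(x, y, w, b, eps), y) for x, y in data]
    return mixed

# Toy data and substitute model (assumed values, for illustration only).
data = [([0.2, 0.7], 1), ([0.9, 0.1], 0)]
w, b = [1.0, -1.0], 0.0
mixed = multi_strength_dataset(data, w, b, [0.05, 0.10, 0.15])
print(len(data), len(mixed))
```

With three strengths, the mixed dataset is four times the size of the original one, which is why the full-size configuration costs more training time than the reduced-size one.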
In light of the limited working range of single-strength adversarial training, in this work, we propose multi-strength adversarial training (MAT) to combine the effects of multiple adversarial strengths to improve the robustness of the neural network over a wide adversarial strength range under adversarial attack. Adversarial examples with different adversarial strengths are mixed with the original training dataset in MAT. The total size of the new training dataset for MAT, hence, becomes $N + K \cdot N_{\mathrm{adv}}$. Here $N$ is the size of the original training dataset, $N_{\mathrm{adv}}$ is the size of the generated adversarial dataset with a certain adversarial strength, and $K$ denotes the number of different adversarial strengths that are adopted in MAT.
Two training structures, namely, mixed multi-strength adversarial training (mixed MAT) and parallel multi-strength adversarial training (parallel MAT), are proposed to facilitate the tradeoff between the training time and the hardware cost of MAT. An automated optimization method, e.g., a one-dimensional random walk algorithm, can be used to select the adversarial strengths adopted in MAT. The details of these techniques are explained in this section.
IV-A Mixed MAT
Figure 3 illustrates the training structure of mixed MAT. The new training dataset, which includes the original training dataset and the $K$ generated adversarial datasets (one per adversarial strength), is divided into subsets numbered from $0$ to $K$ as the training input of the neural network. Here number $0$ represents the original training dataset. A modified loss function of mixed MAT can be constructed as:

$$J_{\mathrm{mixed}} = \sum_{i} \ell(x_i, y_i) + \sum_{k=1}^{K} \sum_{i} \ell\big(A_k(x_i), y_i\big). \qquad (2)$$

Here $x_i$ is the $i$-th original example; $y_i$ stands for the $i$-th target value; $A_k(\cdot)$ is the function of adversarial transformation with the $k$-th adversarial strength; and $\ell(x, y)$ denotes the loss of the network on input $x$ with target $y$. The new loss function contains two terms, which respectively represent the original training part and the newly added adversarial training part with a total of $K$ different adversarial strengths. The interior sum of the second term denotes the loss on every single-strength adversarial dataset, while the exterior sum denotes the overall loss of all the adversarial datasets with different adversarial strengths.
Assuming that the original dataset and each adversarial dataset have identical sizes, the total training time of mixed MAT will be $K + 1$ times the original training time of the neural network, where $K$ is the number of adversarial strengths. In practical applications of mixed MAT, however, the size of the datasets used may be reduced from that of the original dataset to save training time, at the cost of possible degradation of the model accuracy. Nonetheless, mixed MAT affects neither the computational complexity nor the execution time of the testing process of the neural network. The network structure is not changed either.
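The two-term structure of the mixed MAT loss can be sketched as follows. This is a hedged illustration, not the authors' implementation: the loss is left unnormalized, binary cross-entropy is assumed as the per-example loss, and `predict`, `perturb`, and the toy data are hypothetical stand-ins for the network, the adversarial transformation, and the training set.

```python
import math

def cross_entropy(p, y):
    """Binary cross-entropy for a predicted probability p and a label y."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def mixed_mat_loss(predict, perturb, data, strengths):
    """Loss on the original data plus the loss on every adversarial dataset."""
    original = sum(cross_entropy(predict(x), y) for x, y in data)
    adversarial = sum(                       # exterior sum: over the strengths
        sum(cross_entropy(predict(perturb(x, y, eps)), y) for x, y in data)
        for eps in strengths                 # interior sum: over the examples
    )
    return original + adversarial

def predict(x):
    """Toy 1-D classifier (assumed): P(y = 1 | x)."""
    return 1.0 / (1.0 + math.exp(-(4.0 * x - 2.0)))

def perturb(x, y, eps):
    """Toy adversarial transformation (assumed): push x toward the wrong class."""
    return x - eps if y == 1 else x + eps

data = [(0.9, 1), (0.8, 1), (0.1, 0), (0.3, 0)]
clean = mixed_mat_loss(predict, perturb, data, [])
mixed = mixed_mat_loss(predict, perturb, data, [0.05, 0.10, 0.15])
print(clean, mixed)
```

Each additional strength adds one full pass over the data to every loss evaluation, which mirrors the linear growth of the training time with the number of strengths.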
IV-B Parallel MAT
We need to point out that directly combining the adversarial datasets with different adversarial strengths, as in mixed MAT, is not the only option for leveraging the different working ranges of these datasets. A more straightforward idea is that, for a specific range of the adversarial strengths adopted by an adversarial attack, we should always use the neural network trained with the adversarial dataset that ensures the highest model accuracy under the attack, as illustrated in Figure 1. Following this philosophy, we propose parallel MAT, the concept of which is illustrated in Figure 3.
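In parallel MAT, a total of $K$ neural network copies can be trained in parallel, where $K$ is the number of adversarial strengths. Each of these copies is trained with the combination of the original dataset and one of the adversarial datasets with a certain adversarial strength. The overall size of the new training dataset, hence, becomes $2K$ times that of the original dataset, assuming that each adversarial dataset has the same size as the original one. Accordingly, the loss function of the $k$-th ($k = 1, \dots, K$) neural network copy is modified to:

$$J_k = \sum_{i} \ell(x_i, y_i) + \sum_{i} \ell\big(A_k(x_i), y_i\big). \qquad (3)$$

Here $x_i$ and $y_i$ are the $i$-th original example and its target value, $A_k(\cdot)$ is the adversarial transformation with the $k$-th adversarial strength, and $\ell(x, y)$ denotes the loss of the network on input $x$ with target $y$.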
Different from Eq. (2), where the effects of the original dataset and all the adversarial datasets are taken into account as a whole, Eq. (3) focuses on the robustness enhancement of the neural network over the working range of a single adversarial dataset. The outputs (e.g., the loss values or the predicted probabilities) of all the neural network copies are then summarized in an upper-boundary decision unit (UDU), as shown in Figure 3. The function of the UDU is to collect the outputs from all the DNN copies and then decide the classification result by a voting process such as:

$$\hat{y} = \arg\max_{c} \sum_{k=1}^{K} \alpha_k \, p_k(c \mid x),$$

where $p_k(c \mid x)$ is the output of the $k$-th neural network copy for class $c$, and $\alpha_k$ is the coefficient of voting for the $k$-th neural network copy and satisfies $\sum_{k=1}^{K} \alpha_k = 1$. In the implementation of parallel MAT, $\alpha_k$ can be learned using a shallow neural network.
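The voting step of the UDU can be sketched as a weighted sum over the per-copy class probabilities. This is a minimal sketch under assumptions: the voting coefficients are fixed by hand rather than learned by a shallow network, they are assumed to sum to one, and the function name `udu_vote` is hypothetical.

```python
def udu_vote(copy_probs, alphas):
    """Weighted vote over per-copy class probabilities; returns the class index."""
    if abs(sum(alphas) - 1.0) > 1e-9:
        raise ValueError("voting coefficients are assumed to sum to 1")
    num_classes = len(copy_probs[0])
    scores = [
        sum(a * probs[c] for a, probs in zip(alphas, copy_probs))
        for c in range(num_classes)
    ]
    return max(range(num_classes), key=scores.__getitem__)

# Three network copies, two classes (illustrative values).
copy_probs = [[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
print(udu_vote(copy_probs, [0.5, 0.25, 0.25]))
```

Setting one coefficient to 1 and the others to 0 reduces the vote to the prediction of a single copy, so the single-strength models are special cases of this decision rule.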
Compared to mixed MAT, parallel MAT reduces the total training time of the DNN by leveraging the computation parallelism. However, the hardware implementation cost may significantly increase in parallel MAT by replicating the neural network. We note that in practice, the optimal numbers of adversarial strengths adopted by mixed MAT and parallel MAT are not necessarily the same.
IV-C Multi-strength Selection
We utilize a one-dimensional random walk algorithm to automatically select the adversarial strengths adopted in MATs. A random walk is a stochastic process that describes a path formed by a succession of random steps on some mathematical space. The one-dimensional random walk algorithm used in this paper includes the following three procedures:
Pre-computation: An accuracy matrix that contains the average single-strength training accuracies over the adversarial strength range of interest is measured on the victim model and provided to the random walk function. Here we use validation accuracy to approximate test accuracy by assuming that the victim model cannot access the test dataset.
Initialization: Based on the accuracy matrix, the multi-strength accuracy matrix is estimated using the sums of the rows of the accuracy matrix. From this estimate, we calculate the state transition matrix for walking from one single-strength state to another and initialize the multi-strength accuracy estimation function.
Simulation: During the iterative simulation, we repeatedly perform multi-step random walks to estimate the multi-strength accuracy. To limit the total number of selected adversarial strengths, a penalty term with a weighting coefficient is added to the multi-strength accuracy estimated by the random walk function.
After a sufficient number of random walk steps, the estimation function gives the best estimated accuracy, which corresponds to an optimal combination of multiple adversarial strengths. Such a selection method can be used in both mixed MAT and parallel MAT, though they might start with different accuracy matrices.
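The three procedures can be illustrated with a simplified sketch. It is not the exact algorithm of this paper, and several pieces are assumptions: the walk moves one state left or right with equal probability instead of using a transition matrix derived from the accuracies, the accuracy of a combination is estimated as the mean of the best per-column accuracy among the selected rows, and the names `estimate_accuracy`, `random_walk_select`, and `penalty` are hypothetical.

```python
import random

def estimate_accuracy(acc, selected):
    """Mean over attack strengths of the best accuracy among the selected rows."""
    cols = range(len(acc[0]))
    return sum(max(acc[i][j] for i in selected) for j in cols) / len(acc[0])

def random_walk_select(acc, steps, walks, penalty, seed=0):
    """Return (score, strength indices) of the best combination visited."""
    rng = random.Random(seed)
    n = len(acc)
    best_score, best_set = float("-inf"), ()
    for _ in range(walks):
        pos = rng.randrange(n)           # random starting strength
        visited = set()
        for _ in range(steps + 1):
            visited.add(pos)
            # Penalize large combinations to limit the number of strengths.
            score = estimate_accuracy(acc, visited) - penalty * len(visited)
            if score > best_score:
                best_score, best_set = score, tuple(sorted(visited))
            # One step of the 1-D walk, clamped to the valid index range.
            pos = min(max(pos + rng.choice((-1, 1)), 0), n - 1)
    return best_score, best_set

# acc[i][j]: accuracy of the model trained with strength i under attack strength j
# (illustrative numbers, not measured results).
acc = [[0.9, 0.6, 0.3],
       [0.7, 0.8, 0.6],
       [0.5, 0.7, 0.8]]
print(random_walk_select(acc, steps=5, walks=200, penalty=0.01))
print(random_walk_select(acc, steps=5, walks=200, penalty=0.10))
```

A small penalty favors combining more strengths, while a large penalty pushes the selection back toward a single strength, which is the tradeoff the penalty coefficient is meant to control.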
V Experimental Results
In this section, we compare mixed MAT and parallel MAT with single-strength adversarial training on four image datasets: MNIST, CIFAR-10, CIFAR-100, and SVHN. MNIST consists of handwritten digits, while CIFAR-10 and CIFAR-100 consist of natural scenes with different numbers of classes. SVHN is similar to MNIST, except that each image shows a street view house number. In addition, we implement these adversarial training schemes with different configurations and discuss the relevant tradeoffs.
V-A Experimental Setup
We use the Fast Gradient Sign Method (FGSM) to craft both the training and the testing datasets. The detailed model structures and settings are described as follows:
Upper bound of adversarial strength: We use the Mean Structural SIMilarity (MSSIM) index to limit the maximum value of the adversarial strength. MSSIM is a value between -1 and 1 that measures the similarity between two images and can also be used to describe the distortion of an image. In this work, we set the lower bound of the MSSIM between 0.77 and 0.82, which corresponds to a distortion that can be visually captured.
MNIST: We use LeNet-5 with an accuracy of 99.5% to craft adversarial examples. The range of adversarial strengths is determined from the following observation: by testing adversarial examples with different adversarial strengths, we find that as the adversarial strength increases from 0 to 0.09, 0.15, and 0.30, the corresponding MSSIM decreases from 1.00 to 0.94, 0.90, and 0.82. This implies that an adversarial strength range of 0 to 0.30 is sufficient for our evaluation. Similar criteria are applied to the other datasets.
CIFAR-10, CIFAR-100, and SVHN: CIFAR-10 and CIFAR-100 use the same model to craft adversarial examples. The model contains 3 convolutional layers and 2 fully connected layers. Each convolutional layer is followed by a batch normalization layer and a max pooling layer. The original accuracies on CIFAR-10 and CIFAR-100 are 82.7% and 54.3%, respectively. The adversarial CIFAR-10 and CIFAR-100 examples are generated over the evaluated adversarial strength range with a step size of 0.5. SVHN follows the same procedure as CIFAR-10 and CIFAR-100 in crafting adversarial examples.
For the victim models, we use a 3-layer multilayer perceptron (MLP) for the MNIST dataset and a convolutional neural network (CNN) with 3 convolutional layers for CIFAR-10, CIFAR-100, and SVHN. No data augmentation method (e.g., cropping or mirroring) is used in the training process. We test every method on the full adversarial test dataset over the adversarial strength range described above. In every adversarial attack, we assume that all test examples are perturbed with the same adversarial strength.
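The MSSIM criterion used in this setup can be sketched with the SSIM formula of Wang et al. This is a simplified illustration rather than the standard implementation: it averages SSIM over non-overlapping uniform blocks instead of Gaussian-weighted sliding windows, assumes grayscale images stored as nested lists with pixel values in [0, 1], and the names `ssim_window` and `mssim` are hypothetical.

```python
def ssim_window(a, b, dynamic_range=1.0):
    """SSIM of two equally sized pixel lists (one local window)."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((v - mu_a) ** 2 for v in a) / n
    var_b = sum((v - mu_b) ** 2 for v in b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n
    c1, c2 = (0.01 * dynamic_range) ** 2, (0.03 * dynamic_range) ** 2
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def mssim(img_a, img_b, win=4):
    """Mean SSIM over non-overlapping win x win blocks of two 2-D images."""
    rows, cols = len(img_a), len(img_a[0])
    scores = []
    for r in range(0, rows - win + 1, win):
        for c in range(0, cols - win + 1, win):
            a = [img_a[i][j] for i in range(r, r + win) for j in range(c, c + win)]
            b = [img_b[i][j] for i in range(r, r + win) for j in range(c, c + win)]
            scores.append(ssim_window(a, b))
    return sum(scores) / len(scores)

def checkerboard_perturb(img, eps):
    """Add +/- eps in a checkerboard pattern and clip to [0, 1] (toy perturbation)."""
    return [[min(1.0, max(0.0, v + (eps if (i + j) % 2 else -eps)))
             for j, v in enumerate(row)] for i, row in enumerate(img)]

# 8 x 8 gradient image (illustrative data, not from any dataset).
image = [[(i * 8 + j) / 63.0 for j in range(8)] for i in range(8)]
for eps in (0.0, 0.05, 0.15, 0.30):
    print(eps, round(mssim(image, checkerboard_perturb(image, eps)), 3))
```

In the same spirit as the setup above, one could sweep the strength and keep only values whose MSSIM stays above the chosen lower bound.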
V-B Technology Comparisons
Figure 4 compares the effectiveness of four training schemes on enhancing the model’s resilience to adversarial attacks, including the original data training, the single-strength adversarial training, and our proposed mixed MAT and parallel MAT. For each sub-figure, the horizontal axis is the testing adversarial strength, and the vertical axis is the model’s testing accuracy. Each of them corresponds to one dataset, i.e., (a) MNIST, (b) CIFAR-10, (c) CIFAR-100, and (d) SVHN.
Single-strength vs. Original. Single-strength adversarial training achieves better accuracy than the original data training on adversarial examples, but has lower accuracy on the original examples. Adversarial perturbation can be understood as a specified distortion of the original examples. Introducing random distortions or other data augmentation methods in the training examples usually improves the training accuracy because of the expansion of the sample space. However, in single-strength adversarial training, the adversarial examples push the decision boundary towards other classes and may lead to misclassification of some samples that were originally classified correctly.
Mixed/Parallel MAT vs. Single-strength. For all datasets, parallel MAT generally achieves the highest accuracy over the simulated adversarial strength range and outperforms single-strength adversarial training. Mixed MAT, however, demonstrates higher accuracy than single-strength adversarial training only over a limited range of the adversarial strength on most datasets. Both parallel and mixed MAT select the adversarial strengths using random walk. A possible reason why parallel MAT substantially outperforms mixed MAT is that the network structure of mixed MAT is not capable of learning the original and the adversarial information well at the same time.
V-C Performances on Different Datasets
MNIST and CIFAR-10 have similar numbers of training and testing examples, but CIFAR-10 is more complex than MNIST because CIFAR-10 images have RGB channels. As shown in Figure 4(a), for MNIST, single-strength adversarial training is able to compensate for the accuracy loss decently. For CIFAR-10, the baseline accuracy of the model trained with the original data is 77.96%. As the adversarial strength increases, the model accuracy drops steeply, e.g., down to 43.34% when the adversarial strength is merely 5. However, when single-strength adversarial training is applied, the accuracy is restored to 65% to 70% within the simulated range. For SVHN, both MATs demonstrate a strong capability to enhance the model's resilience to adversarial attacks. The relevant results are summarized in Table I.
Table I (column headers): Training Methods | Original | Single-strength | Mixed MAT | Parallel MAT
V-D Design Exploration
We can reduce the cost of parallel MAT to the same level as that of mixed MAT by using a simplified network structure for each network copy. The accuracy of one example of such a reduced structure is presented in Figure 4(b) as "parallel MAT: reduced structure". As can be seen from the figure, parallel MAT with the reduced structure can still achieve an overall higher accuracy than the original training, but it fails to match the results of the two MATs with full structures. This result implies a possible tradeoff between accuracy and hardware cost in the parallel MAT design, which will be further discussed in Section V-E.
Moreover, Figure 4(d) (SVHN) shows that mixed MAT with a reduced training data size achieves much worse accuracy than mixed MAT with the full training data size, which echoes the result of Figure 2 in Section III. We note that for parallel MAT, each network copy cannot learn enough information if we further reduce the training data size. Therefore, this option is excluded from our design exploration.
V-E Implementation on FPGA Platforms
Table II (column headers): # of Operations | Size | LUT | FF | BRAM | DSP
To give a more specific understanding of the different models, we evaluate the hardware implementation cost of different designs based on their corresponding FPGA realizations, which are designed with Vivado HLS 2016.4. This tool starts from an implementation in the C language and then exports the RTL as an IP core. Fast C/RTL co-simulation is used for design space exploration and hardware resource estimation. The design is deployed on a single FPGA and uses DRAM as external storage. We use systolic arrays of uniform processing elements (PEs) as the main computing units with 32-bit floating-point precision. The global control unit initializes the accelerator and distributes kernel weights and feature maps to the PEs at runtime. The data from/to the external memory is handled by a multi-port DMA streaming engine. Each PE is assigned a subset of the overall computation. The PE controller sets up registers according to the received configuration instructions and then enables the Data Fetcher to load vector arrays of an input feature map into the on-chip buffer at runtime. The PE also integrates the ReLU activation and pooling functions. After placement and routing, the chip operates at 150 MHz.
Table II summarizes the resource utilization of different network structures trained with CIFAR-10 on the Xilinx ZC706 development board. "Reduced" in the table indicates "parallel MAT: reduced structure" as in Figure 4(b). The comparison between different models in terms of accuracy and hardware cost shows that the larger the model size and the higher the computation density, the more robust the model is. The training time, however, may not follow this rule because of the computation parallelism. These results can guide the tradeoff between robustness, training time, and hardware cost, which can facilitate future hardware designs.
In this work, we observe that single-strength adversarial training demonstrates a limited working range in enhancing the model's resilience to adversarial attacks. Hence, we propose two multi-strength adversarial training (MAT) methods, namely, mixed MAT and parallel MAT, to mitigate adversarial attacks. Moreover, a random walk algorithm is adopted to optimize the selection of the adversarial strengths that are included in the two MAT methods. Our experimental results on four different datasets show that, compared to the single-strength adversarial training method, both mixed MAT and parallel MAT substantially improve the DNN model's resilience to adversarial attacks. The results also indicate that higher robustness can be achieved with higher computation density and larger model size, while the training time can be greatly reduced by using computation parallelism on an FPGA platform.
[1] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
[2] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
[3] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. Proceedings of the International Conference on Learning Representations (ICLR), 2015.
[4] Nicolas Papernot, Patrick McDaniel, and Ian J Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277, 2016.
[5] Matej Uličný, Jens Lundström, and Stefan Byttner. Robustness of deep convolutional neural networks for image recognition. In International Symposium on Intelligent Computing Systems, pages 16–30. Springer, 2016.
[6] Nicolas Papernot, Patrick McDaniel, Ian J Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pages 506–519. ACM, 2017.
[7] Alexey Kurakin, Ian J Goodfellow, and Samy Bengio. Adversarial machine learning at scale. Proceedings of the International Conference on Learning Representations (ICLR), 2017.
[8] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE Symposium on, pages 39–57. IEEE, 2017.
[9] Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, and Nicolas Usunier. Parseval networks: Improving robustness to adversarial examples. In International Conference on Machine Learning, pages 854–863, 2017.
[10] Jonathan Harel, Christof Koch, and Pietro Perona. Graph-based visual saliency. Advances in Neural Information Processing Systems, pages 545–552, 2007.
[11] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.