Code-Bridged Classifier (CBC): A Low or Negative Overhead Defense for Making a CNN Classifier Robust Against Adversarial Attacks

01/16/2020 · by Farnaz Behnia, et al. · University of California-Davis, George Mason University, University of Maryland, Baltimore County

In this paper, we propose the Code-Bridged Classifier (CBC), a framework for making a Convolutional Neural Network (CNN) robust against adversarial attacks without increasing, and potentially even while decreasing, the model's overall computational complexity. More specifically, we propose a stacked encoder-convolutional model in which the input image is first encoded by the encoder module of a denoising auto-encoder, and the resulting latent representation (without being decoded) is then fed to a reduced-complexity CNN for image classification. We illustrate that this network not only is more robust to adversarial examples but also has a significantly lower computational complexity than prior-art defenses.


I Introduction

Deep learning is the foundation for many of today’s applications, such as computer vision, natural language processing, and speech recognition. After AlexNet [8] made a breakthrough in 2012 by significantly outperforming other object detection solutions and winning the ILSVRC competition [26], CNNs gained well-deserved popularity for computer vision applications. This energized the research community to architect models capable of achieving higher accuracy (leading to the development of many higher-accuracy models, including GoogleNet [29] and ResNet [6]), increased the demand and research for hardware platforms capable of fast execution of these models [13, 14], and created a demand for lower-complexity models [18, 17, 19] capable of reaching high levels of accuracy.

Even though the evolution of their model structures and the improvement in their accuracy have been very promising in recent years, it has been shown that convolutional neural networks are prone to adversarial attacks through simple perturbations of their input images [5, 9, 3, 15]. The algorithms proposed in [5, 9, 3, 15] demonstrate how easily normal images can be perturbed by adding a small amount of noise in order to fool neural networks. The main idea is to add a noise vector containing small values to the original image, in the same or opposite direction of the gradient calculated by the target network, to produce adversarial samples [5, 9].

The widespread adoption of CNNs in various applications and their unresolved vulnerability to adversarial samples have raised many safety and security concerns and have motivated a new wave of deep learning research. To defend against adversarial attacks, the concept of adversarial training was proposed in [5] and was further refined and explored in [9, 3]. Adversarial training is a data augmentation technique in which a large number of adversarial samples are generated and included, with correct labels, in the training set, improving the robustness of the network against adversarial attacks. Training an adversarial classifier to determine whether the input is normal or adversarial, and using an autoencoder (AE) to remove the input-image noise before classification, are some of the other approaches taken by [5] and [28]. Finally, [22] utilizes distillation as a defense against adversarial attacks, in which a network of a similar size to the original network is trained in a way that hides the gradients between the softmax layer and its predecessor.

In this work, we combine denoising and classification into a single solution and propose the Code-Bridged Classifier (CBC). We illustrate that the CBC is 1) more robust against adversarial attacks than a similar CNN solution that is protected by a denoising AE, and 2) substantially less computationally complex than such models.

II Background and Related Work

The vulnerability of deep neural networks to adversarial examples was first investigated in [28]. Since this early work, many new algorithms for generating adversarial examples, and a variety of solutions for defending against these attacks, have been proposed. The following is a summary of the attack and defense models related to our proposed solution:

II-A Attack Models

Many effective attacks have been introduced in the literature. Some of the most notable include the Fast Gradient Sign Method (FGSM) [5], the Basic Iterative Method [9], the Momentum Iterative Method [3], DeepFool [15], and Carlini & Wagner [1]; each method is described below.

II-A1 FGSM attack

In [5], a simple method is suggested for adding a small perturbation to the input to produce an adversarial image. The adversarial image is obtained by:

$x^{adv} = x + \epsilon \cdot \mathrm{sign}\left(\nabla_x J(\theta, x, y)\right)$ (1)

in which $x$ is the input image, $y$ is the correct label, $\theta$ denotes the network parameters, and $J(\theta, x, y)$ is the loss function. The scalar $\epsilon$ defines the magnitude of the noise: the larger the $\epsilon$, the larger the possibility of misclassification. Figure 1 illustrates how such an adversarial perturbation can change the classifier’s prediction.

Fig. 1: The FGSM attack is used to add adversarial noise to the original image. The adversarial perturbation remains imperceptible to the human eye but causes the neural network to misclassify the input image.
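For concreteness, a minimal PyTorch sketch of the FGSM step in Eq. (1) is shown below. The `model`, the cross-entropy loss, and the [0, 1] pixel range are illustrative assumptions, not the exact setup of the experiments in this paper.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.1):
    """Return x_adv = x + epsilon * sign(grad_x J(theta, x, y))."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)   # J(theta, x, y), assumed cross-entropy
    loss.backward()
    # Step in the direction that increases the loss, then keep pixels in [0, 1].
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0).detach()
```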

II-A2 Basic Iterative Method (BIM) attack [9]

Also known as the Iterative FGSM attack, BIM iterates the FGSM step, which increases the effectiveness of the attack. The BIM attack can be expressed as:

$x^{adv}_{0} = x, \qquad x^{adv}_{N+1} = \mathrm{Clip}_{x,\epsilon}\left\{ x^{adv}_{N} + \alpha \cdot \mathrm{sign}\left(\nabla_x J(\theta, x^{adv}_{N}, y)\right) \right\}$ (2)
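A hedged sketch of this iterative variant follows; the step size `alpha`, the number of `steps`, and the ε-ball clipping are generic choices following [9] rather than values used in our experiments.

```python
import torch
import torch.nn.functional as F

def bim_attack(model, x, y, epsilon=0.1, alpha=0.01, steps=10):
    """Iterative FGSM: small signed-gradient steps, clipped to an epsilon-ball around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the epsilon-ball and the valid pixel range.
        x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon).clamp(0.0, 1.0)
    return x_adv
```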

II-A3 Momentum Iterative attack [3]

In the Momentum Iterative attack, momentum is also considered when calculating the adversarial perturbation. The attack is expressed as:

$g_{N+1} = \mu \cdot g_{N} + \frac{\nabla_x J(\theta, x^{adv}_{N}, y)}{\left\lVert \nabla_x J(\theta, x^{adv}_{N}, y) \right\rVert_1}, \qquad x^{adv}_{N+1} = x^{adv}_{N} + \alpha \cdot \mathrm{sign}(g_{N+1})$ (3)

in which $\mu$ is the momentum factor and $\lVert \nabla_x J(\theta, x^{adv}_{N}, y) \rVert_1$ is the $L_1$ norm of the gradient.
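The update in Eq. (3) can be sketched as follows, assuming a 4-D image batch and the same illustrative cross-entropy loss as before; the decay factor `mu` and the per-sample $L_1$ normalization mirror the generic formulation in [3].

```python
import torch
import torch.nn.functional as F

def mim_attack(model, x, y, epsilon=0.1, steps=10, mu=1.0):
    """Momentum Iterative FGSM: accumulate L1-normalized gradients with momentum."""
    alpha = epsilon / steps
    x_adv, g = x.clone().detach(), torch.zeros_like(x)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Normalize each sample's gradient by its L1 norm before accumulating momentum.
        grad = grad / grad.abs().flatten(1).sum(dim=1).view(-1, 1, 1, 1)
        g = mu * g + grad
        x_adv = (x_adv.detach() + alpha * g.sign()).clamp(0.0, 1.0)
    return x_adv
```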

II-A4 DeepFool [15]

The DeepFool attack is formulated to find adversarial examples that are as similar as possible to the original inputs. It assumes that neural networks are completely linear and that classes are distinctly separated by hyperplanes. Under these assumptions, it derives an optimal solution for finding adversarial examples. However, because neural networks are nonlinear, the solution step is repeated iteratively. We refer to [15] for details of the algorithm.

II-A5 Carlini & Wagner (CW) [1]

Finding adversarial examples in the CW attack is an iterative process that is conducted against multiple defense strategies. The CW attack uses the Adam optimizer and a specific loss function to find adversarial examples that are less distorted than those of other attacks; for this reason, the CW attack is much slower. Adversarial examples can be generated by employing the $L_0$, $L_2$, and $L_\infty$ norms. The CW attack considers an auxiliary variable $w$ and defines the perturbation as:

$\delta = \frac{1}{2}\left(\tanh(w) + 1\right) - x$ (4)

Then, considering the $L_2$ norm, this perturbation is optimized with respect to $w$:

$\min_{w} \; \left\lVert \frac{1}{2}\left(\tanh(w) + 1\right) - x \right\rVert_2^2 + c \cdot f\!\left(\frac{1}{2}\left(\tanh(w) + 1\right)\right)$ (5)

in which the function $f$ is defined as follows:

$f(x') = \max\left( \max_{i \neq t}\{ Z(x')_i \} - Z(x')_t, \; -\kappa \right)$ (6)

In the above equation, $Z(x')_i$ is the pre-softmax output for class $i$, the parameter $t$ represents the target class, and $\kappa$ is the parameter controlling the confidence of misclassification.
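Equation (6) maps to a few lines of PyTorch; the sketch below implements only this loss term for a targeted attack with pre-softmax `logits`, target class `target`, and confidence `kappa`, not the full CW optimization over the auxiliary variable $w$.

```python
import torch
import torch.nn.functional as F

def cw_objective(logits, target, kappa=0.0):
    """f(x') = max( max_{i != t} Z(x')_i - Z(x')_t, -kappa ), one value per sample."""
    one_hot = F.one_hot(target, num_classes=logits.size(1)).bool()
    target_logit = logits[one_hot]                                        # Z(x')_t
    other_max = logits.masked_fill(one_hot, float('-inf')).max(1).values  # max over i != t
    return torch.clamp(other_max - target_logit, min=-kappa)
```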

II-B Transferability of Adversarial Examples

All previously described attacks are carried out in a white-box setting, in which the attacker knows the architecture, hyperparameters, and trained weights of the target classifier, as well as the existing defense mechanism (if any). It is very hard to defend against white-box attacks because the attacker can always use the information she has to produce new, working adversarial inputs. However, adversarial attacks can also be considered in two other settings: gray-box and black-box attacks. In a gray-box attack, the attacker knows the architecture but has access to neither the parameters nor the defense mechanism. In a black-box setting, the attacker knows neither the architecture, nor the parameters, nor the defense method.

Unfortunately, it has been shown that adversarial examples generalize well across different models. In [28] it was shown that many of the adversarial examples generated for (and misclassified by) the original network are also misclassified by a different network that is trained from scratch with different hyperparameters or using disjoint training sets.

The findings of [28] were confirmed by subsequent work: in [16], universal perturbations are found that not only generalize across images but also generalize across deep neural networks. These perturbations can be added to any image, and the resulting adversarial example is transferable across different models. The works in [23, 21] show that adversarial examples that cause one model to misclassify can have the same influence on another model trained for the same task. Therefore, an attacker can train her own dummy model to generate the same output, craft adversarial images on her model, and rely on the transferability of the adversarial examples, being confident that there is a high chance the target classifier will be fooled. We argue that our proposed solution can effectively defend against black-box attacks.

II-C Defenses

Several works have investigated defense mechanisms against adversarial attacks. In [5], adversarial training is proposed to enhance the robustness of the model. In [2, 12], autoencoders are employed to remove the adversarial perturbation and reconstruct a clean input. In [22], distillation is used to hide the gradients of the network from the attacker. Other approaches have also been used as defense mechanisms [25, 20, 31]. In this section, we review these ideas for defending against adversarial examples.

II-C1 Adversarial Training

The basic idea of adversarial training [5] is to train a robust classifier by adding many adversarial examples (generated using different attacks) to the training dataset [3, 15, 11]. The problem with this approach is that it can only make the network robust against known (and trained-for) attacks. It also increases the training time significantly.
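As an illustration of this idea, one adversarial-training step might look like the sketch below, which reuses the hypothetical `fgsm_attack` helper from Section II-A1; weighting the clean and adversarial losses equally is our assumption, not the recipe of the cited works.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon=0.1):
    """One training step that fits the clean batch and its FGSM counterpart."""
    model.train()
    x_adv = fgsm_attack(model, x, y, epsilon)   # on-the-fly adversarial augmentation
    optimizer.zero_grad()
    loss = 0.5 * (F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```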

II-C2 Defensive Distillation

Distillation was originally proposed to train a smaller student model from a larger teacher model, with the objective that the smaller network predicts the probabilities produced by the bigger network. The distillation technique takes advantage of the fact that a probability vector contains more information than class labels alone and is hence a more effective means of training a smaller network. For defensive distillation, the second network is the same size as the first network [22]. The main idea is to hide the gradients between the pre-softmax and softmax layers to make the attacker’s job more difficult. However, it was illustrated in [1] that this defense can be beaten by using the pre-softmax layer outputs in the attack algorithm and/or choosing a different loss function.
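The temperature-softened training signal at the heart of this defense can be sketched as follows; the temperature value and the teacher/student naming are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def soft_labels(teacher_logits, T=20.0):
    """Probability vector produced by the teacher at temperature T."""
    return F.softmax(teacher_logits / T, dim=1)

def distillation_loss(student_logits, teacher_logits, T=20.0):
    """Cross-entropy between the student's tempered prediction and the teacher's soft labels."""
    log_p = F.log_softmax(student_logits / T, dim=1)
    return -(soft_labels(teacher_logits, T) * log_p).sum(dim=1).mean()
```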

II-C3 Gradient Regularization

Input gradient regularization was first introduced in [4] to improve the generalization of training in neural networks via a double back-propagation method. [22] mentions double back-propagation as a defense, and [25] evaluates the effectiveness of this idea for training a more robust neural network. The approach aims to ensure that if there is a small change in the input, the change in the KL divergence between the predictions and the labels will also be small. However, this approach is sub-optimal because of the blindness of the gradient regularization.
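A minimal sketch of the double back-propagation penalty, assuming a squared $L_2$ penalty on the input gradient of the task loss with an illustrative weight `lam`:

```python
import torch
import torch.nn.functional as F

def double_backprop_loss(model, x, y, lam=0.1):
    """Task loss plus a penalty on the gradient of that loss with respect to the input."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    # create_graph=True so the penalty itself can be back-propagated (double backprop).
    input_grad = torch.autograd.grad(loss, x, create_graph=True)[0]
    return loss + lam * input_grad.pow(2).sum()
```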

II-C4 Adversarial Detection

Another approach to making neural networks more robust is to detect adversarial examples before feeding them to the network [20, 10]. [20] tries to find a decision boundary that separates adversarial and clean inputs. [10] exploits the fact that the perturbation of pixel values by an adversarial attack alters the dependence between pixels; by modeling the differences between adjacent pixels in natural images, deviations due to adversarial attacks can be detected.

II-C5 Autoencoders

[2] analyzes the use of normal and denoising autoencoders as a defense method. Autoencoders are neural networks that encode the input and then try to reconstruct the original image as their output. [12], as illustrated in Fig. 2, uses a two-level module with autoencoders to detect and reform adversarial images before feeding them to the target classifier. However, this method may alter clean images and also adds a computational overhead to the overall defense-plus-classifier module. To improve on the method introduced in [12], [27] presents an efficient autoencoder with a new loss function that learns to preserve the local neighborhood structure on the data manifold.

Fig. 2: The MagNet defense in [12] is a two-stage defense: the first stage tries to detect adversarial examples. The images that pass the first stage are denoised using an autoencoder in the second stage and then fed to the classifier.

III Problem Statement

An abstract view of a typical Auto-Encoder (AE) and a Denoising Auto-Encoder (DAE) is depicted in Fig. 3. An AE is comprised of two main components: 1) the encoder, $E$, that extracts the corresponding latent space for an input $x$, and 2) the decoder, $D$, that reconstructs a representation of the input image from its compressed latent-space representation. Ideally, the decoder can generate the exact input sample from the latent space, and the relation between the input and output of an AE can be expressed as $D(E(x)) = x$. However, in reality, the output of an AE is to some extent different from the input. This difference is known as the reconstruction error and is defined as $e_r = D(E(x)) - x$ [30]. When training an AE, the objective is to minimize $\lVert e_r \rVert$.

A DAE is similar to an AE; however, it is trained using a different training process. As illustrated in Fig. 3(b), the input space of a DAE consists of the noisy input samples $\hat{x} = x + n$, and their corresponding latent space is generated by $E(\hat{x})$. Unlike the AE (in which $e_r$ is defined as the difference between the input and output of the AE), the $e_r$ of a DAE is defined as $e_r = D(E(\hat{x})) - x$ [30]. In other words, the reconstruction error is the difference between the output of the decoder and the clean input sample. An ideal DAE removes the noise from the noisy input and generates the clean sample $x$.

This refining property of DAEs makes them an appealing defense mechanism against adversarial examples. More precisely, by placing one or more DAEs at the input of a classifier, the added adversarial perturbations are removed and a refined input is fed into the subsequent classifier. The effectiveness of this approach highly depends on the extent to which the underlying DAE is close to an ideal DAE (one that completely refines the perturbed input). Although a well-trained DAE refines the perturbed input to some extent, it also imposes a reconstruction noise on it. As an example, assume that the noise $n$ in Fig. 3(b) is zero, i.e., the input is a clean image. In this case the output is $D(E(x)) = x + e_r$. If the magnitude of $e_r$ is large enough, it can move the input over the classifier’s decision boundary. This, as illustrated in Fig. 4, results in predicting the input as a member of the wrong class. In this scenario, the DAE not only fails to defend against adversarial examples, but also generates noise that could lead to the misclassification of clean input images.

The other problem of using an AE or a DAE as a pre-processing unit to refine the image and combat adversarial attacks is the added computational complexity. Adding an autoencoder as a pre-processor to a CNN increases 1) the energy consumed per classification, 2) the latency of each classification, and 3) the number of parameters of the overall model.

In the following section, we propose a novel solution for protecting the model against adversarial attacks that addresses both the computational complexity problem and the reconstruction error issue of using an AE as a pre-processor.

Fig. 3: An abstract view of a) a typical Auto-Encoder, and b) a Denoising Auto-Encoder. The two major components of both structures are 1) the encoder, $E$, which extracts the latent space of the input samples, and 2) the decoder, $D$, which reconstructs the input samples from the latent space.
Fig. 4: Reconstruction error ($e_r$) of a decoder can also result in misclassification if the features extracted for the reconstructed image are pushed outside of the classifier’s learnt decision boundary.
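To make the training objective of Fig. 3(b) concrete, the sketch below pairs a small convolutional encoder/decoder and minimizes the reconstruction error between the clean input and the reconstruction of its noisy copy. The layer widths, the Gaussian noise model, and the MSE loss are illustrative assumptions and do not reproduce the exact DAE configuration of Tables I and II.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DAE(nn.Module):
    """Denoising auto-encoder: encoder E compresses, decoder D reconstructs."""
    def __init__(self, in_ch=1):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, in_ch, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid())

    def forward(self, x_noisy):
        return self.decoder(self.encoder(x_noisy))

def dae_loss(dae, x_clean, noise_std=0.1):
    """Reconstruction error e_r = D(E(x + n)) - x for assumed Gaussian noise n."""
    x_noisy = (x_clean + noise_std * torch.randn_like(x_clean)).clamp(0.0, 1.0)
    return F.mse_loss(dae(x_noisy), x_clean)
```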

IV Proposed Method

Using DAEs to refine perturbed input samples before feeding them into the classifier is a typical defense mechanism against adversarial examples [2, 12]. A general view of such a defense is illustrated in Fig. 5 (top). In this figure, $D$ and $E$ are the decoder and encoder of the DAE, respectively, $C_1$ represents the first few CONV layers of the CNN classifier, and $C_2$ represents the later CONV stages. In this defense, the DAE and the CNN are trained separately: the DAE is trained to minimize the reconstruction error, while the CNN is trained to minimize a predetermined classification loss. An improved version of this defense trains the two serially, where in the first stage the DAE is trained, and then the CNN classifier is trained using the output of the DAE as its input samples. Note that the second solution tends to achieve higher classification accuracy. Regardless of the training choice, the addition of a DAE to the CNN classifier adds to its complexity. Aside from the added computational complexity, the problem with this defense mechanism is that the AE can act as a double agent: on one hand, refining the adversarial examples is an effective means of removing the adversarial perturbation (noise) from the input image and is a valid defense mechanism; on the other hand, its reconstruction error, $e_r$, could force misclassification of clean input images. To correct this behavior of the DAE, we propose the concept of the Code-Bridged Classifier (CBC), aiming to 1) eliminate the impact of the reconstruction error of the underlying DAE, and 2) reduce the computational complexity of the combined DAE and classifier to widen its applicability.

Fig. 5: (Top) the defense proposed in [12], where a DAE filters the noise in the input image before feeding it to the classifier. (Bottom) the CBC model, in which the decoder of the DAE and the first few CONV layers of the base classifier are removed. Note that the decoder in CBC is only used for training the CBC, and is removed after training (for evaluation). In this figure, $x$, $\hat{x}$, and $D(E(\hat{x}))$ are respectively the clean input sample, the noisy input sample, and the output of the DAE; $y$ is the corresponding ground truth, and $e_r$ and $e_c$ are the reconstruction error and the classification error, respectively.

Fig. 5 (bottom) illustrates our proposed solution, where the encoder of a trained DAE and a part of the original CNN ($C_2$) are combined to form a hybrid yet compressed model. In this model, the decoder of the DAE, $D$, and the first few CONV layers of the CNN model, $C_1$, are eliminated. In CBC, $D$ and $C_1$ are eliminated with the intuition that together they act as an Auto-Decoder (AD). As opposed to an AE, the AD translates the latent space to an image and back to another latent space (the intermediate representation of the image in the CNN, captured by the output channels of $C_1$). This is problematic because 1) the decoder is not ideal and introduces reconstruction error into the refined image, and 2) decoding and re-encoding the image (the first few CONV layers act as an encoder) only translates the image from one latent space to another without adding any information to it. Hence, such code translation (latent space to latent space) can be eliminated, and the code at the output of $E$ can be used directly for classification. This allows us to eliminate the redundant AD (the decoder and the first few CONV layers of the original CNN that act as an encoder), which not only reduces the computational complexity of the overall model but also improves its accuracy by eliminating the noise related to the image reconstruction of the decoder.

The training process for the CBC is serial: we first train a DAE and then separate the encoder section of the model. The trained encoder is paired with a CNN that is smaller than the original model. One way to build the smaller model is to remove the first few CONV layers of the original model and adjust the width of the DAE and the partial CNN to match the filter sizes. The rule of thumb is to remove as many CONV layers as there are in the encoder of the DAE. The next step is to train the partial CNN while freezing the weights of the encoder, allowing back-propagation to alter only the weights of the classifier.
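A hedged sketch of how the pieces compose: the trained encoder of the DAE is frozen and stacked with the reduced classifier, and only the classifier's parameters are exposed to the optimizer. The class and attribute names below are ours.

```python
import torch
import torch.nn as nn

class CBC(nn.Module):
    """Code-Bridged Classifier: the DAE's trained encoder feeds its latent code
    directly into a reduced classifier; the decoder is discarded after DAE training."""
    def __init__(self, trained_encoder, reduced_classifier):
        super().__init__()
        self.encoder = trained_encoder
        self.classifier = reduced_classifier
        # Freeze the denoising encoder; back-propagation only updates the classifier.
        for p in self.encoder.parameters():
            p.requires_grad = False

    def forward(self, x):
        return self.classifier(self.encoder(x))

# Only the classifier's parameters are handed to the optimizer, e.g.:
# optimizer = torch.optim.Adam(cbc.classifier.parameters(), lr=1e-3)
```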

V Implementation Details

In this section, we investigate the effectiveness of our proposed solution against adversarial examples prepared for the FashionMNIST [32] and CIFAR-10 [7] datasets. To compare our work with the prior work in [12], we build our CBC solution on top of the CNN models that are described as Base in Tables I and II. In these tables, the DAE-CNN columns represent the solution proposed in [12], in which a full auto-encoder pre-processes the input to the CNN model, and the CBC columns describe the modified model corresponding to our proposed solution. The DAE, as described in Tables I and II, includes 2 convolutional layers for encoding and 2 transposed-convolution layers for decoding. Its input is an image, and its output is a denoised image of the same size generated by the autoencoder.

To build the CBC classifier, we stack the trained encoder of the DAE with an altered version of the target classifier in which some of the CONV layers are removed. The trade-off on the number of layers to be removed is discussed in the next section. Considering that the encoder quickly reduces the input image to a compressed latent-space representation, the CNN following the latent space is not wide. For this reason, we also remove the max-pooling layers, making sure that the number of parameters of the CBC classifier when it reaches the softmax layer is equal to that of the base architecture. In our implementation, all attacks and models are implemented using the PyTorch [24] framework. To train the models, we use only clean samples, freeze the weights of the encoder part, and train the remaining layers. Training parameters of the target classifier and the proposed architecture are listed in Table III. We evaluated our proposed solution against the FGSM [5], Iterative (BIM) [9], DeepFool [15], and Carlini & Wagner [1] adversarial attacks.

Base
CNN: Conv.ReLU, Conv.ReLU, Max Pool, Conv.ReLU, Conv.ReLU, FC.ReLU, FC.ReLU, Softmax (10)

DAE-CNN
Defense: Conv.ReLU, Conv.ReLU, ConvTran.ReLU, ConvTran.ReLU
CNN: Conv.ReLU, Conv.ReLU, Max Pool, Conv.ReLU, Conv.ReLU, FC.ReLU, FC.ReLU, Softmax (10)

CBC
Defense: Conv.ReLU, Conv.ReLU
CNN: Conv.ReLU, Conv.ReLU, FC.ReLU, FC.ReLU, Softmax (10)
TABLE I: Architecture of the FashionMNIST Classifiers
Base
CNN: Conv.ReLU ×3, Max Pool, Conv.ReLU ×3, Max Pool, Conv.ReLU ×3, Avg Pool, Softmax (10)

DAE-CNN
Defense: Conv.ReLU, Conv.ReLU, ConvTran.ReLU, ConvTran.ReLU
CNN: Conv.ReLU ×3, Max Pool, Conv.ReLU ×3, Max Pool, Conv.ReLU ×3, Avg Pool, Softmax (10)

CBC
Defense: Conv.ReLU, Conv.ReLU
CNN: Conv.ReLU ×9, Avg Pool, Softmax (10)
TABLE II: Architecture of the CIFAR-10 Classifiers
Dataset Optimization Method Learning Rate Batch Size Epochs
FashionMNIST Adam 0.001 128 50
CIFAR-10 Adam 0.0001 128 150
TABLE III: Training Parameters

VI Experimental Results

By adopting the training parameters described in Table III, the top-1 accuracy of the base classifiers (in Tables I and II) that we trained for FashionMNIST and CIFAR-10 is 95.1% and 90%, respectively. For evaluation purposes, we trained denoising autoencoders with different noise values for both datasets. The structure of the DAEs is shown in Tables I and II. The reconstruction error of the DAEs was around 0.24 and 0.54 for the FashionMNIST and CIFAR-10 datasets, respectively.

VI-A Selecting the Altered CNN Architecture

As discussed previously, the removal of the decoder of the DAE should be paired with removing the first few CONV layers from the base CNN and training the succeeding CONV layers to use the code (latent space) generated by the encoder as their input. The number of layers to be removed was determined by a sweeping experiment in which the accuracy of the resulting model and its robustness against various attacks were assessed. Figure 6 shows the accuracy of CBC networks when the number of convolutional layers in the altered classifier is reduced compared to the base classifier. The experiment is repeated for both the FashionMNIST and CIFAR-10 datasets, and the robustness of each model against CW [1], DeepFool [15], and FGSM [5] is assessed. As illustrated in Fig. 6, the models remain insensitive to the removal of the first few layers (2 in FashionMNIST and 5 in CIFAR-10), with a negligible (~1%) change in accuracy per removed CONV layer, until they reach a tipping point. The FashionMNIST model, being smaller, reaches that tipping point when 2 CONV layers are removed, whereas the CIFAR-10 model, being larger, is only slightly impacted even after 5 CONV layers are removed.

Fig. 6: The change in the accuracy of the CBC model for a) FashionMNIST and b) CIFAR-10 classification with respect to the number of removed CONV layers from the base CNN model.

VI-B CBC Accuracy and Comparison with Prior Art

Fig. 7 captures the results of our simulation, in which the robustness and accuracy of the base CNN, the solution in [12] (in which a DAE refines the input for the base CNN model), and our proposed CBC are compared. For the CNN protected with a DAE, we provide two sets of results: 1) DAE-CNN model accuracy, where the DAE and CNN are trained separately and paired together; and 2) Retrained-DAE-CNN model accuracy, where the CNN is incrementally trained using the refined images produced at the output of the DAE. The comparison is done for the classification of both original and adversarial images. Results for the FGSM, DeepFool, and CW adversarial attacks are reported. For completeness, we have captured the robustness of each solution when the DAE is trained with different noise values.

Fig. 7: Comparing the accuracy of the base CNN model, the DAE-protected CNN model (with and without retraining), and the CBC model when classifying benign images and adversarial images generated by different attack models: (left) FashionMNIST models, (right) CIFAR-10 models.

As illustrated in Fig. 7, the base model is very sensitive to adversarial examples: depending on the attack type, its accuracy in the presence of adversarial examples drops from over 90% to the range of 0% to 20%. The DAE-CNN model also performs very poorly, even on benign images. This is because of the reconstruction error introduced by the decoder, which severely affects the accuracy of the base CNN model. The Retrained-DAE-CNN model (representing the solution in [12]) performs well in classifying benign images and also exhibits robustness against adversarial images. As illustrated, the robustness improves when it is paired with a DAE that is trained with high noise. The best solution, however, is the CBC: regardless of the attack type, the benchmark, and the DAE noise level, the CBC model outperforms the other solutions in both classification accuracy on benign images and robustness against adversarial examples. This clearly illustrates that, by eliminating the reconstruction error, the CBC model is a far more robust solution than DAE-protected CNN models.

VI-C Reduction in Model Size and Computational Complexity

In a CBC model, the DAE’s decoder and the first few CONV layers of the base CNN model are removed. Hence, a CBC model has a significantly smaller FLOP count (computational complexity). Table IV captures the number of model parameters and the FLOP count for each of the CBC classifiers described in Tables I and II. Note that the majority of computation in a CNN model is related to its CONV layers, while a CONV layer has a small number of parameters. Hence, removing a few CONV layers may result in only a small reduction in the number of parameters, but the reduction in the FLOP count of the CBC models is quite significant. As reported in Table IV, in the FashionMNIST model the FLOP count is reduced by 1.8x and 2.8x compared to the base and DAE-protected models, while the parameter count is reduced by 0.37% and 2.69%, respectively. This saving is more significant for the CIFAR-10 CBC model, whose computational complexity is reduced by 3.1x and 3.3x compared to the base and DAE-protected models, respectively, while the number of parameters is reduced by 5.8% and 13.4%. The reduction in the FLOP count of the CBC model, as illustrated in Table IV, also reduces the model’s execution time. The execution time reported in Table IV is measured over the validation set of each dataset (FashionMNIST and CIFAR-10) when the model is executed on a Dell PowerEdge R720 with Intel Xeon E5-2670 (16-core) processors. As reported in Table IV, the execution time of the CBC is even less than that of the base CNN. Note that the CBC also reduces the processing unit’s energy consumption in proportion to the reduction in the FLOP count. Hence, the CBC not only resists adversarial attacks but, being significantly smaller than the base model, also reduces the execution time and energy consumed per classification.

Dataset Model Flops Parameters Execution time
FashionMNIST Base CNN 9.08 MMac 926.6 K 463.4 s
AE-CNN [12] 14.3 MMac 951.81 K 562.3 s
CBC 5.04 MMac 926.25 K 293.7 s
CIFAR-10 Base CNN 0.59 GMac 1.37 M 1673.0 s
AE-CNN [12] 0.63 GMac 1.49 M 1749.7 s
CBC 0.19 GMac 1.29 M 1191.6 s
TABLE IV: Comparison of the number of parameters, computational complexity, and execution time of the CBC, the base model, and the AE-protected model.

VII Conclusion

In this paper, we propose the Code-Bridged Classifier (CBC) as a novel and extremely efficient means of defense against adversarial learning attacks. The resiliency and complexity reduction of the CBC result from directly using the code generated by the encoder of a DAE for classification. For this purpose, during the training phase, a decoder is instantiated in parallel with the model to tune the denoising encoder by computing and back-propagating the image reconstruction error. At the same time, the code is used for classification by a lightweight classifier. Hence, the encoder is trained both for feature extraction (contributing to the depth of the classifier and low-level feature extraction) and for denoising. The parallel decoder is then removed when the model is fully trained. This allows the CBC to achieve high accuracy by avoiding the reconstruction error of the DAE’s decoder, while reducing the computational complexity of the overall model by eliminating the decoder and a few CONV layers from the trained model.

References

  • [1] N. Carlini and D. Wagner (2017) Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), Cited by: §II-A5, §II-A, §II-C2, §V, §VI-A.
  • [2] I. Chen and B. Sirkeci-Mergen (2018) A comparative study of autoencoders against adversarial attacks. In Proceedings of the Int. Conf. on Image Processing, Computer Vision, and Pattern Recognition (IPCV). Cited by: §II-C5, §II-C, §IV.
  • [3] Y. Dong et al. (2018) Boosting adversarial attacks with momentum. In Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition, Cited by: §I, §I, §II-A3, §II-A, §II-C1.
  • [4] H. Drucker and Y. Le Cun (1992) Improving generalization performance using double backpropagation. IEEE Trans. on Neural Networks 3 (6). Cited by: §II-C3.
  • [5] I. J. Goodfellow et al. (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. Cited by: §I, §I, §II-A1, §II-A, §II-C1, §II-C, §V, §VI-A.
  • [6] K. He et al. (2016) Deep residual learning for image recognition. In proc. of the IEEE conf. on computer vision and pattern recognition, pp. 770–778. Cited by: §I.
  • [7] A. Krizhevsky et al. CIFAR-10 (Canadian Institute for Advanced Research). Cited by: §V.
  • [8] A. Krizhevsky et al. (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, Cited by: §I.
  • [9] A. Kurakin et al. (2016) Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533. Cited by: §I, §I, §II-A2, §II-A, §V.
  • [10] J. Liu et al. (2019) Detection based defense against adversarial examples from the steganalysis point of view. In Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 4825–4834. Cited by: §II-C4.
  • [11] A. Madry et al. (2017) Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083. Cited by: §II-C1.
  • [12] D. Meng and H. Chen (2017) Magnet: a two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conf. on Computer and Communications Security, pp. 135–147. Cited by: Fig. 2, §II-C5, §II-C, Fig. 5, §IV, §V, §VI-B, §VI-B, TABLE IV.
  • [13] A. Mirzaeian et al. (2020-01) TCD-npe: a re-configurable and efficient neural processing engine, powered by novel temporal-carry-deferring macs. In 2020 International Conference on ReConFigurable Computing and FPGAs (ReConFig), Cited by: §I.
  • [14] A. Mirzaeian et al. (2020) Nesta: hamming weight compression-based neural proc. engine. In Proceedings of the 25th Asia and South Pacific Design Automation Conference, Cited by: §I.
  • [15] S. Moosavi-Dezfooli et al. (2016) Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conf. on computer vision and pattern recognition, pp. 2574–2582. Cited by: §I, §II-A4, §II-A4, §II-A, §II-C1, §V, §VI-A.
  • [16] S. Moosavi-Dezfooli et al. (2017) Universal adversarial perturbations. In Proceedings of the IEEE Conf. on computer vision and pattern recognition, Cited by: §II-B.
  • [17] K. Neshatpour et al. (2019-12) ICNN: the iterative convolutional neural network. ACM Trans. Embed. Comput. Syst. 18 (6). External Links: ISSN 1539-9087 Cited by: §I.
  • [18] K. Neshatpour et al. (2018) ICNN: An iterative implementation of convolutional neural networks to enable energy and computational complexity aware dynamic approximation. In 2018 Design, Automation & Test in Europe Conf. & Exhibition (DATE), pp. 551–556. Cited by: §I.
  • [19] K. Neshatpour et al. (2019) Exploiting energy-accuracy trade-off through contextual awareness in multi-stage convolutional neural networks. In 20th International Symposium on Quality Electronic Design (ISQED), pp. 265–270. Cited by: §I.
  • [20] T. Pang et al. (2018) Towards robust detection of adversarial examples. In Advances in Neural Information Processing Systems, pp. 4579–4589. Cited by: §II-C4, §II-C.
  • [21] N. Papernot, P. McDaniel, et al. (2017) Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conf. on Computer and Communications Security, pp. 506–519. Cited by: §II-B.
  • [22] N. Papernot et al. (2016) Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pp. 582–597. Cited by: §I, §II-C2, §II-C3, §II-C.
  • [23] N. Papernot et al. (2016) Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277. Cited by: §II-B.
  • [24] A. Paszke et al. (2019) PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pp. 8024–8035. Cited by: §V.
  • [25] A. S. Ross and F. Doshi-Velez (2018) Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In Thirty-Second AAAI Conf. on Artificial Intelligence. Cited by: §II-C3, §II-C.
  • [26] O. Russakovsky et al. (2015) Imagenet large scale visual recognition challenge. Int. journal of computer vision 115 (3), pp. 211–252. Cited by: §I.
  • [27] M. Sabokrou et al. (2019) Self-supervised representation learning via neighborhood-relational encoding. In ICCV, pp. 8010–8019. Cited by: §II-C5.
  • [28] C. Szegedy et al. (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. Cited by: §I, §II-B, §II-B, §II.
  • [29] C. Szegedy et al. (2015) Going deeper with convolutions. In proc. of the IEEE conf. on computer vision and pattern recognition, pp. 1–9. Cited by: §I.
  • [30] P. Vincent et al. (2008) Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th Int. Conf. on Machine Learning, ICML ’08, pp. 1096–1103. External Links: ISBN 978-1-60558-205-4 Cited by: §III, §III.
  • [31] X. Wang et al. (2019) Protecting neural networks with hierarchical random switching: towards better robustness-accuracy trade-off for stochastic defenses. In Proceedings of the 28th Int. Joint Conf. on Artificial Intelligence (IJCAI’19), Cited by: §II-C.
  • [32] H. Xiao et al. (2017) Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747. Cited by: §V.