Fault Sneaking Attack: a Stealthy Framework for Misleading Deep Neural Networks

05/28/2019 ∙ by Pu Zhao, et al. ∙ Northeastern University

Despite the great achievements of deep neural networks (DNNs), the vulnerability of state-of-the-art DNNs raises security concerns in many application domains requiring high reliability. We propose the fault sneaking attack on DNNs, where the adversary aims to misclassify certain input images into any target labels by modifying the DNN parameters. We apply ADMM (alternating direction method of multipliers) for solving the optimization problem of the fault sneaking attack with two constraints: 1) the classification of the other images should be unchanged and 2) the parameter modifications should be minimized. Specifically, the first constraint requires us not only to inject designated faults (misclassifications), but also to hide the faults for stealthy or sneaking considerations by maintaining model accuracy. The second constraint requires us to minimize the parameter modifications (using L0 norm to measure the number of modifications and L2 norm to measure the magnitude of modifications). Comprehensive experimental evaluation demonstrates that the proposed framework can inject multiple sneaking faults without losing the overall test accuracy performance.




1. Introduction

Modern technologies based on pattern recognition, machine learning, and specifically deep learning, have achieved significant breakthroughs (lecun2015deep, ) in a variety of application domains. Deep neural networks (DNNs) have become a fundamental element and a core enabler of ubiquitous artificial intelligence techniques. However, despite their impressive performance, many recent studies demonstrate that state-of-the-art DNNs are vulnerable to adversarial attacks (goodfellow2015explaining, ; szegedy2013intriguing, ). This raises concerns about DNN robustness in many applications with high reliability and dependability requirements, such as face recognition, autonomous driving, and malware detection (mahmood2017adversarial, ; evtimov2017robust, ).

Following the exploration of adversarial attacks in image classification and object detection from 2014 onward, the vulnerability and robustness of DNNs have attracted ever-increasing attention and effort in the research field known as adversarial machine learning. A large amount of effort has since been devoted to: 1) the design of adversarial attacks against machine learning tasks (carlini2017towards, ; chen2017ead, ; zhao2018admm, ); 2) security evaluation methodologies to systematically estimate DNN robustness (biggio2014security, ; zhang2018efficient, ); and 3) defense mechanisms against the attacks (rota2017randomized, ; demontis2018yes, ; madry2017towards, ). This paper falls into the first category.

Adversarial attacks can be classified into: 1) evasion attacks (carlini2017towards, ; chen2017ead, ; zhao2018admm, ) that perturb input images at test time to fool DNN classifications; 2) poisoning attacks (xiao2015feature, ; biggio2012poisoning, ) that manipulate training data sets to obtain poorly trained DNN models; and 3) fault injection attacks (liu2017fault, ; breier2018practical, ) that change the classifications of certain input images to target labels by modifying DNN parameters. The general purpose of an adversarial attack, regardless of its category, is to cause misclassifications of certain images while maintaining high model accuracy for the other images. This work proposes the fault sneaking attack, a new fault injection attack method.

A fault injection attack perturbs the DNN parameter space. As DNNs are usually implemented and deployed on various hardware platforms including CPUs/GPUs and dedicated accelerators, the DNN parameters stored in memory can be perturbed using memory fault injection techniques such as laser beam (selmke2015precise, ) and row hammer (kim2014flipping, ). To be practical, the proposed fault sneaking attack perturbs the DNN parameters with the hardware implementation of the attack in mind.

Perturbing the parameters (as in a fault injection attack) is more challenging than perturbing the input images (as in an evasion attack) for two reasons: 1) global effect: perturbing one input does not influence the classifications of other unperturbed inputs, while perturbing the parameters has a global effect on all inputs; 2) numerous parameters: DNNs usually have many more parameters than an input image has pixels. The fault injection attack should be stealthy, in that only certain images are misclassified while high model accuracy is maintained for the other images, so that it cannot be easily detected. It should also be efficient, in that the parameter modifications are as small as possible, so that it can be implemented easily in hardware. This work tackles these challenges by proposing the fault sneaking attack based on ADMM (alternating direction method of multipliers).

The theoretical contributions of this work are:

+ Stealthy injection of multiple faults: The proposed fault sneaking attack based on ADMM can achieve multiple designated faults (misclassifications), with the flexibility to specify any target labels and the stealthiness to hide the faults. The fault injection attack of (liu2017fault, ) can only inject one fault.

+ A systematic application of ADMM with analytical solutions: Compared with the heuristic of (liu2017fault, ), the proposed fault sneaking attack is an optimization based framework leveraging ADMM with analytical solutions. Compared with evasion attacks (zhao2018admm, ; carlini2017adversarial, ), the proposed fault sneaking attack deals with a more challenging problem with higher dimensionality, yet finds much less expensive analytical solutions.

+ A general ADMM framework for both $\ell_0$ and $\ell_2$ norm minimizations: The proposed ADMM based framework for solving the optimization problem of the fault sneaking attack can adopt both the $\ell_0$ norm (the number of parameter modifications) and the $\ell_2$ norm (the magnitude of modifications) to measure the difference between the original and modified DNN models, with only minor changes. In contrast, (liu2017fault, ) cannot deal with the non-differentiable $\ell_0$ norm.

The experimental contributions of this work are:

+ Less model accuracy loss: Under the same experimental settings and misclassification requirements, the proposed fault sneaking attack degrades the DNN model accuracy by only 0.8 percent for MNIST and 1.0 percent for CIFAR, while (liu2017fault, ) degrades the DNN model accuracy by 3.86 percent and 2.35 percent, respectively.

+ Comprehensive analysis of DNN fault tolerance: We extensively test the tolerance of DNNs to fault injection attacks. We find that there is an upper limit on the number of images that can be successfully misclassified, which depends on the DNN model itself. For the DNN models used in this work, this limit is around 10, i.e., the models tolerate about 10 sneaking faults.

2. Related Work

The adversarial attacks are reviewed from the aspects of perturbing the inputs and perturbing the DNN parameters.

2.1. Perturbing the Input Space

Evasion attacks generate adversarial examples to fool DNNs by perturbing the legitimate inputs. Basically, an adversarial example is produced by adding human-imperceptible distortions onto a legitimate image, such that the adversarial example will be classified by the DNN as a target (wrong) label. The norm-ball constrained evasion attacks have been well studied, including the FGM (goodfellow2014explaining, ) and IFGSM (KurakinGB2016adversarial, ) attacks with $\ell_\infty$ norm restriction, the L-BFGS (szegedy2013intriguing, ) and C&W (carlini2017towards, ) attacks minimizing the $\ell_2$ distortion, and the JSMA (papernot2016limitations, ) and ADMM (pu2018reinforced, ) attacks trying to perturb the minimum number of pixels, namely, minimizing the $\ell_0$ distortion.

Many defenses have been proposed, including defensive distillation (papernot2016distillation, ), defensive dropout (wang2018defending, ; wang2018defensive, ), and robust adversarial training (madry2017towards, ). The robust adversarial training method provides strong defense performance at the cost of high computation requirements.

2.2. Perturbing the Parameter Space

Poisoning attacks, which train DNNs by adding poisoned images into the training data sets, and fault injection attacks, which modify the DNN parameters directly, both perturb the DNN parameters. The poisoning attack (xiao2015feature, ) is computation-intensive as it requires iterative retraining, and it is not the focus of our paper. The fault injection attack (liu2017fault, ) was first proposed by Liu et al. It uses a heuristic approach to profile the sink class for the single bias attack scheme and, for the gradient descent attack scheme, compresses the modification by iteratively forcing the smallest element to zero and checking feasibility. Different from (liu2017fault, ), the fault sneaking attack uses a systematic optimization-based approach, achieving flexible designation of the target labels and of the portion of DNN parameters to modify, and enabling both the $\ell_2$ and the (non-differentiable) $\ell_0$ norms in the objective function.

2.3. Practical Fault Injection Techniques

Common techniques for flipping logic values in memory include laser beam and row hammer. Laser beam (barenghi2012fault, ) can precisely change any single bit in SRAM by carefully tuning laser parameters such as diameter and energy level (selmke2015precise, ). Row hammer (kim2014flipping, ) can inject faults into DRAM by rapidly and repeatedly accessing a given physical memory location to flip corresponding bits (xiao2016one, ). Some works demonstrate the feasibility of using row hammer on mobile platforms (van2016drammer, ) and of launching row hammer to trigger a processor lockdown (jang2017sgx, ). However, fine-tuning the laser beam or locating the bits in memory can be time consuming (van2016drammer, ). Therefore, it is essential for our fault sneaking attack to minimize the number of modified parameters. Recently, (breier2018practical, ) implemented the DNN fault injection attack (liu2017fault, ) physically on embedded systems using laser beam. In particular, (breier2018practical, ) injects faults into the widely used activation functions in DNNs and demonstrates the possibility of achieving misclassifications by injecting faults into the DNN hidden layer.

3. Problem Formulation

Threat Model: We consider an adversary tampering with the DNN classification results of certain input images, changing them into designated target labels by modifying the DNN model parameters. In this paper, we assume a white-box attack, i.e., the adversary has complete knowledge of the DNN model (including both structure and parameters) and of low-level implementation details (how and where the DNN parameters are located in memory), as the highest and most stringent security standard to assess the robustness of DNN systems under the fault sneaking attack. Given that existing fault injection techniques can precisely flip any bit of the data in memory, we assume the adversary can modify any parameter in the DNN to any value that is in the valid range of the arithmetic format used. Note that we do not assume the adversary knows the training and testing data sets, which are usually not available to system users.

The adversary has two constraints when launching the fault sneaking attack: (i) Stealthy, in that the classification results of the other images should be kept as unchanged as possible; (ii) Efficient, in that the modifications of DNN parameters in terms of number of modified parameters or magnitude of parameter modifications should be as small as possible. The first constraint is important because even if the attack is specified for certain input images, it is highly possible to change the classification results of the other images when modifying the DNN parameters, thereby resulting in obviously low DNN model accuracy and easy detection of the attack. The second constraint minimizing the parameter modifications can reduce the influence and difficulty of implementing the attack.

Attack Model: Given $R$ images $X = \{x_i \mid i = 1, \dots, R\}$ with their correct labels $\{l_i \mid i = 1, \dots, R\}$, we would like to change the classification results of the first $S$ ($S \le R$) images to their target labels $\{t_i \mid i = 1, \dots, S\}$, while the classifications of the remaining $R - S$ images are unchanged, by modifying parameters in the DNN model. Note that the unchanged labels of the other images make the attack stealthy and hard to detect.

The original DNN model parameters are denoted as $\theta$, and $\delta$ represents the parameter modifications. So the parameters after the modification are $\theta + \delta$. Note that $\theta$ has the flexibility of specifying either all the DNN parameters or only a portion of the parameters, e.g., weight parameters of the specific layer(s). The fault sneaking attack can be formulated as an optimization problem:

$$\min_{\delta} \; D(\delta) + G(\theta + \delta, X), \tag{1}$$

where $D(\delta)$ measures the DNN parameter modifications; and $G(\theta + \delta, X)$ represents the misclassification requirements, i.e., with the modified DNN model parameters $\theta + \delta$, the first $S$ images in set $X$ will be classified as target labels $\{t_i\}$, while the classifications of the remaining $R - S$ images are kept unchanged. The details of the $D$ and $G$ functions are explained in the following sections.

3.1. Measurements of Parameter Modifications

$D(\delta)$ represents the measurement of the parameter modifications, which should be minimized for the attack implementation efficiency. In this paper, the $\ell_0$ and $\ell_2$ norms are used as $D(\delta)$ as follows,

$$D(\delta) = \|\delta\|_0 \quad \text{or} \quad D(\delta) = \|\delta\|_2.$$

The $\ell_0$ norm of $\delta$ measures the number of nonzero elements in $\delta$ and therefore measures the number of parameters modified by the attack. Minimizing the $\ell_0$ norm can make it easier to implement the attack in DNN systems, considering that the difficulty of parameter modifications in real systems relates to the number of modified parameters (kim2014flipping, ). The $\ell_2$ norm of $\delta$ denotes the standard Euclidean distance between the modified and original parameters, and therefore measures the magnitude of the parameter modifications. Minimizing the $\ell_2$ norm can lead to minimal influence of the attack.
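As a minimal sketch of the two measurements above, assuming the modification vector is flattened into a plain Python list (the names `l0_norm`, `l2_norm`, and the tolerance `tol` are illustrative and do not come from the original work):

```python
import math


def l0_norm(delta, tol=0.0):
    """Count the entries of delta whose magnitude exceeds tol,
    i.e., the number of modified parameters."""
    return sum(1 for d in delta if abs(d) > tol)


def l2_norm(delta):
    """Euclidean length of delta, i.e., the magnitude of the modification."""
    return math.sqrt(sum(d * d for d in delta))


# Hypothetical modification touching two of four parameters.
delta = [0.0, 3.0, 0.0, -4.0]
print(l0_norm(delta))  # 2
print(l2_norm(delta))  # 5.0
```

The two measures can disagree: a modification that touches few parameters may still have a large magnitude, and vice versa.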

Minimizing the $\ell_0$ norm in the objective function is much harder than minimizing the $\ell_2$ norm, because the $\ell_0$ norm is non-differentiable. In this paper, the proposed ADMM framework enables both $\ell_0$ and $\ell_2$ norms in the objective function with only minor differences in the solution methods as specified in Sec. 4.

3.2. Misclassification Requirements

In (1), $G(\theta + \delta, X)$ denotes the misclassification requirements: 1) the first $S$ images should be classified as the target labels instead of their correct labels, and 2) the classifications of the remaining $R - S$ images should remain unchanged as their correct labels.

In the area of adversarial machine learning, the most effective objective function to specify that an input $x$ should be labeled as $t$ is the following $g$ function (carlini2017towards, ):

$$g(\theta + \delta, x, t) = \max\Big( \max_{j \ne t} Z(\theta + \delta, x)_j - Z(\theta + \delta, x)_t, \; 0 \Big),$$

where $Z(\theta + \delta, x)_j$ denotes the $j$-th element of the logits, i.e., the input to the softmax layer. The softmax layer is the last layer in the DNN model, which takes logits as input and generates the final probability distribution outputs. The final outputs from the softmax layer are not utilized in the above $g$ function, because the final outputs are usually dominated by the most significant class in a well trained model and are thus less effective during computation. The DNN chooses the label with the largest logit, that is, $\arg\max_j Z(\theta + \delta, x)_j$. To enforce that the input $x$ is classified as label $t$, the logit of label $t$, $Z(\theta + \delta, x)_t$, must be larger than all of the other logits, $Z(\theta + \delta, x)_j$ with $j \ne t$. Thus, $g(\theta + \delta, x, t)$ will achieve its minimal value if $x$ is classified as label $t$.
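The hinge-style function described above can be sketched as follows. This is an illustration only: it takes the logits directly as a list, whereas in the attack they would come from a forward pass of the modified model, and the name `hinge_g` is hypothetical.

```python
def hinge_g(logits, t):
    """Hinge on logits: zero once the logit of label t is at least as large
    as every other logit, positive otherwise."""
    best_other = max(z for j, z in enumerate(logits) if j != t)
    return max(best_other - logits[t], 0.0)


logits = [2.0, 5.0, 1.0]   # hypothetical logits; the predicted label is 1
print(hinge_g(logits, 1))  # 0.0, label 1 already wins
print(hinge_g(logits, 0))  # 3.0, label 0 trails the largest logit by 3
```

The value is the margin by which the desired label loses, so driving it to zero is the same as making that label the prediction.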

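From the above analysis, we propose the detailed form of $G$ as:

$$G(\theta + \delta, X) = G_1 + G_2,$$

where $G_1$ stands for the targeted misclassifications of $\{x_i \mid i = 1, \dots, S\}$ and $G_2$ denotes keeping the classifications of $\{x_i \mid i = S+1, \dots, R\}$ unchanged. $G_1$ and $G_2$ are:

$$G_1 = \sum_{i=1}^{S} c_i \cdot g(\theta + \delta, x_i, t_i), \qquad G_2 = \sum_{i=S+1}^{R} c_i \cdot g(\theta + \delta, x_i, l_i).$$

The $c_i$'s represent their relative importance to the measurement of modifications $D(\delta)$. $t_i$ represents the target label for the $i$-th image in the $S$ images. $G_1$ achieves its minimum value when the labels of the first $S$ images are changed to their target labels $\{t_i\}$. Similarly, $G_2$ obtains its minimum value when the classifications of the remaining $R - S$ images are kept unchanged.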
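A small sketch of how the two sums combine, assuming the logits of all images under the modified model are already available as lists (the helper names `hinge` and `misclassification_loss` and the sample numbers are hypothetical):

```python
def hinge(logits, label):
    """Zero when `label` has the largest logit, positive otherwise."""
    best_other = max(z for j, z in enumerate(logits) if j != label)
    return max(best_other - logits[label], 0.0)


def misclassification_loss(all_logits, correct, targets, c):
    """Weighted hinge sum: the first S = len(targets) images are pushed to
    their target labels, the remaining images are held at correct labels."""
    S = len(targets)
    R = len(all_logits)
    g1 = sum(c[i] * hinge(all_logits[i], targets[i]) for i in range(S))
    g2 = sum(c[i] * hinge(all_logits[i], correct[i]) for i in range(S, R))
    return g1 + g2


# Two images (R = 2); only the first one is attacked (S = 1).
all_logits = [[2.0, 5.0], [4.0, 1.0]]
correct = [1, 0]
targets = [0]
print(misclassification_loss(all_logits, correct, targets, [1.0, 1.0]))  # 3.0
```

In this example the second image keeps its label and contributes nothing, while the first image has not yet reached its target label and contributes its margin of 3.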

4. General ADMM Solution Framework

We propose a solution framework based on ADMM to solve (1) for the fault sneaking attack. The framework is general in that it can deal with both $\ell_0$ and $\ell_2$ norms as $D(\delta)$. ADMM was first introduced in the mid-1970s with roots in the 1950s and has recently become popular for large scale statistics and machine learning problems (boyd2011distributed, ). ADMM solves problems through a decomposition-alternating procedure, where the global problem is first split into local subproblems, and then the solutions to the small local subproblems are coordinated to find a solution to the large global problem. It has been proved in (hong2017linear, ) that ADMM has at least a linear convergence rate, and it empirically converges in a few tens of iterations.

4.1. ADMM Reformulation

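As ADMM requires multiple variables for reducing the objective function in alternating directions, we introduce a new auxiliary variable $z$ and (1) can now be reformulated as,

$$\min_{z, \delta} \; D(z) + G(\theta + \delta, X) \quad \text{s.t.} \quad z = \delta.$$

The augmented Lagrangian function of the above problem is:

$$\mathcal{L}_\rho(z, \delta, u) = D(z) + G(\theta + \delta, X) + u^T (z - \delta) + \frac{\rho}{2} \|z - \delta\|_2^2,$$

where $u$ is the Lagrange multiplier and $\rho > 0$ is the penalty parameter. Applying the scaled form of ADMM by defining $u = \rho s$, we obtain

$$\mathcal{L}_\rho(z, \delta, s) = D(z) + G(\theta + \delta, X) + \frac{\rho}{2} \|z - \delta + s\|_2^2 - \frac{\rho}{2} \|s\|_2^2. \tag{9}$$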


4.2. ADMM Iterations

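ADMM optimizes problem (9) in iterations. Specifically, in the $k$-th iteration, the following steps are performed:

$$z^{k+1} = \arg\min_{z} \; \mathcal{L}_\rho(z, \delta^k, s^k), \tag{10}$$

$$\delta^{k+1} = \arg\min_{\delta} \; \mathcal{L}_\rho(z^{k+1}, \delta, s^k), \tag{11}$$

$$s^{k+1} = s^k + z^{k+1} - \delta^{k+1}. \tag{12}$$

As demonstrated above, problem (9) is split into two subproblems, (10) and (11), through ADMM. In (10), the optimal solution $z^{k+1}$ is obtained by minimizing the augmented Lagrangian function $\mathcal{L}_\rho$ with fixed $\delta^k$ and $s^k$. Similarly, (11) finds the optimal $\delta^{k+1}$ to minimize $\mathcal{L}_\rho$ with fixed $z^{k+1}$ and $s^k$. In (12), we update $s^{k+1}$ with $z^{k+1}$ and $\delta^{k+1}$. ADMM thus updates the two arguments in an alternating fashion, which is where the term alternating direction comes from.

In the ADMM iterations, problems (10) and (11) are detailed as:

$$\min_{z} \; D(z) + \frac{\rho}{2} \|z - \delta^k + s^k\|_2^2, \tag{13}$$

$$\min_{\delta} \; G(\theta + \delta, X) + \frac{\rho}{2} \|z^{k+1} - \delta + s^k\|_2^2. \tag{14}$$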


The solutions to the two problems are specified as follows.

4.3. z step

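In this step, we mainly solve (13). The specific closed-form solution depends on the $D$ function ($\ell_0$ or $\ell_2$ norm).

4.3.1. Solution for $\ell_0$ norm

If the $D$ function takes the $\ell_0$ norm, (13) has the following form:

$$\min_{z} \; \|z\|_0 + \frac{\rho}{2} \|z - \delta^k + s^k\|_2^2.$$

The solution can be obtained elementwise (parikh2014proximal, ) as

$$z^{k+1}_i = \begin{cases} (\delta^k - s^k)_i, & \text{if } (\delta^k - s^k)_i^2 > \frac{2}{\rho}, \\ 0, & \text{otherwise.} \end{cases}$$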
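The elementwise rule for the $\ell_0$ case is a hard-thresholding operation: keeping an entry costs 1 in the $\ell_0$ term, while zeroing it costs $\frac{\rho}{2}$ times its square, so an entry survives only when its square exceeds $2/\rho$. A minimal sketch, assuming vectors are plain Python lists (the name `l0_z_step` is illustrative):

```python
def l0_z_step(delta, s, rho):
    """Hard thresholding: keep (delta - s)_i only when its square
    exceeds 2 / rho, otherwise set it to zero."""
    v = [d - si for d, si in zip(delta, s)]
    return [x if x * x > 2.0 / rho else 0.0 for x in v]


# Hypothetical iterate: the middle entry is too small to justify a modification.
print(l0_z_step([2.0, 0.5, -3.0], [0.0, 0.0, 0.0], rho=1.0))  # [2.0, 0.0, -3.0]
```

A larger `rho` lowers the threshold, so more entries are kept.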


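4.3.2. Solution for $\ell_2$ norm

If the $D$ function takes the $\ell_2$ norm, (13) has the following form:

$$\min_{z} \; \|z\|_2 + \frac{\rho}{2} \|z - \delta^k + s^k\|_2^2.$$

By the ‘block soft thresholding’ operator (parikh2014proximal, ), the solution is given by

$$z^{k+1} = \begin{cases} \left(1 - \frac{1}{\rho \|\delta^k - s^k\|_2}\right) (\delta^k - s^k), & \text{if } \|\delta^k - s^k\|_2 \ge \frac{1}{\rho}, \\ 0, & \text{otherwise.} \end{cases}$$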
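For the $\ell_2$ case, block soft thresholding shrinks the whole vector toward zero by a common factor and returns the zero vector when its norm is at most $1/\rho$. A minimal sketch under the same plain-list assumption (the name `l2_z_step` is illustrative):

```python
import math


def l2_z_step(delta, s, rho):
    """Block soft thresholding of v = delta - s with threshold 1 / rho."""
    v = [d - si for d, si in zip(delta, s)]
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= 1.0 / rho:
        return [0.0] * len(v)  # the whole block is shrunk to zero
    scale = 1.0 - 1.0 / (rho * norm)
    return [scale * x for x in v]


# Hypothetical iterate with norm 5: every entry is scaled by 1 - 1/5.
print(l2_z_step([3.0, 4.0], [0.0, 0.0], rho=1.0))  # approximately [2.4, 3.2]
```

Unlike hard thresholding, this operator keeps the direction of the vector and only reduces its length, which is why it does not by itself produce sparse modifications.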


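4.4. $\delta$ step

In this step, we mainly solve (14). It can be rewritten as

$$\min_{\delta} \; \sum_{i=1}^{R} c_i \cdot g_i(\theta + \delta) + \frac{\rho}{2} \|z^{k+1} - \delta + s^k\|_2^2,$$

where

$$g_i(\theta + \delta) = \begin{cases} g(\theta + \delta, x_i, t_i), & i \in [1, S], \\ g(\theta + \delta, x_i, l_i), & i \in [S+1, R]. \end{cases}$$

The function $g_i$ takes different forms according to the value of $i$. If $i \in [1, S]$, $g_i$ obtains its minimum value when the classification of $x_i$ is changed to the target label $t_i$. If $i \in [S+1, R]$, $g_i$ achieves its minimum when the classification is kept as the original label $l_i$.

Motivated by the linearized ADMM (gao2017first, ; liu2017linearized, , Sec. 2.2), we replace each function $g_i$ with its first-order Taylor expansion at $\delta^k$ and add a regularization term (known as Bregman divergence), $\frac{1}{2} \|\delta - \delta^k\|_H^2$, where $H$ is a pre-defined positive definite matrix, and $\|x\|_H^2 = x^T H x$. (14) can then be reformulated as:

$$\min_{\delta} \; \sum_{i=1}^{R} c_i \left(\nabla g_i(\theta + \delta^k)\right)^T (\delta - \delta^k) + \frac{1}{2} \|\delta - \delta^k\|_H^2 + \frac{\rho}{2} \|z^{k+1} - \delta + s^k\|_2^2.$$

Letting $H = \alpha I$, the solution can be obtained through

$$\delta^{k+1} = \frac{1}{\alpha + \rho} \Big( \alpha \delta^k + \rho (z^{k+1} + s^k) - \sum_{i=1}^{R} c_i \nabla g_i(\theta + \delta^k) \Big).$$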
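To illustrate how the closed-form updates fit together, the sketch below assembles a small linearized ADMM loop with the $\ell_0$ hard-thresholding step. It is not the authors' implementation: the weighted hinge terms are replaced by a toy smooth stand-in, $0.5\|\delta - a\|_2^2$ with gradient $\delta - a$, so that the loop runs without a DNN, and the names `hard_threshold`, `delta_update`, `admm_l0`, and the values of `rho`, `alpha`, and `a` are all assumptions made for the example.

```python
def hard_threshold(v, rho):
    """z step for the l0 norm: keep an entry only if its square exceeds 2 / rho."""
    return [x if x * x > 2.0 / rho else 0.0 for x in v]


def delta_update(delta, z, s, grad, rho, alpha):
    """Closed-form linearized delta step with H = alpha * I, where grad is the
    gradient of the misclassification term at the current delta."""
    return [(alpha * d + rho * (zi + si) - gi) / (alpha + rho)
            for d, zi, si, gi in zip(delta, z, s, grad)]


def admm_l0(grad_fn, dim, rho=1.0, alpha=1.0, iters=100):
    """Alternate the z step, the delta step, and the scaled dual update."""
    z = [0.0] * dim
    delta = [0.0] * dim
    s = [0.0] * dim
    for _ in range(iters):
        z = hard_threshold([d - si for d, si in zip(delta, s)], rho)
        delta = delta_update(delta, z, s, grad_fn(delta), rho, alpha)
        s = [si + zi - di for si, zi, di in zip(s, z, delta)]
    return z, delta


# Toy stand-in for the weighted hinge terms: 0.5 * ||delta - a||^2.
a = [3.0, 1.0]
z, delta = admm_l0(lambda d: [di - ai for di, ai in zip(d, a)], dim=2)
print(z)  # the large entry is kept near 3, the small one is pruned to 0
```

On this toy problem the loop keeps the entry whose modification is worth its $\ell_0$ cost and removes the other, which mirrors how the attack trades the number of modified parameters against the misclassification term.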


5. Experimental Evaluations

We demonstrate the experimental results of the proposed fault sneaking attack on two image classification datasets, MNIST (Lecun1998gradient, ) and CIFAR-10 (Krizhevsky2009learning, ). We train two networks for the MNIST and CIFAR-10 datasets, respectively, sharing the same network architecture with four convolutional layers, two max pooling layers, two fully connected layers and one softmax layer. They achieve 99.5% accuracy on MNIST and 79.5% accuracy on CIFAR-10, respectively, which is comparable to the state of the art. The experiments are conducted on machines with NVIDIA GTX 1080 Ti GPUs.

5.1. Layer and Type of Parameters to Modify

| Layer | Total parameters | $\ell_0$ norm (S=1, R=1) | $\ell_0$ norm (S=4, R=4) | $\ell_0$ norm (S=16, R=16) |
| --- | --- | --- | --- | --- |
| The first FC layer | 205000 | 14016 | 40649 | 120597 |
| The second FC layer | 40200 | 5390 | 14086 | 34069 |
| The last FC layer | 2010 | 222 | 682 | 1755 |

Table 1. $\ell_0$ norm of DNN parameter modifications (i.e., the number of modified parameters) in different fully connected layers for MNIST.

| | S=1, R=1 | S=2, R=2 | S=4, R=4 | S=8, R=8 |
| --- | --- | --- | --- | --- |
| $\ell_0$ norm for weight params. | 236 | 458 | 715 | 1644 |
| Success rate for weight params. | 100% | 100% | 100% | 100% |
| $\ell_0$ norm for bias params. | 2 | 4 | -* | -* |
| Success rate for bias params. | 100% | 100% | 0% | 0% |

* The $\ell_0$ norm is not shown when the attack cannot succeed.

Table 2. $\ell_0$ norm and attack success rate when modifying different types of parameters in the last fully connected layer for MNIST.

The DNN model used has three fully connected (FC) layers. We modify the parameters in different FC layers. Table 1 shows the $\ell_0$ norm (i.e., the number of parameter modifications) achieved by the fault sneaking attack when we modify each FC layer. We observe that more parameters need to be modified as $S$ and $R$ increase. Besides, changing the last FC layer requires fewer parameter modifications than changing the first or second FC layer. The reason is that the last FC layer has a more direct influence on the output, leading to a smaller number of modifications by the fault sneaking attack. Therefore, in the following experiments, we focus on modifying only the last FC layer parameters.

Next we determine which type of parameters is more effective to modify when implementing the fault sneaking attack. In the FC layer, the output depends on the weights $W$ and the biases $b$, that is, $Wx + b$, where $x$ is the input of the layer. As we can see, the bias parameters are more directly related to the output than the weight parameters. Table 2 shows the $\ell_0$ norm and the attack success rate if we only modify the weight parameters or the bias parameters in the last FC layer. As the bias parameters are more directly related to the output, fewer bias parameters usually need to be changed to achieve the same attack objective. However, changing only the bias parameters has very limited capability, and can only lead to the misclassification of 1 or 2 images. As observed from Table 2, changing the classification of 4 or more images is beyond the capability of modifying bias parameters only. This demonstrates the limitation of the single bias attack (SBA) scheme in (liu2017fault, ), which only modifies the bias to misclassify only one image. We also find that SBA cannot be extended to solve the case of multiple images with multiple target labels. Considering the limitation of only modifying bias parameters, we choose to perturb both the weight and bias parameters in the following experiments.
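One way to see why bias-only modifications are so limited is that, in the last FC layer, a change to the bias shifts the output by the same amount for every input, whereas a change to the weights has an input-dependent effect. The sketch below illustrates this with a hypothetical 2x2 layer; the names `fc_layer` and `bias_shift` and all numbers are made up for the example.

```python
def fc_layer(W, b, x):
    """Compute W x + b for a weight matrix W (list of rows), bias b, input x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def bias_shift(W, b, b_mod, x):
    """Change in the layer output caused by replacing bias b with b_mod."""
    before = fc_layer(W, b, x)
    after = fc_layer(W, b_mod, x)
    return [a - c for a, c in zip(after, before)]


W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
b_mod = [0.0, 2.0]  # hypothetical bias modification favoring class 1

# The shift is identical for two different inputs.
print(bias_shift(W, b, b_mod, [3.0, 1.0]))  # [0.0, 2.0]
print(bias_shift(W, b, b_mod, [1.0, 0.5]))  # [0.0, 2.0]
```

Because the same offset is applied to all inputs, pushing several images toward different target labels with the bias alone quickly conflicts with keeping the other images unchanged.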

5.2. Norm of Parameter Modifications

We demonstrate the number of parameter modifications, i.e., the $\ell_0$ norm, by the fault sneaking attack in this section. As observed from Fig. 1 and 2, for the same $R$, the $\ell_0$ norm of parameter modifications keeps increasing as $S$ increases since more parameters need to be modified to change the classifications of more images into their target labels.

Figure 1. $\ell_0$ norm of DNN parameter modifications in the last fully connected layer for MNIST.
Figure 2. $\ell_0$ norm of DNN parameter modifications in the last fully connected layer for CIFAR-10.

We have an interesting finding: when $S$ is small, the $\ell_0$ norm tends to be smaller as $R$ increases from 200 to 1000 for MNIST. The reason is that a larger $R$ means the labels of more images ($R - S$) need to be kept unchanged, so the modified model should be more similar to the original model and therefore fewer modifications are required.

We also notice that this phenomenon disappears when $S$ is larger than 8 for MNIST, or for CIFAR-10. Considering the 99.5% and 79.5% accuracy on MNIST and CIFAR-10, we believe the disappearance is related to the DNN model capability. When $S$ is small on MNIST, the DNN model is able to hide a small number of misclassifications by modifying only a few parameters of the last FC layer. However, when $S$ is relatively large, it is not that easy to hide so many misclassifications and the fault sneaking attack has to perturb almost all parameters in the last FC layer, with no capacity to spare. The reason for CIFAR-10 is similar, since the capability of the model for CIFAR-10 is limited, with only 79.5% accuracy.

5.3. Comparison of $\ell_0$ and $\ell_2$ based Attacks

In problem (10), the $\ell_0$ or $\ell_2$ norm can be minimized, leading to the corresponding $\ell_0$ or $\ell_2$ based fault sneaking attacks. Table 3 compares the $\ell_0$ and $\ell_2$ norms of the $\ell_0$ and $\ell_2$ based attacks for various configurations.

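| | S=1, R=10: $\ell_0$ norm | S=1, R=10: $\ell_2$ norm | S=5, R=10: $\ell_0$ norm | S=5, R=10: $\ell_2$ norm | S=5, R=20: $\ell_0$ norm | S=5, R=20: $\ell_2$ norm |
| --- | --- | --- | --- | --- | --- | --- |
| $\ell_0$ attack | 1026 | 863 | 1208 | 804 | 1606 | 498 |
| $\ell_2$ attack | 1431 | 393 | 1432 | 344 | 1964 | 226 |

Table 3. $\ell_0$ and $\ell_2$ norms of DNN parameter modifications in the last fully connected layer for the $\ell_0$ and $\ell_2$ based attacks for MNIST.

As seen from Table 3, the $\ell_0$ based attack achieves a smaller $\ell_0$ norm than the $\ell_2$ based attack, at the price of a larger $\ell_2$ norm, because the $\ell_2$ based attack tries to minimize the Euclidean distance between the perturbed and original model without considering the number of parameter modifications.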

5.4. Test Accuracy after Parameter Modification

As the fault sneaking attack perturbs the DNN parameters to satisfy specific attack requirements, it is important to measure the influence of the attack beyond the required objective. In the problem formulation, we try to reduce the influence of the fault sneaking attack by constraining the remaining images to keep their classifications unchanged. In Table 4, we show the test accuracy on the whole testing datasets for MNIST and CIFAR-10 after perturbing the model.

| Dataset | R | S=1 | S=2 | S=4 | S=8 | S=16 |
| --- | --- | --- | --- | --- | --- | --- |
| MNIST | 50 | 85.2% | 73.1% | 64.7% | 37.4% | 29.7% |
| MNIST | 100 | 96.9% | 86.6% | 81.3% | 76.1% | 65.2% |
| MNIST | 200 | 96.7% | 96.1% | 95.4% | 93.2% | 92.6% |
| MNIST | 500 | 98.6% | 98.5% | 97.8% | 96.9% | 95.9% |
| MNIST | 1000 | 98.7% | 97.9% | 98.1% | 96.8% | 96.9% |
| CIFAR | 50 | 57.7% | 52.9% | 44.9% | 26.2% | 18.3% |
| CIFAR | 100 | 67.5% | 68.7% | 55.8% | 42.5% | 31.5% |
| CIFAR | 200 | 72.3% | 67.6% | 69.6% | 57.2% | 35.4% |
| CIFAR | 500 | 78.5% | 77.4% | 76.2% | 74.5% | 73.2% |
| CIFAR | 1000 | 78.5% | 78.2% | 77.5% | 77.9% | 76.4% |

Table 4. Test accuracy after DNN parameter modifications for MNIST and CIFAR.

The test accuracy of the original model is 99.5% for MNIST and 79.5% for CIFAR. As observed from Table 4, with fixed $R$, the test accuracy of the modified model generally decreases as $S$ increases. This demonstrates that, as a natural outcome, changing parameters to misclassify certain images may degrade the overall accuracy of the model. In the case of $S = 16$ and $R = 50$, the test accuracy drops from 99.5% to 29.7% for MNIST and from 79.5% to 18.3% for CIFAR. However, we observe that as $R$ increases, the test accuracy generally increases for fixed $S$. It demonstrates that keeping the labels of the $R - S$ images unchanged helps to stabilize the model and reduce the influence of changing the labels of the $S$ images. In the case of $S = 16$, if $R$ is increased from 50 to 1000, the test accuracy on the 10,000 test images increases from 29.7% to 96.9% for MNIST and from 18.3% to 76.4% for CIFAR. The fault sneaking attack can achieve classification accuracy as high as 98.7% and 78.5% in the case of $S = 1$ and $R = 1000$ for MNIST and CIFAR, which degrades the accuracy by only 0.8 percent and 1.0 percent, respectively, from the original models. Note that under the same assumption of misclassifying only one image, (liu2017fault, ) degrades the accuracy by 3.86 percent and 2.35 percent, respectively, for MNIST and CIFAR in the best case. Compared with (liu2017fault, ), the proposed attack greatly reduces the influence of model perturbation.

Figure 3. Fault sneaking attack success rate of the $S$ images after DNN parameter modifications for MNIST and CIFAR.

5.5. Tolerance for Sneaking Faults

One objective of the fault sneaking attack is to hide faults by perturbing the DNN parameters. In the experiments, we found that in the case of large $S$, not all of the $S$ images are changed to their target labels successfully. We define the success rate of the $S$ images as the percentage of the $S$ images whose labels are successfully changed to the target labels. We show the success rate of the $S$ images with various $S$ and $R$ configurations in Fig. 3. We observe that the success rate stays at almost 100% if $S$ is smaller than 10. When $S$ is larger than 10, the success rate drops as $S$ increases. Besides, the number of successfully injected faults among the $S$ images is usually around 10 for different configurations of $R$. This demonstrates a limitation of changing the classifications of certain images by modifying DNN parameters. The DNN model has a tolerance for sneaking faults: about 10 successful misclassifications when modifying the last FC layer.
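The success rate defined above can be computed as in the following sketch, assuming the predicted labels of the $S$ attacked images under the modified model are available (the function name `success_rate` and the sample labels are hypothetical):

```python
def success_rate(predicted, targets):
    """Fraction of the S attacked images whose predicted label equals
    the designated target label."""
    if not targets:
        raise ValueError("at least one attacked image is required")
    hits = sum(1 for p, t in zip(predicted, targets) if p == t)
    return hits / len(targets)


# Hypothetical outcome for S = 4: three of the four faults were injected.
print(success_rate([3, 7, 1, 4], [3, 7, 2, 4]))  # 0.75
```

Multiplying the rate by $S$ gives the number of successfully injected faults, which is the quantity reported to saturate around 10.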

6. Conclusion

In this paper, we propose the fault sneaking attack to mislead the DNN by modifying model parameters. The $\ell_0$ and $\ell_2$ norms are minimized by the general framework with constraints to keep the classification of other images unchanged. The experimental evaluations demonstrate that the ADMM based framework can implement the attacks stealthily and efficiently with negligible test accuracy loss.

7. Acknowledgement

This work is supported by Air Force Research Laboratory FA8750-18-2-0058, and U.S. Office of Naval Research.