T-BFA: Targeted Bit-Flip Adversarial Weight Attack

07/24/2020 ∙ by Adnan Siraj Rakin, et al. ∙ Arizona State University

Deep Neural Network (DNN) attacks have mostly been conducted through adversarial input example generation. Recent work on adversarial attacks against DNN weights, especially the Bit-Flip based adversarial weight Attack (BFA), has proved to be very powerful. BFA is an un-targeted attack that can force all inputs into a random output class by flipping a very small number of weight bits stored in computer memory. This paper presents the first work on targeted adversarial weight attacks for quantized DNN models. Specifically, we propose targeted variants of BFA (T-BFA), which can intentionally mislead selected inputs to a target output class. The objective is achieved by identifying the weight bits that are highly associated with classification into a targeted output class, through a novel class-dependent weight-bit ranking algorithm. T-BFA performance has been successfully demonstrated on multiple network architectures for the image classification task. For example, by merely flipping 27 (out of 88 million) weight bits, T-BFA can misclassify all the images in the Ibex class into the Proboscis Monkey class (i.e., 100% attack success rate) on the ImageNet dataset, while maintaining a 59.35% test accuracy on ResNet-18.


1 Introduction

In recent years, deep neural networks (DNNs) have achieved tremendous success in a wide variety of applications, including image classification [12, 23], speech recognition [8, 6] and machine translation [17, 16]. Unfortunately, DNN models are not secure: their vulnerability has been exposed by works on adversarial input example generation [18, 4].

Recently, adversarial weight attacks have been added to the security challenges of DNN models. Memory fault injection techniques, e.g., the Laser Beam Attack [22] and the Row-Hammer Attack (RHA) [11, 21], can inject faults into a computer's main memory (i.e., DRAM). In comparison to adversarial input attacks, which require designing noise to be injected into each input separately, an adversarial weight attack requires modifying the model only once to achieve the desired attack outcome for the whole input set. As shown in Fig. 1, the DNN weights stored in main memory can be modified by advanced memory bit attack algorithms [14, 9, 19] to degrade DNN performance. Further, advanced computer side-channel attacks [26, 25] have successfully demonstrated that a malicious attacker can extract DNN parameters and launch an adversarial weight attack. Several adversarial weight attacks have been proposed in recent years [9, 14, 19, 27]. Among them, the memory bit-flip based adversarial un-targeted weight attack in [19] is the strongest: it has been shown to degrade the test accuracy of a fully-functional ResNet-18 on the ImageNet dataset to 0.1% with only 13 bit-flips (out of 93 million bits).

Figure 1: Demonstration of Row-Hammer Attack (RHA) on the identified vulnerable bits with three distinct types of targeted attack objective proposed in T-BFA.

Existing un-targeted bit-flip based adversarial weight attacks reduce the overall prediction accuracy of a DNN model. Targeted attacks [3, 2] pose an even greater threat for the following reasons. First, they give the attacker precise control over the malicious objective and behavior. Second, a carefully crafted targeted attack objective can have a devastating effect on the DNN output. For example, in a self-driving car application, causing a stop sign to be misclassified as a high-speed-limit sign, while keeping the accuracy on all other signs intact, can cause very serious damage.

All the existing targeted attacks in the adversarial weight attack domain either fail to perform the attack efficiently, i.e., they require a large number of weight modifications [28], or are evaluated only on full-precision DNNs [9, 14]. A DNN with full-precision weights is easier to attack: it can be made to malfunction by flipping just the exponent bits, whereas quantized DNN weights are naturally noise-resilient. However, weight quantization has become a must-have optimization for efficiency on most computing platforms, such as Google's TPU [10].

In this work, we propose the Targeted Bit-Flip Attack (T-BFA), the first work on targeted adversarial weight attacks against weight-quantized DNNs. We consider three variants of T-BFA, as shown in the right panel of Fig. 1: N-to-1, where inputs from N source classes are mapped to 1 target class; 1-to-1, where inputs from 1 source class are mapped to 1 target class; and 1-to-1 (stealthy), where inputs from 1 source class are mapped to 1 target class while the classification accuracy on all other classes is kept as unchanged as possible. The 1-to-1 stealthy attack is particularly vicious since DNN users may not be aware that an attacker has hijacked the network. The contributions of this work can be summarized as follows:


  • Our proposed T-BFA can break the noise resilience of a quantized DNN through N-to-1 (I), 1-to-1 (II), and 1-to-1 stealthy (III) adversarial weight attacks by flipping a very small number of weight bits stored in computer memory.

  • To achieve the desired targeted attack objective, we formulate three distinct loss functions associated with each type of attack. We propose a novel iterative searching algorithm that can successfully minimize these loss functions to locate vulnerable weight bits that are associated with a target class.

  • We evaluate T-BFA on a wide range of network architectures (e.g., ResNet, VGG and MobileNet-V2) for image classification using the CIFAR-10 and ImageNet datasets. The experiments on ResNet-18 [7] using the ImageNet dataset show that our proposed T-BFA can achieve a 100% attack success rate in misclassifying all images in the ‘Ibex’ class into the ‘Proboscis Monkey’ class with only 27 bit-flips, while keeping the test accuracy on other classes at 59.35%.

  • Finally, we present an analysis of the three T-BFA schemes on different DNN architectures with varying capacities and quantization levels. Such analysis provides key insights for deriving effective defense strategies against T-BFA attacks.

2 Background and Related Work

Bit-Flip Attack.

The recent developments in memory fault injection attacks [11, 1] have made it feasible to conduct an adversarial weight attack on a DNN model running on a computer. Among them, the row-hammer attack [11] on Dynamic Random Access Memory (DRAM) is the most popular, since it can profile the memory bits stored in main memory (i.e., DRAM) and flip any bit at a given target address. The first works that exploited row-hammer to attack DNN weights flipped the Most Significant Bits (MSBs) of DNN parameters, such as biases [14] or weights [9], changing them to significantly large values and thus degrading accuracy. However, those attacks were only evaluated on models with full-precision (i.e., floating-point) parameters and failed on DNNs with quantized parameters.
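To make the fragility of full-precision parameters concrete, the following is a minimal sketch (not from the paper, but a standard illustration) showing that flipping a high-order exponent bit of a float32 weight changes it by many orders of magnitude, while flipping any bit of an 8-bit two's-complement weight shifts it by at most a bounded number of quantization steps.

import struct

def flip_bit_float32(value: float, bit: int) -> float:
    """Flip one bit (0 = LSB, 31 = sign) of a float32 value and return the result."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    return struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))[0]

def flip_bit_int8(value: int, bit: int) -> int:
    """Flip one bit (0 = LSB, 7 = sign) of an 8-bit two's-complement weight."""
    flipped = (value & 0xFF) ^ (1 << bit)
    return flipped - 256 if flipped >= 128 else flipped

w = 0.05                                  # a typical small DNN weight
print(flip_bit_float32(w, 30))            # flipping the top exponent bit -> on the order of 1e+37
print(flip_bit_int8(12, 7))               # flipping the int8 sign bit: 12 -> -116 (bounded change)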

A major milestone in adversarial weight attack is the work in [19] which implemented a stronger version of a bit-flip attack on an 8-bit fixed-point quantized network. In this work [19], BFA searches for the weight bits iteratively to gradually decrease DNN accuracy. However, the BFA design in [19] is for an un-targeted attack. Even though it succeeds in hampering the overall test accuracy, it fails to degrade the accuracy of a targeted class.

Targeted Attack.

A targeted attack gives more precise control over the misclassification behavior and can cause greater harm. It is a well-investigated technique in the adversarial input attack domain [2, 3, 18], where the attacker finds additive noise that decreases the loss function w.r.t. a false target label for each image separately. Another form of targeted attack is the Trojan attack [5, 15]. Inserting a Trojan requires modifying the weights through re-training of the network and the attacker's access to the training facility (e.g., the supply chain). Even though the recently developed Targeted Bit Trojan attack [20] can inject a Trojan into a DNN at run-time using only 84 bit-flips, it still requires the help of an input trigger. Apart from Trojan attacks, recent adversarial model-parameter attacks can also perform a targeted attack without requiring a trigger [28, 14]. However, some of them [28] require a large weight-perturbation norm (i.e., 900), and these attacks [28, 14] have only been evaluated on full-precision models, which have been reported in [9, 19] to be easier to attack.

Threat Model. In this work, we follow the standard white-box attack threat model assumption similar to the previous bit-flip based adversarial weight attacks [19, 20]. Our threat model assumes that the attacker has access to model weights, gradients and a portion of test data to perform the attack. Such an assumption is valid since previous works have demonstrated an attacker can effectively steal similar information (i.e., layer number, weight size, and parameters) through side-channel attacks [26, 25]. Finally, we assume that the attacker is denied access to any form of training information (i.e., training dataset, hyper-parameters) to conduct the attack.

3 Targeted Bit-Flip Adversarial Weight Attack

3.1 Overview of proposed T-BFA Variants

We propose the Targeted Bit-Flip adversarial weight Attack (T-BFA), which misclassifies inputs from their source category/categories (i.e., ground-truth) into a target category via a small number of malicious bit-flips on the quantized weight bits of a pre-trained DNN model. As depicted in Fig. 1, we propose three types of T-BFA with varying input constraints (e.g., the number of source categories considered), which are elaborated as follows:


  • Type-I: N-to-1 Attack. Given that the input data belongs to one of the N output classes, the attack objective of this T-BFA variant is to force the entire dataset, with all N classes as source classes, into one adversary-selected target class. The objective function used in conventional model training is converted into the malicious T-BFA objective (LHS and RHS in Eq. 1, respectively), expressed as follows:

    \min_{\{W\}} \mathbb{E}_{x}\, \mathcal{L}\big(f(x; \{W\}),\, t\big) \;\Longrightarrow\; \min_{\{\hat{B}\}} \mathbb{E}_{x}\, \mathcal{L}\big(f(x; \{\hat{B}\}),\, t_p\big)     (1)

    where \{W\} is the weight tensor set of the DNN model and \{\hat{B}\} is its quantized counterpart (i.e., the weight-bit tensor set). Given a vectorized input x, f(x; \{\hat{B}\}) computes the DNN inference output, and \mathcal{L}(\cdot, \cdot) denotes the cross-entropy loss between the DNN inference output and the labels; x and t are the input data and its corresponding ground-truth label (t_i denotes the one-hot code vector with a 1 at position i). For this attack, the ground-truth label of each source category is tampered to the selected p-indexed target category t_p.

  • Type-II: 1-to-1 Attack. In this T-BFA variant, the adversary focuses on misclassifying the input data of a single s-indexed source category into the p-indexed target category (p ≠ s), without caring about the impact on the remaining categories. It can be described as:

    \min_{\{\hat{B}\}} \mathbb{E}_{x \,\in\, \textrm{class}\, s}\, \mathcal{L}\big(f(x; \{\hat{B}\}),\, t_p\big)     (2)
  • Type-III: 1-to-1 Stealthy Attack. In addition to the Type-II 1-to-1 attack described above, the stealthy version has two objectives: 1) all input data from the s-indexed source category are classified into the p-indexed target category, which is the same as Eq. 2; 2) the correct predictions of the data excluded from the source category s are maintained. These two objectives can be achieved via the optimization of the two corresponding loss terms on the RHS of the following objective function (a minimal code sketch of all three attack losses is given right after this list):

    \min_{\{\hat{B}\}} \mathbb{E}_{x} \Big[ \mathbb{1}(t = t_s)\, \mathcal{L}\big(f(x; \{\hat{B}\}),\, t_p\big) + \mathbb{1}(t \neq t_s)\, \mathcal{L}\big(f(x; \{\hat{B}\}),\, t\big) \Big]     (3)

    where \mathbb{1}(\cdot) returns 1 if the condition is true, 0 otherwise.
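The following is a minimal PyTorch-style sketch (not the authors' released code) of the three attack losses in Eqs. 1-3; `model`, the attack batch `x` with `labels`, the source class `src` (s) and the target class `tgt` (p) are assumed to be provided by the caller, and all three losses are ordinary cross-entropy.

import torch
import torch.nn.functional as F

def n_to_1_loss(model, x, tgt):
    """Type-I (Eq. 1): push every input toward the target class p."""
    out = model(x)
    target = torch.full((x.size(0),), tgt, dtype=torch.long, device=x.device)
    return F.cross_entropy(out, target)

def one_to_1_loss(model, x, labels, src, tgt):
    """Type-II (Eq. 2): push only source-class inputs toward the target class p."""
    mask = labels == src
    out = model(x[mask])
    target = torch.full((int(mask.sum()),), tgt, dtype=torch.long, device=x.device)
    return F.cross_entropy(out, target)

def one_to_1_stealthy_loss(model, x, labels, src, tgt):
    """Type-III (Eq. 3): source class -> target class, every other class keeps its label."""
    out = model(x)
    attacked_labels = torch.where(labels == src, torch.full_like(labels, tgt), labels)
    return F.cross_entropy(out, attacked_labels)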

In general, the three T-BFA variants mainly specify the attack objective in terms of source and target categories. Another critical optimization constraint is to use a limited number of malicious bit-flips on the weight bits to achieve the objectives in Eqs. 1, 2 and 3. This can be viewed as a joint optimization, represented by:

\min_{\{\hat{B}\}} \Big\{ \mathcal{L}_{\textrm{T-BFA}}(\{\hat{B}\}),\; \mathcal{D}\big(\{\hat{B}\}, \{B\}\big) \Big\}     (4)

where \mathcal{D}(\{\hat{B}\}, \{B\}) is the Hamming distance between the weight-bit tensors prior to (\{B\}) and after (\{\hat{B}\}) the attack. Instead of applying \mathcal{D} as an additional loss term in Eqs. 1, 2 and 3 to form a single multi-objective function, we follow the optimization flow adopted by the un-targeted BFA [19], with several T-BFA-specific modifications; the details are given in the following subsection.
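As a small illustration of \mathcal{D} in Eq. 4, the sketch below (assuming 8-bit two's-complement weight codes) counts the bit positions that differ between the weights before and after the attack, i.e., the bit-flip budget being kept small.

import numpy as np

def hamming_distance(w_before: np.ndarray, w_after: np.ndarray) -> int:
    """Count differing bits between two int8 weight tensors of identical shape."""
    xor = w_before.astype(np.uint8) ^ w_after.astype(np.uint8)
    return int(np.unpackbits(xor).sum())

w0 = np.array([12, -3, 7], dtype=np.int8)
w1 = np.array([12, -3, 7 | 64], dtype=np.int8)   # one bit flipped in the last weight
print(hamming_distance(w0, w1))                  # -> 1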

3.2 Optimization of T-BFA Variants

The optimization of T-BFA can be generally described as an iterative process, where in each iteration only a single weight-bit is identified and then maliciously flipped. In the k-th iteration of T-BFA optimization, the objective function in Eq. 4 can be rewritten as:

\min_{\{\hat{B}\}^{k}} \mathcal{L}_{\textrm{T-BFA}}(\{\hat{B}\}^{k}) \quad \textrm{s.t.} \quad \mathcal{D}\big(\{\hat{B}\}^{k}, \{\hat{B}\}^{k-1}\big) = 1     (5)

where the single bit-flip is enforced by constraining the inter-iteration Hamming distance to 1. To minimize \mathcal{L}_{\textrm{T-BFA}} with a single bit-flip per iteration, we inherit and modify the intra- and inter-layer bit search method proposed in the un-targeted BFA scheme [19]. Given a DNN model with L layers (e.g., convolution layers), the intra-layer bit search identifies one weight-bit per layer and traverses all layers, thus returning L weight-bit candidates. The inter-layer search then selects one weight-bit out of the L candidates produced by the intra-layer search. We describe the intra- and inter-layer search in iteration k in the following paragraphs.

Intra-layer Bit Search.

For the layer indexed by l, the intra-layer bit search identifies a weight-bit candidate w.r.t. two criteria: 1) the bit has the highest gradient magnitude; 2) the bit can (possibly) be flipped along the direction of its bit-gradient (in the intra-layer search of the un-targeted BFA [19], the weight-bit is flipped along the opposite direction of the bit-gradient, as it performs maximization instead of the minimization used in this work). These two criteria can be mathematically described as:

m_l = \operatorname{arg\,max}_{m_l}\; m_l \odot \big|\nabla_{\hat{B}_l} \mathcal{L}\big| \quad \textrm{s.t.} \quad \hat{B}_l^{k} = \textrm{clamp}\big(\hat{B}_l^{k-1} - \textrm{sign}(\nabla_{\hat{B}_l} \mathcal{L}) \odot m_l\big)     (6)

where m_l is a mask that indicates the location of the identified bit within the weight-bit tensor \hat{B}_l and takes values in {0, 1}, and clamp(·) is the clamping function with 0 and 1 as lower and upper bounds. The intra-layer bit search traverses all L layers to generate the weight-bit candidate set. Meanwhile, for each weight-bit candidate, the corresponding T-BFA loss is profiled after the identified weight-bit has been flipped.
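A minimal sketch of this intra-layer rule is given below; it assumes `bits` is a {0, 1} tensor holding layer l's weight bits with `.grad` populated by back-propagating the chosen T-BFA loss, and it selects the bit with the largest gradient magnitude that can still move against its gradient while staying in {0, 1}.

import torch

def intra_layer_candidate(bits: torch.Tensor):
    """Return (flat_index, new_bit_value) of the most promising bit in this layer."""
    grad = bits.grad
    # A bit can lower the loss only if stepping against its gradient keeps it inside {0, 1},
    # i.e., a 1-bit with positive gradient or a 0-bit with negative gradient.
    flippable = torch.clamp(bits.detach() - torch.sign(grad), 0, 1) != bits.detach()
    score = grad.abs() * flippable.float()            # rank by gradient magnitude, mask the rest
    idx = int(torch.argmax(score.reshape(-1)))
    new_val = 1.0 - bits.detach().reshape(-1)[idx]    # the candidate flip for this layer
    return idx, new_val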

Inter-layer Bit Search.

Based on the intra-layer search outcomes (i.e., the L weight-bit candidates and their profiled losses), the inter-layer search performs a straightforward comparison to identify the candidate with the minimum profiled loss as the weight-bit to attack in iteration k. This process can be expressed as follows:

l^{*} = \operatorname{arg\,min}_{l \in \{1, \ldots, L\}}\; \mathcal{L}^{k}_{l}     (7)

where \mathcal{L}^{k}_{l} is the T-BFA loss profiled after flipping layer l's candidate bit in iteration k.

By applying the above optimization method, we can successfully achieve the objective specified in Eq. 4.
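The following is a high-level sketch (not the authors' released implementation) of one progressive T-BFA search: each iteration back-propagates the chosen attack loss to the weight bits, takes one candidate bit per layer with `intra_layer_candidate` from the sketch above (Eq. 6), profiles the loss with that bit flipped, commits the flip with the lowest loss (Eq. 7), and stops when the objective is approximately met or the bit-flip budget is exhausted. `layer_bits` (a list of {0, 1} bit tensors assumed to be wired into the model's forward pass) and `attack_loss_fn` (evaluating Eq. 1, 2 or 3 on the attack batch) are assumptions of this sketch.

import torch

def t_bfa(layer_bits, attack_loss_fn, max_flips=50, loss_tol=1e-3):
    for _ in range(max_flips):
        for bits in layer_bits:
            bits.grad = None
        attack_loss_fn().backward()                      # bit-gradients for every layer
        best = None                                      # (profiled_loss, layer, index, new_value)
        for l, bits in enumerate(layer_bits):
            idx, new_val = intra_layer_candidate(bits)   # intra-layer search (Eq. 6)
            with torch.no_grad():
                flat = bits.data.view(-1)
                old_val = flat[idx].item()
                flat[idx] = new_val                      # trial flip
                trial_loss = attack_loss_fn().item()     # profile the T-BFA loss
                flat[idx] = old_val                      # undo the trial flip
            if best is None or trial_loss < best[0]:
                best = (trial_loss, l, idx, new_val)
        with torch.no_grad():                            # inter-layer selection (Eq. 7)
            layer_bits[best[1]].data.view(-1)[best[2]] = best[3]
        if best[0] < loss_tol:                           # attack objective (approximately) met
            break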

4 Experimental Setup

Dataset configuration for attack.

In this work, we evaluate our attacks on two vision datasets: CIFAR-10 [12] and ImageNet [13]. Table 1 provides an overview of the data division used to conduct each type of attack. To conduct an N-to-1 attack on CIFAR-10 and ImageNet, we randomly choose a test batch from the test dataset. However, to evaluate a 1-to-1 or 1-to-1 (S) attack, we require a subset of source-class (s) test images. Since the CIFAR-10 dataset has 1k test images in each class, we use 500 images to perform the attack and the remaining 500 images to evaluate the Attack Success Rate (ASR). Since the ImageNet dataset has only 50 test images per class, we conduct the attack using 25 images from the source class and use the remaining 25 images to evaluate ASR. Furthermore, for ImageNet, we always evaluate test accuracy on the whole test dataset of 50k images, because the amount of test data used to perform the attack (e.g., 50 images) is negligible compared to the 50k test images. The mean and standard deviation are calculated over 5 trial runs for CIFAR-10 and 3 trial runs for ImageNet. Also, we terminate an attack when the ASR reaches higher than 99.99% or does not change for three successive iterations.
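As a small illustration of the source-class split in Table 1, the sketch below (assuming a torchvision-style dataset yielding (image, label) pairs) selects half of the source-class test images to drive the attack and holds out the other half for measuring ASR afterwards.

import random

def split_source_class(dataset, src_class, n_attack=500, seed=0):
    src_idx = [i for i, (_, y) in enumerate(dataset) if y == src_class]
    random.Random(seed).shuffle(src_idx)
    attack_idx = src_idx[:n_attack]   # e.g., 500 CIFAR-10 images (25 for ImageNet) to run T-BFA
    eval_idx = src_idx[n_attack:]     # remaining source-class images to evaluate ASR
    return attack_idx, eval_idx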

Dataset    |                 CIFAR-10                   |                 ImageNet
Attack     | # conduct          | # eval ASR | # eval test acc. | # conduct        | # eval ASR | # eval test acc.
N-to-1     | 128                | 10k        | 10k              | 50               | 50k        | 50k
1-to-1     | 500 (s)            | 500 (s)    | 9k               | 25 (s)           | 25 (s)     | 50k
1-to-1 (S) | 500 (s) + 500 (¬s) | 500 (s)    | 8.5k             | 25 (s) + 25 (¬s) | 25 (s)     | 50k

Table 1: Test data splitting to conduct a targeted attack from source class s to target class t. "# conduct" is the number of images used to conduct the attack, "# eval ASR" the number used to evaluate ASR, and "# eval test acc." the number used to evaluate test accuracy. CIFAR-10 has 10k test images with 1,000 per class, and ImageNet has 50k test samples with 50 per class. Note: (¬s) denotes images belonging to any class other than the source class s.

Evaluation Metrics

Two metrics are used in this work for attack evaluation: post-attack accuracy, which is the inference accuracy on the held-out test subset, and Attack Success Rate (ASR), defined as the percentage of source-class images classified into the target class by the attacked model.
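A minimal sketch of these two metrics is given below; `model`, a loader of source-class images (`source_loader`), the full `test_loader` and the target class `tgt` are assumed to exist.

import torch

@torch.no_grad()
def attack_success_rate(model, source_loader, tgt):
    """ASR (%): fraction of source-class images now predicted as the target class."""
    hit, total = 0, 0
    for x, _ in source_loader:                 # loader contains only source-class images
        pred = model(x).argmax(dim=1)
        hit += int((pred == tgt).sum())
        total += x.size(0)
    return 100.0 * hit / total

@torch.no_grad()
def post_attack_accuracy(model, test_loader):
    """Post-attack accuracy (%) on the held-out test subset."""
    correct, total = 0, 0
    for x, y in test_loader:
        correct += int((model(x).argmax(dim=1) == y).sum())
        total += y.size(0)
    return 100.0 * correct / total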

Class     | 0         | 1         | 2         | 3         | 4         | 5         | 6         | 7         | 8         | 9         | Average
ResNet-20 | 4.0 ± 0   | 4.6 ± 0.9 | 5.0 ± 2.2 | 6.2 ± 2.3 | 4.6 ± 0.9 | 5.2 ± 1.6 | 6.8 ± 1.9 | 4.4 ± 1.7 | 5 ± 2.2   | 4.8 ± 1.8 | 5.1
VGG-11    | 3.0 ± 0.0 | 3.0 ± 0.0 | 3.0 ± 0.0 | 3.0 ± 0.0 | 2.8 ± 0.4 | 2.0 ± 0.0 | 3.0 ± 0.0 | 3.2 ± 0.4 | 3.0 ± 0.0 | 3.0 ± 0.0 | 3.0

Table 2: N-to-1 attack: number of bit-flips (mean ± std) required to classify all input images into the corresponding target class with 100% ASR. In each case, the test accuracy drops to 10%.
Figure 2: Type-II 1-to-1 attack on ResNet-20 between source and target classes. The left subplot shows the post-attack test accuracy and the right subplot shows the average number of bit-flips required for the attack.
Figure 3: Type-III 1-to-1 (S) attack: post-attack test accuracy, attack success rate and average number of bit-flips over five rounds of attacks for both the ResNet-20 and VGG-11 networks.

5 Results

In this section, we present the experimental results of all three versions of our targeted T-BFA attack on both the CIFAR-10 and ImageNet datasets.

5.1 CIFAR-10 Results

N-to-1 Attack. For CIFAR-10, the proposed N-to-1 attack successfully reaches 100% ASR for both the VGG-11 and ResNet-20 architectures on every target class. As shown in Table 2, the average number of bit-flips required to achieve 100% ASR ranges from 4.0 to 6.8 for ResNet-20 and from 2.0 to 3.2 for VGG-11. Thus, for the N-to-1 attack, VGG-11 consistently requires fewer bit-flips than ResNet-20 across all CIFAR-10 classes.

Take-Away 1.

Our analysis of the N-to-1 attack shows that there is no particular target class that is easier or more difficult to attack. Thus we conclude that the input feature patterns play a small role in resisting the attack, while the network architecture plays a more important role.

1-to-1 Attack. In this version of T-BFA, the attacker performs 1-to-1 misclassification with fewer bit-flips (see Fig. 2) than the N-to-1 version (see Table 2). For most of the entries shown in Fig. 2, the 1-to-1 attack requires only 1-2 bit-flips to achieve 100% ASR, with a few exceptions. Overall, for all possible combinations of classes, T-BFA successfully achieves 100% 1-to-1 misclassification with a small number of bit-flips.

Take-Away 2.

The 1-to-1 attack generally requires fewer bit-flips than the N-to-1 attack. This is expected, since misclassifying all N classes is more difficult than misclassifying just one class.

1-to-1 Stealthy (S) Attack.

Our evaluation of the 8-bit quantized ResNet-20 and VGG-11 models shows baseline CIFAR-10 test accuracies of 91.9% and 91.6%, respectively. As shown in Fig. 3, the accuracy drops for both networks after the attack, with ResNet-20 suffering the larger drop in average post-attack test accuracy over five rounds of attack, while VGG-11 maintains a better test accuracy.

Our proposed T-BFA is effective in attacking the ResNet-20 network, achieving an ASR higher than 97% for all combinations of source and target classes. However, VGG-11 shows slightly better resistance to the attack, with an ASR range of 93-99% across combinations. This is consistent with prior work showing that dense networks (e.g., VGG-11, VGG-16) are more resistant to both adversarial weight attacks [20] and input attacks [18]. While for both networks some classes are more vulnerable to bit-flip attacks than others, most source-target class combinations require fewer than 10 bit-flips to conduct the 1-to-1 stealthy attack.

Take-Away 3.

A compact network, like ResNet-20 with 0.27M parameters, has less capacity to learn the dual objective of the 1-to-1 (S) attack through a small number of bit-flips than a denser network, like VGG-11 with 132M parameters. As a result, the post-attack test accuracy drop is higher for the compact network.

Network (# of parameters) | Attack type | ASR (%)      | Test accuracy (%) | # of bit-flips
ResNet-18 (11M)           | N-to-1      | 99.78 ± 0.27 | 0.23 ± 0.18       | 32.6 ± 8.2
ResNet-18 (11M)           | 1-to-1      | 100 ± 0      | 32.13 ± 14.4      | 16.7 ± 1.24
ResNet-18 (11M)           | 1-to-1 (S)  | 100 ± 0      | 59.48 ± 2.9       | 27.3 ± 16.7
ResNet-34 (21M)           | N-to-1      | 99.99 ± 0    | 0.1 ± 0           | 21 ± 4
ResNet-34 (21M)           | 1-to-1      | 100 ± 0      | 23.74 ± 1.71      | 9.33 ± 0.94
ResNet-34 (21M)           | 1-to-1 (S)  | 100 ± 0      | 58.33 ± 3.29      | 40.33 ± 30.32
MobileNet-V2 (2.1M)       | N-to-1      | 100 ± 0      | 0.1 ± 0           | 17.3 ± 3.29
MobileNet-V2 (2.1M)       | 1-to-1      | 100 ± 0      | 1.19 ± 0.22       | 13
MobileNet-V2 (2.1M)       | 1-to-1 (S)  | 98.67 ± 1.89 | 33.99 ± 4.93      | 45.33 ± 21.74
Table 3: Performance of the T-BFA variants on ImageNet (from the Ibex class to the Proboscis Monkey class). The original test accuracies of ResNet-18, ResNet-34 and MobileNet-V2 are 69.23%, 75.5% and 72.01%, respectively.

5.2 ImageNet Results

The ImageNet dataset has a much larger number of output classes than CIFAR-10. Since we do not have the space to report all targeted attack results, we randomly pick one targeted-attack combination (Ibex class -> Proboscis Monkey class) to show our method's efficiency. For the N-to-1 attack, Table 3 shows that our method requires 32.6, 21 and 17.3 bit-flips, on average, for ResNet-18, ResNet-34 and MobileNet-V2, respectively; thus a more compact network is more vulnerable to an N-to-1 attack. For the 1-to-1 (S) attack, a compact network such as MobileNet-V2 (with 2.1M parameters) fails to maintain a reasonable test accuracy (i.e., 33.99%), while larger networks such as ResNet-18 and ResNet-34 maintain a reasonable test accuracy (i.e., >58%) while achieving 100% ASR.

Take-Away 4.

In the case of the ImageNet dataset, the large number of output classes increases the attack difficulty for T-BFA. However, consistent with the CIFAR-10 observations, it is easier to conduct a 1-to-1 (S) attack on a network with higher capacity, since its larger optimization space helps achieve the dual objectives of maintaining reasonable test accuracy and achieving very high ASR.

Method                    | # of images attacked | ASR (%) | Test accuracy (%) | # of bit-flips | Model precision
Un-targeted BFA (I) [19]  | 10k                  | -       | 10.27             | 28             | 8-bit
Proposed N-to-1 (I)       | 10k                  | 100     | 10                | 4              | 8-bit
SBA (II) [14]             | 100                  | 100     | 60.0              | 1              | full-precision
Proposed 1-to-1 (II)      | 1000                 | 100     | 10                | 3.2            | 8-bit
TBT (III) [20]            | 10k                  | 93.89   | 82.03             | 199            | 8-bit
GDA (III) [14]            | 100                  | 100     | 81.66             | 198            | full-precision
Fault Sneaking (III) [28] | 16                   | 100     | 76.4              | >2565          | full-precision
Proposed 1-to-1 (S) (III) | 1000                 | 99.3    | 88.3              | 12.2           | 8-bit
Table 4: Comparison with competing methods. We directly report the numbers from the respective papers for [14, 28]. For [20, 19], we run the attack on the 8-bit quantized ResNet-20 network.

5.3 Comparison with Other Competing Methods

Several existing attacks [19, 20, 14, 28] in the adversarial weight attack domain have goals similar to our proposed targeted attacks. For instance, both the un-targeted progressive BFA and the proposed N-to-1 T-BFA attack try to reach a post-attack test accuracy of 10%.

As shown in Table 4, the proposed N-to-1 targeted attack achieves the same objective as [19] with fewer bit-flips. Other, stronger versions of previous targeted attacks, such as GDA [14] and the fault sneaking attack [28], have shown superior results (100% ASR) under a weaker threat model (i.e., a full-precision model or attacking only a few images). However, the proposed 1-to-1 (S) T-BFA outperforms both [14] and [28] on a quantized network with far fewer bit-flips (12.2 vs. 198 and >2565). In the family of neural Trojan works [15, 5, 20], the Targeted Bit Trojan (TBT) follows a stricter threat model and performs the attack with the fewest bit-flips. Our 1-to-1 (S) attack proves to be much more effective than TBT: it achieves a higher test accuracy and a higher ASR with fewer bit-flips.

6 Discussion

Effect of network width and precision. We perform two ablation studies to analyze the effects of model compression and network capacity. In Table 5, we show the results of our attacks on quantized networks with weights represented by 2, 4, 6 and 8 bits. The performance of the N-to-1 and 1-to-1 attacks is slightly weaker on a low bit-width network (e.g., 2-bit). This is expected, since low bit-width networks have been reported to be more resilient to small perturbations. The 2-bit network is also more resilient to the 1-to-1 (S) attack. A preliminary study of network width indicated that doubling the network channel width (2×) helped recover the post-attack test accuracy from 64.0% to 89.01% for the 1-to-1 (S) attack. Such observations motivated us to explore possible future defense methods against T-BFA:

Bit-width (TA before attack) | Attack type      | ASR (%)      | Test accuracy (%) | # of bit-flips
8-bit (TA: 92.91%)           | N-to-1 (I)       | 100 ± 0      | 10 ± 0            | 6.0 ± 2.2
8-bit (TA: 92.91%)           | 1-to-1 (II)      | 100 ± 0      | 68.45 ± 0.08      | 2.2 ± 0
8-bit (TA: 92.91%)           | 1-to-1 (S) (III) | 99.7 ± 0     | 80.3 ± 0          | 4.4 ± 0.54
6-bit (TA: 92.41%)           | N-to-1 (I)       | 100 ± 0      | 10 ± 0            | 6.2 ± 0.83
6-bit (TA: 92.41%)           | 1-to-1 (II)      | 100 ± 0      | 68.45 ± 0.08      | 2 ± 0
6-bit (TA: 92.41%)           | 1-to-1 (S) (III) | 100 ± 0      | 68.53 ± 0.15      | 1.6 ± 0.54
4-bit (TA: 91.33%)           | N-to-1 (I)       | 100 ± 0      | 10 ± 0            | 4.4 ± 0.55
4-bit (TA: 91.33%)           | 1-to-1 (II)      | 100 ± 0      | 11.54 ± 0.08      | 4 ± 0
4-bit (TA: 91.33%)           | 1-to-1 (S) (III) | 98.56 ± 0.7  | 12.79 ± 1.88      | 16.4 ± 9.36
2-bit (TA: 90.3%)            | N-to-1 (I)       | 100 ± 0      | 10 ± 0            | 37.6 ± 0.9
2-bit (TA: 90.3%)            | 1-to-1 (II)      | 100 ± 0      | 76.43 ± 3.43      | 17.6 ± 1.7
2-bit (TA: 90.3%)            | 1-to-1 (S) (III) | 97.87 ± 1.75 | 84.6 ± 0.91       | 19.6 ± 6.6
Table 5: Results of varying the quantization bit-width on ResNet-20. For this ablation study, we attack from class 0 to class 9 only. TA is the test accuracy before the attack.
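For context on the bit-width ablation above, the following is a small sketch of a generic per-layer symmetric uniform quantizer (not necessarily the paper's exact quantization scheme) mapping a layer's weights to b-bit signed integer codes; it is bits of such integer codes that T-BFA flips.

import numpy as np

def quantize_weights(w: np.ndarray, bits: int):
    """Return (integer codes in [-(2^(b-1) - 1), 2^(b-1) - 1], quantization step size)."""
    qmax = 2 ** (bits - 1) - 1
    step = np.abs(w).max() / qmax
    codes = np.clip(np.round(w / step), -qmax, qmax).astype(np.int32)
    return codes, step

w = np.random.randn(64, 10) * 0.1
codes, step = quantize_weights(w, bits=2)   # 2-bit codes only take values in {-1, 0, 1}
print(np.unique(codes), step)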

Defense Strategy 1.

A network with higher model capacity (e.g., ResNet-34 or increased width) or a lower quantization bit-width (e.g., 2-bit) resists the N-to-1 attack better. As a result, a possible defense strategy against the N-to-1 attack is to increase model capacity or network width, or to decrease the quantization bit-width.

Defense Strategy 2.

It is difficult to achieve the dual objectives of high ASR and high post-attack test accuracy with a 1-to-1 (S) attack on a compact network (e.g., MobileNet-V2). Thus, for the 1-to-1 (S) attack, decreasing the network capacity limits the attacker's ability to hide the attack (by failing to maintain the test accuracy on other classes), thereby helping to detect it.

Layer-wise analysis of bit-flips and critical weights. We observe that the most vulnerable layer for the T-BFA attack is the last (classification) layer. In the case of the 1-to-1 (S) attack, 100% of the bit-flips are in the last layer for both the ResNet-20 and VGG-11 models. For the N-to-1 attack, more than 90% of the bit-flips are in the last classification layer. Further, to misclassify one source class into a target class, the attack always picks one particular weight in the last layer, even when run with different test batches. We call this weight the critical weight for classifying images from source class s to target class t. In Table 6, we list the critical weights for classifying all the class-1 images into different target classes on ResNet-20:

Target class            | 0  | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9
Index value             | 41 | 169 | 233 | 297 | 325 | 425 | 489 | 553 | 625
Connected output neuron | 0  | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9
Table 6: If the last-layer weight matrix is flattened into a 1-D array of size 640 (10 output classes × 64 input features), then T-BFA picks the weights with the above indices in the flattened array.
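The following is a worked check of Table 6, assuming ResNet-20's last fully-connected layer has shape (10 output classes × 64 input features) and is flattened row-major: integer division of each critical-weight index by 64 recovers the connected output neuron, matching the "Connected output neuron" row (and hence the target class).

critical_indices = [41, 169, 233, 297, 325, 425, 489, 553, 625]
for idx in critical_indices:
    print(f"index {idx} -> output neuron {idx // 64}, input feature {idx % 64}")
# e.g., 169 // 64 == 2 and 625 // 64 == 9, identical to the table's last row.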

This study leads to the question: can we defend against T-BFA by securing the critical weights in the last layer? To answer this, we run T-BFA after first securing the critical weights so that they cannot be flipped. Then, for more aggressive protection, we secure the entire last layer. This is motivated by prior work that places the entire last layer in a protected enclave, such as Intel SGX [24], as an effective privacy-protection method. Unfortunately, in both cases, all three versions of T-BFA still succeed, but they require more bit-flips. For example, on CIFAR-10, to misclassify images from class 1 into class 2, securing the critical weights forces the attack to use additional bit-flips, and securing the entire last layer requires 6× more bit-flips.

Defense Strategy 3.

We conclude that securing the critical weights, or even all the weights of the last layer, improves resistance to T-BFA but cannot completely defend against it: T-BFA finds an alternative attack path, albeit with a higher number of bit-flips.

7 Conclusion

We propose three targeted adversarial weight attack schemes, i.e., N-to-1, 1-to-1 and 1-to-1 (stealthy), that severely degrade the classification performance of quantized DNNs. Our T-BFA is based on a novel iterative, class-dependent bit-ranking algorithm. We demonstrate that a compact network is more vulnerable to an N-to-1 attack, while a larger network is more susceptible to the stealthy version of T-BFA. Finally, we suggest possible defense strategies to make DNNs more resilient to such aggressive adversarial weight-noise attacks.

References

  • [1] M. Agoyan, J. Dutertre, A. Mirbaha, D. Naccache, A. Ribotta, and A. Tria (2010) How to flip a bit?. In 2010 IEEE 16th International On-Line Testing Symposium, pp. 235–239. Cited by: §2.
  • [2] N. Carlini and D. Wagner (2017) Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE Symposium on, pp. 39–57. Cited by: §1, §2.
  • [3] P. Chen, H. Zhang, Y. Sharma, J. Yi, and C. Hsieh (2017) Zoo: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15–26. Cited by: §1, §2.
  • [4] I. J. Goodfellow, J. Shlens, and C. Szegedy (2015) Explaining and harnessing adversarial examples. ICLR. Cited by: §1.
  • [5] T. Gu, K. Liu, B. Dolan-Gavitt, and S. Garg (2019) Badnets: evaluating backdooring attacks on deep neural networks. IEEE Access 7, pp. 47230–47244. Cited by: §2, §5.3.
  • [6] M. A. Haque, A. Verma, J. S. R. Alex, and N. Venkatesan (2020) Experimental evaluation of cnn architecture for speech recognition. In First International Conference on Sustainable Technologies for Computational Intelligence, pp. 507–514. Cited by: §1.
  • [7] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: 3rd item.
  • [8] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, et al. (2012) Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Processing Magazine 29 (6), pp. 82–97. Cited by: §1.
  • [9] S. Hong, P. Frigo, Y. Kaya, C. Giuffrida, and T. Dumitraș (2019) Terminal brain damage: exposing the graceless degradation in deep neural networks under hardware fault attacks. In 28th USENIX Security Symposium (USENIX Security 19), pp. 497–514. Cited by: §1, §1, §2, §2.
  • [10] N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, et al. (2017) In-datacenter performance analysis of a tensor processing unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture, pp. 1–12. Cited by: §1.
  • [11] Y. Kim, R. Daly, J. Kim, C. Fallin, J. H. Lee, D. Lee, C. Wilkerson, K. Lai, and O. Mutlu (2014) Flipping bits in memory without accessing them: an experimental study of dram disturbance errors. In ACM SIGARCH Computer Architecture News, Vol. 42, pp. 361–372. Cited by: §1, §2.
  • [12] A. Krizhevsky, V. Nair, and G. Hinton (2010) CIFAR-10 (Canadian Institute for Advanced Research). URL http://www.cs.toronto.edu/~kriz/cifar.html. Cited by: §1, §4.
  • [13] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105. Cited by: §4.
  • [14] Y. Liu, L. Wei, B. Luo, and Q. Xu (2017) Fault injection attack on deep neural network. In 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 131–138. Cited by: §1, §1, §2, §2, §5.3, §5.3, Table 4.
  • [15] Y. Liu, S. Ma, Y. Aafer, W. Lee, J. Zhai, W. Wang, and X. Zhang (2018) Trojaning attack on neural networks. In 25th Annual Network and Distributed System Security Symposium, NDSS 2018, San Diego, California, USA, February 18-21, 2018. Cited by: §2, §5.3.
  • [16] Y. Lu, X. Xiong, W. Zhang, J. Liu, and R. Zhao (2020) Research on classification and similarity of patent citation based on deep learning. Scientometrics, pp. 1–27. Cited by: §1.
  • [17] M. Luong, M. Kayser, and C. D. Manning (2015) Deep neural language models for machine translation. In Proceedings of the Nineteenth Conference on Computational Natural Language Learning, pp. 305–309. Cited by: §1.
  • [18] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2018) Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, External Links: Link Cited by: §1, §2, §5.1.
  • [19] A. S. Rakin, Z. He, and D. Fan (2019) Bit-flip attack: crushing neural network with progressive bit search. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1211–1220. Cited by: §1, §2, §2, §2, §3.1, §3.2, §5.3, §5.3, Table 4, footnote 2.
  • [20] A. S. Rakin, Z. He, and D. Fan (2020) TBT: targeted neural network attack with bit trojan. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Cited by: §2, §2, §5.1, §5.3, §5.3, Table 4.
  • [21] K. Razavi, B. Gras, E. Bosman, B. Preneel, C. Giuffrida, and H. Bos (2016) Flip feng shui: hammering a needle in the software stack. In 25th USENIX Security Symposium (USENIX Security 16), pp. 1–18. Cited by: §1.
  • [22] C. Roscian, A. Sarafianos, J. Dutertre, and A. Tria (2013) Fault model analysis of laser-induced faults in sram memory cells. In 2013 Workshop on Fault Diagnosis and Tolerance in Cryptography, pp. 89–98. Cited by: §1.
  • [23] Y. Sun, B. Xue, M. Zhang, and G. G. Yen (2019) Evolving deep convolutional neural networks for image classification. IEEE Transactions on Evolutionary Computation. Cited by: §1.
  • [24] F. Tramer and D. Boneh (2019) Slalom: fast, verifiable and private execution of neural networks in trusted hardware. In International Conference on Learning Representations, External Links: Link Cited by: §6.
  • [25] Y. Xiang, Z. Chen, Z. Chen, Z. Fang, H. Hao, J. Chen, Y. Liu, Z. Wu, Q. Xuan, and X. Yang (2020) Open dnn box by power side-channel attack. IEEE Transactions on Circuits and Systems II: Express Briefs. Cited by: §1, §2.
  • [26] M. Yan, C. Fletcher, and J. Torrellas (2018) Cache telepathy: leveraging shared resource attacks to learn dnn architectures. arXiv preprint arXiv:1808.04761. Cited by: §1, §2.
  • [27] F. Yao, A. S. Rakin, and D. Fan (2020) DeepHammer: depleting the intelligence of deep neural networks through targeted chain of bit flips. arXiv preprint arXiv:2003.13746. Cited by: §1.
  • [28] P. Zhao, S. Wang, C. Gongye, Y. Wang, Y. Fei, and X. Lin (2019) Fault sneaking attack: a stealthy framework for misleading deep neural networks. In 2019 56th ACM/IEEE Design Automation Conference (DAC), pp. 1–6. Cited by: §1, §2, §5.3, §5.3, Table 4.