RED-Attack: Resource Efficient Decision based Attack for Machine Learning

01/29/2019 ∙ by Faiq Khalid, et al.

Due to their data dependency and model leakage properties, Deep Neural Networks (DNNs) exhibit several security vulnerabilities. Several security attacks exploit these vulnerabilities, but most of them require access to the output probability vector and can therefore be mitigated by concealing it. To address this limitation, decision-based attacks have been proposed, which can estimate the model using only the output label; however, they require several thousand queries to generate a single untargeted adversarial image. In real-time attacks, resources and attack time are crucial parameters, so in resource-constrained systems, e.g., autonomous vehicles, where even an untargeted attack can have a catastrophic effect, these attacks may not work efficiently. To address this limitation, we propose a resource-efficient decision-based methodology, the RED-Attack, which generates an imperceptible attack for a given black-box model. The proposed methodology follows two main steps: classification boundary estimation and adversarial noise optimization. First, we propose a half-interval search-based algorithm for estimating a sample on the classification boundary using a target image and a randomly selected image from another class. Second, we propose an optimization algorithm which first introduces a small perturbation in some randomly selected pixels of the estimated sample and then, to ensure imperceptibility, optimizes the distance between the perturbed and target samples. For illustration, we evaluate it for CIFAR-10 and German Traffic Sign Recognition (GTSR) using state-of-the-art networks.


1 Introduction

Deep Neural Networks (DNNs) have revolutionized computing paradigms, but due to their inherent security vulnerabilities, i.e., training or inference data poisoning (Khalid et al., 2018), backdoors (Biggio & Roli, 2018; Gu et al., 2017), Trojans (Li et al., 2018; Zou et al., 2018; Chen et al., 2018), etc., they can lead to catastrophic effects in safety-critical applications, e.g., in autonomous driving (Stilgoe, 2018). Several security attacks have been proposed that exploit these vulnerabilities; however, adversarial attacks have emerged as one of the most common and successful classes of security attacks against DNNs. They can be defined as carefully crafted, imperceptible data corruptions that fool DNNs into misclassification (Goodfellow et al., 2014). The implementation and effectiveness of these attacks depend upon the assumed level of attacker access to the DNN, i.e., the white-box and black-box scenarios.

Figure 1: Threat models: different black-box attack and defense scenarios, assuming a black-box model with access to the output classification probability vector/distribution or only to the output labels
Figure 2: Adversarial examples generated using decision-based untargeted attacks on German Traffic Sign Recognition benchmarks for 10,000 iterations. (Perturbation: strength of adversarial noise; SSI: Structural Similarity Index; CR: Correlation Index)

White-Box Attacks: Most of the state-of-the-art attacks assume the white-box scenario, in which the adversary has full knowledge of the DNN architecture and the corresponding parameters (weights and biases), e.g., the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2014), iterative FGSM (I-FGSM) (Kurakin et al., 2016), the Jacobian Saliency Map Attack (JSMA) (Papernot et al., 2015), the Carlini-Wagner (CW) Attack (Carlini & Wagner, 2017), the One-Pixel Attack, Universal Adversarial Perturbations, DeepFool, the PGD-Attack, etc. Although these attacks can generate imperceptible adversarial examples efficiently, the underlying assumption of access to the complete DNN model is impractical in most scenarios.

Black-Box Attacks: Unlike white-box attacks, black-box attacks do not rely on the aforementioned assumption and therefore limit the capability of the adversary because of the limited knowledge about the target system. Based on the information available in black-box scenarios, the following three threat models are possible:

  1. Threat Model I: The adversary has access to the output probability vector/distribution but does not have access to the model parameters, as shown in Fig. 1(a).

  2. Threat Model II: The adversary has access only to the probability of the most probable class detected by the DNN classifier (in the case of a classification problem), as shown in Fig. 1(b).

  3. Threat Model III: The adversary has access only to the final output of the system, i.e., the final class label in the case of classification, as shown in Fig. 1(c). A minimal sketch of this label-only setting follows the list.
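To make Threat Model III concrete, the following minimal Python sketch wraps an arbitrary classifier so that an attacker can query only the final class label. The wrapper name LabelOnlyOracle and the predict_probabilities callable are illustrative assumptions introduced here, not part of any framework referenced in this paper.

```python
import numpy as np

class LabelOnlyOracle:
    """Black-box wrapper exposing only the final class label (Threat Model III)."""

    def __init__(self, predict_probabilities):
        # predict_probabilities: any callable mapping an image to a probability vector.
        # It stays hidden behind this wrapper, so the attacker never sees the scores.
        self._predict = predict_probabilities
        self.query_count = 0                  # track how many queries the attacker spends

    def __call__(self, image):
        self.query_count += 1
        probs = self._predict(image)
        return int(np.argmax(probs))          # only the label leaves the black box
```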

Most of the state-of-the-art black-box attacks assume Threat Model I or II, e.g., the Zeroth-Order Optimization-based Attack (Chen et al., 2017) and Gen-Attack (Alzantot et al., 2018). However, these attacks can be mitigated by concealing the information about the classification probability, i.e., by considering Threat Model III (see Fig. 1(c)). To address this limitation, the decision-based attack (Brendel et al., 2017) has been proposed, which utilizes a random search algorithm to estimate the classification boundary. Though the decision-based attack can nullify probability-concealing defenses, it possesses the following limitations:

Figure 3: Adversarial examples generated by the decision-based attack with a maximum of 1000 iterations. This analysis shows that by reducing the number of iterations (limited resources), the perceptibility of the images increases, e.g., the perturbation norm rises from 1.51 to 94.9.
  1. It requires multiple reference samples to generate a single adversarial example and thereby requires high memory/bandwidth resources.

  2. The technique requires a large number of inferences/queries to generate a single adversarial example, for example, on average around 10000 for the examples illustrated in Fig. 2.

  3. In the case of a restricted number of allowed iterations (where each iteration can involve multiple queries), the performance of the decision-based attack reduces significantly, as illustrated in Fig. 3.

Associated Research Challenge: Based on the above-mentioned limitations, we can safely conclude that the decision-based attack cannot be applied efficiently to resource- and energy-constrained systems, e.g., autonomous vehicles. This raises a key research question: How can the number of queries required to generate an adversarial example be reduced while maintaining imperceptibility (maximizing the correlation coefficient and structural similarity index)?

1.1 Novel Contributions

To address the above-mentioned research challenge, in this paper, we propose a Resource-Efficient Decision-based methodology to generate an imperceptible adversarial attack under Threat Model III. The underlying assumption of the proposed methodology is that it obtains the pre-processed image and its corresponding class label from the black-box model. It then iteratively generates the adversarial image multiple times, computes the corresponding classification label for each iteration, and optimizes it by finding the closest adversarial image (on the classification boundary) to the input image, as shown in Fig. 4. In a nutshell, this paper makes the following contributions:

Figure 4: The proposed RED-Attack during a typical design cycle for ML-based systems under Threat Model III
  1. To reduce the number of queries, we propose a single reference sample-based half-interval search algorithm to estimate the classification boundary.

  2. To maximize the imperceptibility, we propose an optimization algorithm which combines the half-interval search algorithm with gradient estimation to identify the closest adversarial example (on boundary) from the input image.

To illustrate the effectiveness of the RED-Attack, we evaluate it for the CIFAR-10 and GTSR datasets using state-of-the-art networks. The comparative analysis shows that the proposed approach successfully generates adversarial examples with much lower perceptibility than the Decision-based Attack. We empirically show that, on average, the perturbation norm of the adversarial images from their corresponding source images is decreased by 96.1%, while their SSI and CC with respect to the corresponding clean images are increased by 71.7% and 203%, respectively.

2 Proposed Methodology

The proposed resource-efficient decision-based attack methodology consists of two major steps: classification boundary estimation and optimization of the attack image with respect to the maximum allowed perturbation.

Figure 5: RED-Attack: Resource-Efficient Decision-based methodology to generate an imperceptible attack under Threat Model III.
Figure 6: Proposed methodology to generate the optimized imperceptible hyper-sphere attacks
  Inputs:
    x_s = Source image;
    x_r = Sample image from the target class;
    ε = Maximum allowed perturbation (boundary estimation error);
  Output:
    x_adv = Adversarial image;
  Select the randomly chosen reference sample as the initial adversarial image (x_adv ← x_r)
  Compute the half-way point x_h ← (x_s + x_adv)/2;
  Compute its label l_h ← f(x_h) by querying the black-box model;
  repeat
     if l_h = f(x_s) then
        Compute x_s ← x_h
     else
        Compute x_adv ← x_h
     end if
     Compute x_h ← (x_s + x_adv)/2
     Compute l_h ← f(x_h);
     Compute the bracket width d ← ||x_s - x_adv||;
  until d ≤ ε
Algorithm 1 Boundary Estimation

2.1 Boundary Estimation

The first step is to estimate the classification boundary, for which we propose a half-interval search-based algorithm that requires a target image and a reference image from any other class, as shown in Algorithm 1 (see Step 1 of Fig. 6).

Definition 1

Let x_s, x_r and ε be the source image (class: A), the reference image (class: other than A) and the maximum allowed estimation error, respectively. The goal of this algorithm is to find a sample x_adv which has a tolerable distance (less than ε) from the classification boundary and has a label different from that of the source image; mathematically, it can be defined as: find x_adv such that f(x_adv) ≠ f(x_s) and ||x_adv - x'|| ≤ ε for some x' with f(x') = f(x_s), where f(·) denotes the label returned by the black-box model.

To generate the appropriate x_adv, the proposed algorithm performs a half-interval search using the source image (x_s) and the reference image (x_r). It first finds the half-way point x_h between x_s and x_r by computing the average of the two, and then replaces x_s or x_r with x_h depending upon the class in which x_h falls. For example, if the label of the half-way point is class A, the algorithm replaces x_s with x_h, and if its label is not A, the algorithm replaces x_r with x_h. The algorithm repeats this process until the distance between x_s and x_r is less than ε, while ensuring that f(x_adv) ≠ f(x_s). The proposed boundary estimation can be used for a targeted attack if we choose the reference image from the target class.
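As a concrete illustration of Algorithm 1, the following minimal Python sketch performs the half-interval search using only label queries (e.g., the LabelOnlyOracle sketched in the introduction). The function name estimate_boundary, its interface and the choice of norm are our own assumptions, not the authors' released implementation.

```python
import numpy as np

def estimate_boundary(query_label, x_src, x_ref, eps=1.0):
    """Half-interval (binary) search for a point close to the decision boundary.

    query_label(x) is assumed to return only the predicted class label of the
    black-box model (Threat Model III). x_src is the source image, x_ref a
    reference image from a different class, and eps the tolerated estimation error.
    """
    src_label = query_label(x_src)
    assert query_label(x_ref) != src_label, "reference must lie in another class"

    x_in, x_out = x_src.copy(), x_ref.copy()   # x_in keeps the source label, x_out does not
    while np.linalg.norm(x_in - x_out) > eps:  # bracket width; the specific norm is our choice
        x_mid = (x_in + x_out) / 2.0           # half-way point between the two brackets
        if query_label(x_mid) == src_label:
            x_in = x_mid                       # still on the source side: move the inner bracket
        else:
            x_out = x_mid                      # adversarial side: move the outer bracket
    return x_out                               # adversarial point within eps of the boundary
```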

  Inputs:
    x_s = Source image;
    x_r = Sample image from the target class;
    x_adv = Adversarial image (current boundary point);
    N = Number of pixels to perturb;
    δ = Relative perturbation in each pixel;
  Output:
    x_noisy = Noisy adversarial image;
    s = Sign of the gradient;
  Select the current adversarial image as the starting point (x_noisy ← x_adv);
  Compute the per-pixel perturbation η ← δ × (the maximum value a pixel can have);
  Add η to N randomly selected pixels of x_noisy;
  Compute d ← ||x_adv - x_s||; and compute d_noisy ← ||x_noisy - x_s||;
  Compute the label of x_noisy by querying the black-box model; and Δd ← d_noisy - d;
  if Δd < 0 then
     Compute s ← +1;
  else if Δd > 0 then
     Compute s ← -1;
  else
     Compute s ← 0;
  end if
Algorithm 2 Gradient Estimation

2.2 Optimization of Attack Image

The second step of the proposed methodology is to optimize the noise on the sample obtained from the boundary estimation. We propose to incorporate an adaptive update into the zeroth-order stochastic algorithm to optimize it efficiently.

  Inputs:
    x_s = Copy of the source image;
    x_r = Sample image from the target class;
    x_adv = Adversarial image;
    x_noisy = Randomly perturbed adversarial image;
    s = Gradient sign;
    h_max = Maximum jump;
  Output:
    x_new = A new instance of the adversarial example;
  Compute the hop size h ← h_max; and the update direction u ← s × (x_noisy - x_adv);
  Compute the candidate x_new ← x_adv + h × u;
  repeat
     Compute h ← h/2; and x_new ← x_adv + h × u;
  until x_new is adversarial and closer to x_s than x_adv (a local minimum is reached)
  if no such x_new is found (the hop size becomes negligible) then
     Compute a new boundary point from x_new using Algorithm 1;
  end if
Algorithm 3 Efficient Update
Figure 7: Experimental setup and tool flow for evaluating the proposed RED-Attack
Definition 2

Let x_s, x_r, ε and x_adv be the source image (class: A), the reference image (class: other than A), the maximum allowed estimation error and the perturbed image, respectively. The goal of this algorithm is to minimize the distance of x_adv from x_s while ensuring that its label differs from that of the source image; mathematically, it can be defined as: minimize ||x_adv - x_s|| subject to f(x_adv) ≠ f(x_s).

To achieve this goal, we first identify the sign of the gradient to estimate the boundary behavior (minima). If the sign is positive, we continue moving in the same direction; otherwise, we switch the direction, as shown in Algorithm 2. Once the direction is identified, the next key challenge is to select an appropriate hop size. To address this challenge, we propose an algorithm (Algorithm 3) which introduces an adaptive hop size by applying the half-interval update: it starts with the maximum hop size and then repeatedly reduces it by half. The algorithm repeats itself until it finds a local minimum, illustrated by Steps 3-5 of Fig. 11.
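To tie Algorithms 2 and 3 together, the sketch below shows one possible zeroth-order reading of the gradient-sign estimation and the half-interval (adaptive hop) update, reusing the estimate_boundary helper from the earlier sketch. All function names, default values, the stopping threshold and the exact step rule are our simplifying assumptions, not the authors' reference implementation.

```python
import numpy as np

def estimate_gradient_sign(x_src, x_adv, n_pixels=5, delta=0.02, pixel_max=1.0, rng=None):
    """Sketch of Algorithm 2: perturb N random pixels and read off the sign of the
    resulting change in the distance d(x_adv, x_src)."""
    rng = rng or np.random.default_rng()
    x_noisy = x_adv.copy()
    idx = rng.choice(x_noisy.size, size=n_pixels, replace=False)
    x_noisy.ravel()[idx] += delta * pixel_max          # small perturbation relative to pixel range
    d_old = np.linalg.norm(x_adv - x_src)
    d_new = np.linalg.norm(x_noisy - x_src)
    return x_noisy, (+1 if d_new < d_old else -1)      # +1: this direction moves us closer

def efficient_update(query_label, src_label, x_src, x_adv, x_noisy, sign, h_max=1.0):
    """Sketch of Algorithm 3: start from the maximum jump and halve it until the
    candidate is still adversarial and strictly closer to the source image."""
    direction = sign * (x_noisy - x_adv)
    d_best = np.linalg.norm(x_adv - x_src)
    h = h_max
    while h > 1e-3:                                    # stop once the hop becomes negligible
        candidate = x_adv + h * direction
        if (query_label(candidate) != src_label
                and np.linalg.norm(candidate - x_src) < d_best):
            return candidate                           # improved adversarial example
        h /= 2.0                                       # overshoot: halve the hop size
    return x_adv                                       # keep the previous example this round

def red_attack(query_label, x_src, x_ref, iterations=100, eps=1.0):
    """Top-level loop combining the sketches (each iteration issues several label queries)."""
    src_label = query_label(x_src)
    x_adv = estimate_boundary(query_label, x_src, x_ref, eps)   # boundary point (Algorithm 1)
    for _ in range(iterations):
        x_noisy, sign = estimate_gradient_sign(x_src, x_adv)
        x_adv = efficient_update(query_label, src_label, x_src, x_adv, x_noisy, sign)
    return x_adv
```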

3 Experimental Results and Discussions

3.1 Experimental Setup

To demonstrate the effectiveness of the proposed RED-Attack, we evaluated several untargeted attacks on CIFAR-10 and GTSR for state-of-the-art networks (Fig. 7).

3.2 Experimental Analysis

The imperceptibility of the adversarial image improves iteratively by minimizing its distance from the source image, as shown in Fig. 8. From the analysis of this figure, we identify the following key observation: the adversarial images generated in the first few iterations are neither imperceptible nor even recognizable, but over time, with the help of the query and optimization algorithms, the attack achieves imperceptibility. Thus, in a black-box attack, if we limit the number of queries, imperceptibility decreases drastically.

Figure 8: Visualizing the adversarial examples at various query counts.

3.3 Effects of Hyper-parameters on perceptibility (d)

We measure the distance of the adversarial image from its source image using the L2-norm of the perturbation, which is defined as the pixel-wise sum of squared differences. We monitor the effect of changing three different hyper-parameters, introduced in our attack, on the distance of the adversarial examples from their corresponding source examples.
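For reference, the three perceptibility metrics used throughout the paper can be computed with standard tools as in the following sketch. It assumes images scaled to [0, 1] and uses scikit-image's structural_similarity; the channel_axis argument applies to recent scikit-image releases (older versions use multichannel=True instead).

```python
import numpy as np
from skimage.metrics import structural_similarity

def perturbation_norm(x_adv, x_src):
    """Squared L2 norm: pixel-wise sum of squared differences."""
    return float(np.sum((x_adv.astype(np.float64) - x_src.astype(np.float64)) ** 2))

def correlation_coefficient(x_adv, x_src):
    """Pearson correlation coefficient (CC) between the flattened images."""
    return float(np.corrcoef(x_adv.ravel(), x_src.ravel())[0, 1])

def ssim(x_adv, x_src):
    """Structural Similarity Index (SSI) for H x W x C images in [0, 1]."""
    return float(structural_similarity(x_adv, x_src, channel_axis=-1, data_range=1.0))
```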

3.3.1 Experimental Setup

  1. ε measures the maximum tolerable error, in terms of each pixel value, while computing the boundary point in Algorithm 1. Unless otherwise stated, the typical value we use for ε is 1.

  2. N defines the number of pixels randomly selected to be perturbed in each iteration in order to estimate the gradient of the distance of the adversarial example from the source example (Algorithm 2). Unless otherwise stated, the typical value we use for N is 5.

  3. δ defines the magnitude of noise added to each of the randomly selected pixels, relative to the maximum value a pixel can have. Unless otherwise stated, the typical value we use for δ is 5.
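The defaults above can be collected into a small configuration object. The dataclass below and the mapping of ε, N and δ to the field names eps, n_pixels and delta are our own labels for the quantities defined in this list, chosen to match the sketches in Section 2.

```python
from dataclasses import dataclass

@dataclass
class REDAttackConfig:
    eps: float = 1.0         # maximum tolerable boundary-estimation error per pixel value
    n_pixels: int = 5        # pixels randomly perturbed per gradient estimate
    delta: float = 5.0       # noise added to each selected pixel, in pixel-value units
                             # (the text also reports the normalized value 0.0196 ≈ 5/255)
    max_queries: int = 1000  # query budget used in the comparison experiments
```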

Figure 9: Adversarial examples generated by performing both targeted and untargeted attacks for various values of the attack hyper-parameters. The corresponding source/victim examples and the examples of the target class are given for each case. All the generated adversarial examples were successfully classified as the target class for targeted attacks and were misclassified for the untargeted attacks.
Figure 10: The trends of the perturbation norm (d) from the corresponding source examples over time for various values of the attack hyper-parameters. (From left to right) Misclassification attacks on Airplane, Car, Bird and Cat. The initial target image was chosen to be an image of a truck.
Figure 11: Illustrating the effects of changing ε on the gradients. a) Incorrect estimation for large ε. b) Close approximation for small ε.

3.3.2 Experimental Analysis

Figures 9 and 10 show the source examples that we use for targeted and untargeted attacks, along with the initial target examples and the generated adversarial examples.

  1. As ε increases, the quality of the adversarial example at a given query count decreases, due to the increase in its distance from the source example. This is because a larger value of ε results in an imprecise boundary point, which in turn may result in an incorrect value of the estimated gradient, as shown in Figure 11. However, a smaller value of ε results in a gradient direction closer to the required direction (see Figure 11b).

    Figure 12: The trends of distance (d) of the adversarial example from the corresponding source examples as the algorithm progresses, for various values of the attack hyper-parameters, on the German Traffic Sign dataset. The source image, target image and the generated adversarial image are shown above each plot.
  2. A larger value of N initially results in faster convergence. The reason is that we only need to estimate an overall trend of the boundary at the initial stages, and estimating the update direction for the adversarial example by perturbing a large number of values at once helps achieve better results. However, a large N is highly vulnerable to divergence as the attack progresses. This observation suggests that the attack can be significantly improved by adaptively changing the number of perturbed pixels as the algorithm progresses.

  3. A similar trend is observed with changes in δ, because estimating the gradients by introducing large perturbations in the image gives an overall trend of the boundary instead of the more precise localized gradients given by small perturbations. This in turn helps the algorithm converge faster initially. However, small values of δ give a more stable convergence towards the solution.

Figure 13: Adversarial examples generated by performing untargeted attacks on the model trained for German Traffic Sign classification for various values of ε, N and δ. The corresponding source/victim examples, along with images of another class (target examples) specifying the initial direction, are given for each case. All the generated adversarial examples were successfully misclassified by the classifier. Unless otherwise stated, ε = 0.01, N = 20 and δ = 0.0196. The maximum number of queries used to generate the images is 1000.
Figure 14: Comparison of the RED-Attack for different images with the Decision-based Attack for various values of ε, N and δ. The maximum number of allowed queries is 1000. The metrics used are the correlation coefficient (CC), the perturbation norm (Norm) and the Structural Similarity Index (SSI), computed between the adversarial images and the corresponding source images.
Figure 15: Comparison of the adversarial images generated by an untargeted Decision-based attack and those generated by an untargeted RED-Attack for various values of ε, N and δ. Please note that ε = 0.01, N = 20 and δ = 0.0196, unless otherwise stated. The maximum number of queries is 1000 for each adversarial image.

3.4 German Traffic Sign Data set

In order to examine the generality of the trends shown in Figure 10, we perform a similar analysis on GTSR. In this experiment, we randomly select an image from the test data and generate an adversarial example for a fixed number of queries. We compute the perturbation norm (d) of the adversarial examples from their corresponding source images and compare the different settings based on these findings.

The results are shown in Figure 12. The complete set of adversarial examples along with the corresponding source images is shown in Figure 13.

3.4.1 Key Observations and Insights

  1. Generally, the effect of changing the hyper-parameters on the perturbation norm of the adversarial example is similar for the GTSRB and CIFAR-10 datasets.

  2. We observe that the adversarial examples for the untargeted attack against the GTSRB classifier converge much faster than for the CIFAR-10 dataset. We attribute this to the much larger number of classes in the GTSRB dataset as compared to the CIFAR-10 dataset.

  3. As observed in the case of CIFAR-10, the attack can be significantly improved by adaptively changing the hyper-parameters as the attack progresses.

4 Comparison with the state-of-the-art

We compare our results with the state-of-the-art Decision-based Attack. We use the implementation provided in an open-source benchmark library, FoolBox (Rauber et al., 2017). We limit the maximum number of queries to 1000 and evaluate our attack for different values of ε, N and δ. To compare our results with the Decision-based Attack, we use three different metrics, i.e., the correlation coefficient (CC), the squared L2-norm (Pert. Norm) and the Structural Similarity Index (SSI) of the adversarial image with respect to the source image. Our results are shown in Figure 14.
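For the baseline, a minimal sketch of how the Decision-based (Boundary) Attack might be run through FoolBox is shown below. It assumes the FoolBox 2.x interface with a Keras classifier and inputs scaled to [0, 1]; the model wrapper and keyword names should be checked against the installed FoolBox version, and keras_model, image and label are placeholders for the user's own model and test sample.

```python
import numpy as np
import foolbox

# keras_model: a trained classifier; image/label: a single test sample and its class.
# The KerasModel wrapper and the iterations keyword follow the FoolBox 2.x API and
# may differ in other releases of the library.
fmodel = foolbox.models.KerasModel(keras_model, bounds=(0, 1))
attack = foolbox.attacks.BoundaryAttack(fmodel)

adversarial = attack(image, label, iterations=1000)    # query-limited baseline run

if adversarial is not None:                            # the attack may fail within the budget
    print("squared L2 perturbation norm:",
          float(np.sum((adversarial - image) ** 2)))
```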

It can be seen from the figure that the adversarial examples produced by the RED-Attack in this setting are significantly superior to those produced by the Decision-based Attack. The reason is the binary stepping used while searching for the boundary point and the efficient update process used while computing a new instance of the adversarial example.

Here, we want to emphasize that in the long run the decision-based attack eventually surpasses the RED-Attack, e.g., if query efficiency is not a concern or the maximum number of queries is much larger than 1000, the adversarial examples found by the decision-based attack are better than those found by the RED-Attack.

5 Conclusion

In this paper, we proposed a novel Resource-Efficient Decision-based Imperceptible Attack (RED-Attack). It utilizes a half-interval search-based algorithm to estimate the classification boundary and an efficient update mechanism to boost the convergence of an adversarial example for decision-based attacks in query-limited settings. To illustrate the effectiveness of the RED-Attack, we evaluated it for the CIFAR-10 and GTSRB datasets using multiple state-of-the-art networks. We limited the maximum number of queries to 1000 and showed that the state-of-the-art decision-based attack is unable to find an imperceptible adversarial example, while the RED-Attack arrives at a sufficiently imperceptible adversarial example within the predefined number of queries. Further, we empirically showed that, on average, the perturbation norm of the adversarial images (from their corresponding source images) decreased by 96.1%, while their Structural Similarity Index and correlation coefficient (with respect to the corresponding clean images) increased by 71.7% and 203%, respectively.

References