Deep Neural Networks (DNNs) have revolutionized computing paradigms, but their inherent security vulnerabilities, e.g., training- or inference-data poisoning (Khalid et al., 2018), backdoors (Biggio & Roli, 2018; Gu et al., 2017), and Trojans (Li et al., 2018; Zou et al., 2018; Chen et al., 2018), can lead to catastrophic effects in safety-critical applications such as autonomous driving (Stilgoe, 2018). Several security attacks have been proposed that exploit these vulnerabilities; among them, adversarial attacks have emerged as one of the most common and successful classes of attacks against DNNs. An adversarial attack applies carefully crafted, imperceptible data corruptions to fool a DNN into misclassification (Goodfellow et al., 2014). The implementation and effectiveness of these attacks depend on the assumed level of the attacker's access to the DNN, i.e., white-box or black-box scenarios.
White-Box Attacks: Most state-of-the-art attacks assume the white-box scenario, in which the adversary has full knowledge of the DNN architecture and the corresponding parameters (weights and biases), e.g., the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2014), iterative FGSM (I-FGSM) (Kurakin et al., 2016), the Jacobian Saliency Map Attack (JSMA) (Papernot et al., 2015), the Carlini-Wagner (CW) attack (Carlini & Wagner, 2017), the One-Pixel Attack, Universal Adversarial Perturbations, DeepFool, the PGD attack, etc. Although these attacks can generate imperceptible adversarial examples efficiently, the underlying assumption of access to the complete DNN model is impractical in most scenarios.
Black-Box Attacks: Unlike white-box attacks, black-box attacks do not rely on the aforementioned assumption, which limits the adversary's capability because of the limited knowledge about the target system. Based on the information available in black-box scenarios, three threat models are possible:
Threat Model I: The adversary has access to the output probability vector/distribution but not to the model parameters, as shown in Fig. 1(a).
Threat Model III: The adversary has access only to the final output of the system, i.e., the final class label in the case of classification, as shown in Fig. 1(c).
Most state-of-the-art black-box attacks assume Threat Model I or II, e.g., the Zeroth-Order Optimization-based attack (Chen et al., 2017) and GenAttack (Alzantot et al., 2018). However, these attacks can be mitigated by concealing the classification probabilities, i.e., by enforcing Threat Model III (see Fig. 1(c)). To address this limitation, the decision-based attack (Brendel et al., 2017) was proposed, which uses a random search algorithm to estimate the classification boundary. Although the decision-based attack nullifies probability-concealing defenses, it possesses the following limitations:
It requires multiple reference samples to generate a single adversarial example and thereby requires high memory/bandwidth resources.
The technique requires a large number of inferences/queries to generate a single adversarial example, e.g., around 10000 on average for the examples illustrated in Fig. 2.
When the number of allowed iterations is restricted (where each iteration can involve multiple queries), the performance of the decision-based attack reduces significantly, as illustrated in Fig. 3.
Associated Research Challenge: Based on the above-mentioned limitations, we can safely conclude that the decision-based attack cannot be applied to resource- and energy-constrained systems, e.g., autonomous vehicles. This raises a key research question: How can the number of queries required to generate an adversarial example be reduced while maintaining imperceptibility (maximizing the correlation coefficient and the structural similarity index)?
1.1 Novel Contributions
To address the above-mentioned research challenge, in this paper we propose a Resource-Efficient Decision-based methodology to generate an imperceptible adversarial attack under Threat Model III. The underlying assumption of the proposed methodology is that it takes the pre-processed image and its corresponding class label from the black-box model. It then iteratively generates the adversarial image, computes the corresponding classification label in each iteration, and optimizes it by finding the closest adversarial image (on the classification boundary) to the input image, as shown in Fig. 4. In a nutshell, this paper makes the following contributions:
To reduce the number of queries, we propose a single reference sample-based half-interval search algorithm to estimate the classification boundary.
To maximize the imperceptibility, we propose an optimization algorithm which combines the half-interval search algorithm with gradient estimation to identify the adversarial example (on the boundary) closest to the input image.
To illustrate the effectiveness of the RED-Attack, we evaluate it on the CIFAR-10 and GTSRB datasets using state-of-the-art networks. The comparative analysis shows that the proposed approach successfully generates adversarial examples with much lower perceptibility than the Decision-based Attack. We empirically show that, on average, the perturbation norm of the adversarial images with respect to their corresponding source images is decreased by 96.1%, while their Structural Similarity Index (SSI) and Correlation Coefficient (CC) with respect to the corresponding clean images are increased by 71.7% and 203%, respectively.
2 Proposed Methodology
The proposed resource-efficient decision-based attack methodology consists of two major steps: classification boundary estimation and optimization of the attack image with respect to the maximum allowed perturbation.
2.1 Boundary Estimation
The first step is to estimate the classification boundary, for which we propose a half-interval search-based algorithm that requires a target image and a reference image from any other class, as shown in Algorithm 1 (see Step 1 of Fig. 6).
Let x_s, x_r and ε be the source image (class A), a reference image (of any class other than A) and the maximum allowed estimation error, respectively. The goal of this algorithm is to find a sample x_b whose distance from the classification boundary is within the tolerance ε and whose label differs from that of the source image.
To generate an appropriate x_b, the proposed algorithm performs a half-interval search between the source image x_s and the reference image x_r. It first finds the halfway point x_m between x_s and x_r by computing the average of the two, and then replaces x_s or x_r with x_m depending on the class in which x_m falls: if the label of x_m is class A, the algorithm replaces x_s with x_m; otherwise, it replaces x_r with x_m. The algorithm repeats this process until the distance between x_s and x_r is less than ε, while ensuring that the retained sample remains adversarial. The proposed boundary estimation can also be used for a targeted attack by choosing the reference image from the target class.
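The half-interval search described above can be sketched as follows. This is a minimal illustration, not the authors' exact Algorithm 1: `query_label` stands for the black-box oracle that returns only the predicted class, and all other names are our own.

```python
import numpy as np

def boundary_estimate(x_src, x_ref, query_label, eps=1.0):
    """Half-interval search for a sample near the classification boundary.

    x_src: source image (class A); x_ref: reference image with a different
    label; query_label: black-box oracle returning the predicted label;
    eps: maximum tolerable per-pixel estimation error.
    """
    label_src = query_label(x_src)
    lo = x_src.astype(np.float64)   # always classified as the source class
    hi = x_ref.astype(np.float64)   # always classified as a different class
    # Shrink the interval until lo and hi are within eps of each other.
    while np.max(np.abs(hi - lo)) > eps:
        mid = (lo + hi) / 2.0
        if query_label(mid) == label_src:
            lo = mid   # midpoint still on the source side
        else:
            hi = mid   # midpoint is adversarial; tighten the outer bound
    return hi          # adversarial sample within eps of the boundary
```

Because each query halves the interval, the number of queries grows only logarithmically with the initial distance between the two images.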
2.2 Optimization of Attack Image
The second step of the proposed methodology is to optimize the noise on the sample (the output of the boundary estimation). We propose to incorporate an adaptive update into the zeroth-order stochastic algorithm to optimize it efficiently.
Let x_s, x_r, ε and x* be the source image (class A), the reference image (of any class other than A), the maximum allowed estimation error and the perturbed image, respectively. The goal of this algorithm is to minimize the distance of x* from x_s while ensuring that its label differs from that of the source image.
To achieve this goal, we first identify the sign of the gradient to estimate the behavior of the boundary (i.e., to locate a minimum). For example, if the sign is positive, we continue moving in the same direction; otherwise, we switch the direction, as shown in Algorithm 2. Once the direction is identified, the next key challenge is selecting an appropriate hop size. To address this challenge, we propose an algorithm (Algorithm 3) that introduces an adaptive hop size via half-interval updates: it starts with the maximum hop size and repeatedly halves it. The algorithm repeats until it finds the local minimum, as illustrated by Steps 3-5 of Fig. 11.
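The combination of gradient-sign estimation (Algorithm 2) and the half-interval hop-size update (Algorithm 3) can be sketched as below. This is an illustrative simplification under our own assumptions, not the authors' implementation: `n_pix`, `delta`, the acceptance rule and all names are hypothetical.

```python
import numpy as np

def refine(x_src, x_adv, query_label, n_pix=5, delta=0.05, n_iter=10, seed=0):
    """Zeroth-order refinement of a boundary sample (sketch of Algs. 2-3).

    Perturbs n_pix random pixels by delta to estimate the sign of the
    gradient of the distance to x_src, then moves x_adv in the descent
    direction; the hop size is halved whenever a candidate update leaves
    the adversarial region or fails to reduce the distance.
    """
    rng = np.random.default_rng(seed)
    label_src = query_label(x_src)
    hop = 1.0  # maximum hop size
    for _ in range(n_iter):
        d0 = np.linalg.norm(x_adv - x_src)
        # Probe: add delta to a few random pixels and observe the distance.
        idx = rng.choice(x_adv.size, size=n_pix, replace=False)
        probe = x_adv.copy().ravel()
        probe[idx] += delta
        d1 = np.linalg.norm(probe.reshape(x_adv.shape) - x_src)
        sign = -1.0 if d1 > d0 else 1.0   # move toward smaller distance
        step = np.zeros(x_adv.size)
        step[idx] = sign * delta
        cand = x_adv + hop * step.reshape(x_adv.shape)
        # Accept only if still adversarial and strictly closer to the source.
        if query_label(cand) != label_src and np.linalg.norm(cand - x_src) < d0:
            x_adv = cand
        else:
            hop /= 2.0                    # half-interval update of hop size
    return x_adv
```

The halving of `hop` on every rejected candidate mirrors the half-interval update: large steps make fast progress early, while progressively smaller steps settle onto the local minimum near the boundary.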
3 Experimental Results and Discussions
3.1 Experimental Setup
To demonstrate the effectiveness of the proposed RED-Attack, we evaluated several un-targeted attacks on CIFAR-10 and GTSRB for state-of-the-art networks (Fig. 7).
3.2 Experimental Analysis
The imperceptibility of the adversarial image improves iteratively by minimizing its distance from the source image, as shown in Fig. 8. From the analysis of this figure, we identify the following key observation: the adversarial images generated in the first few iterations are not only perceptible but barely recognizable; over time, with the help of further queries and the optimization algorithm, the attack achieves imperceptibility. Thus, in a black-box attack, if the number of queries is limited, imperceptibility decreases drastically.
3.3 Effects of Hyper-parameters on perceptibility (d)
We measure the distance of an adversarial image from its source image using the squared L2-norm of the perturbation, defined as the pixel-wise sum of the squared differences between the two images. We monitor the effect of changing three different hyper-parameters, introduced in our attack, on the distance of the adversarial examples from their corresponding source examples.
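For concreteness, the perturbation norm d defined above can be computed as follows (a simple helper of our own, not taken from the paper's code):

```python
import numpy as np

def perturbation_norm(x_adv, x_src):
    """Squared L2-norm of the perturbation: the pixel-wise sum of the
    squared differences between the adversarial and source images."""
    diff = x_adv.astype(np.float64) - x_src.astype(np.float64)
    return float(np.sum(diff ** 2))
```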
3.3.1 Experimental Setup
- Boundary-estimation error: measures the maximum tolerable error, in terms of each pixel value, while computing the boundary point in Algorithm 1. Unless otherwise stated, we use a typical value of 1.
- Number of perturbed pixels: defines the number of pixels, randomly selected to be perturbed in each iteration, used to estimate the gradient of the distance of the adversarial example from the source example (Algorithm 2). Unless otherwise stated, we use a typical value of 5.
- Perturbation magnitude: defines the magnitude of the noise added to each of the randomly selected pixels, relative to the maximum value a pixel can have. Unless otherwise stated, we use a typical value of 5.
3.3.2 Experimental Analysis
As the boundary-estimation error increases, the quality of the adversarial example at a given query count decreases, owing to the increase in its distance from the source example. This is because a larger estimation error results in an imprecise boundary point, which in turn may yield an incorrect estimated gradient, as shown in Figure 11, whereas a smaller estimation error results in a gradient direction closer to the required direction (see Figure 11b).
A larger number of perturbed pixels initially results in faster convergence, because at the initial stages we only need to estimate the overall trend of the boundary, and estimating the update direction by perturbing many pixels at once helps achieve this. However, a large number of perturbed pixels is highly vulnerable to divergence as the attack progresses. This observation suggests that the attack can be significantly improved by adaptively changing the number of perturbed pixels as the algorithm progresses.
A similar trend is observed for the perturbation magnitude: estimating the gradients by introducing large perturbations in the image gives the overall trend of the boundary rather than the more precise localized gradients given by small perturbations. This initially helps the algorithm converge faster, whereas small perturbation magnitudes give a more stable convergence towards the solution.
3.4 German Traffic Sign Data set
To assess the generality of the trends shown in Figure 10, we perform a similar analysis on GTSRB. In this experiment, we randomly select an image from the test data and generate an adversarial example for a fixed number of queries. We compute the perturbation norm (d) with respect to the corresponding source image and compare the different approaches based on these findings.
3.4.1 Key Observations and Insights
Generally, the effect of changing the hyper-parameters on the perturbation norm of the adversarial example is similar for the GTSRB and CIFAR-10 datasets.
We observe that adversarial examples for the untargeted attack against the GTSRB classifier converge much faster than for the CIFAR-10 dataset. We attribute this to the much larger number of classes in the GTSRB dataset compared to CIFAR-10.
As observed for CIFAR-10, the attack can be significantly improved by adaptively changing the hyper-parameters as the attack progresses.
4 Comparison with the state-of-the-art
We compare our results with the state-of-the-art Decision-based Attack, using the implementation provided in the open-source benchmark library FoolBox (Rauber et al., 2017). We limit the maximum number of queries to 1000 and evaluate our attack for different values of the three hyper-parameters. To compare our results with the Decision-based Attack, we use three metrics: the correlation coefficient (CC), the squared L2-norm (Pert. Norm) and the Structural Similarity Index (SSI) of the adversarial image with respect to the source image. Our results are shown in Figure 14.
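Two of these metrics can be computed directly with NumPy, as in the illustrative helper below (the SSI would additionally require, e.g., `skimage.metrics.structural_similarity`; the function name is our own):

```python
import numpy as np

def compare_metrics(x_adv, x_src):
    """Correlation coefficient (CC) and squared L2-norm (Pert. Norm)
    of an adversarial image with respect to its source image."""
    a = x_adv.astype(np.float64).ravel()
    s = x_src.astype(np.float64).ravel()
    cc = float(np.corrcoef(a, s)[0, 1])   # Pearson correlation coefficient
    pert = float(np.sum((a - s) ** 2))    # squared L2-norm of perturbation
    return cc, pert
```

A higher CC (and SSI) with a lower perturbation norm indicates a less perceptible adversarial example.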
The figure shows that the adversarial examples produced by the RED-Attack in these settings are significantly superior to those produced by the Decision-based Attack. The reason is the half-interval stepping while searching for the boundary point and the efficient update process while computing each new instance of the adversarial example.
We emphasize, however, that in the long run the Decision-based Attack eventually surpasses the RED-Attack, e.g., if query efficiency is not a concern or the maximum number of queries is much larger than 1000, the adversarial examples found by the Decision-based Attack are better than those found by the RED-Attack.
In this paper, we proposed a novel Resource-Efficient Decision-based Imperceptible Attack (RED-Attack). It utilizes a half-interval search-based algorithm to estimate the classification boundary and an efficient update mechanism to boost the convergence of an adversarial example for decision-based attacks in query-limited settings. To illustrate the effectiveness of the RED-Attack, we evaluated it on the CIFAR-10 and GTSRB datasets using multiple state-of-the-art networks. We limited the maximum number of queries to 1000 and showed that the state-of-the-art decision-based attack is unable to find an imperceptible adversarial example, while the RED-Attack arrives at a sufficiently imperceptible adversarial example within the predefined number of queries. Further, we empirically showed that, on average, the perturbation norm of the adversarial images (with respect to their corresponding source images) decreased by 96.1%, while their Structural Similarity Index and Correlation Coefficient (with respect to the corresponding clean images) increased by 71.7% and 203%, respectively.
- Alzantot et al. (2018) Alzantot, M., Sharma, Y., Chakraborty, S., and Srivastava, M. B. Genattack: Practical black-box attacks with gradient-free optimization. CoRR, abs/1805.11090, 2018. URL http://arxiv.org/abs/1805.11090.
- Biggio & Roli (2018) Biggio, B. and Roli, F. Wild patterns: Ten years after the rise of adversarial machine learning. Pattern Recognition, 84:317–331, 2018.
- Brendel et al. (2017) Brendel, W., Rauber, J., and Bethge, M. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. CoRR, abs/1712.04248, 2017. URL http://arxiv.org/abs/1712.04248.
- Carlini & Wagner (2017) Carlini, N. and Wagner, D. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE, 2017.
- Chen et al. (2018) Chen, B., Carvalho, W., Baracaldo, N., Ludwig, H., Edwards, B., Lee, T., Molloy, I., and Srivastava, B. Detecting backdoor attacks on deep neural networks by activation clustering. arXiv preprint arXiv:1811.03728, 2018.
- Chen et al. (2017) Chen, P., Zhang, H., Sharma, Y., Yi, J., and Hsieh, C. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, AISec@CCS 2017, Dallas, TX, USA, November 3, 2017, pp. 15–26, 2017. doi: 10.1145/3128572.3140448. URL https://doi.org/10.1145/3128572.3140448.
- Goodfellow et al. (2014) Goodfellow et al., I. J. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
- Gu et al. (2017) Gu, T., Dolan-Gavitt, B., and Garg, S. Badnets: Identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733, 2017.
- Khalid et al. (2018) Khalid et al., F. Fademl: Understanding the impact of pre-processing noise filtering on adversarial machine learning. CoRR, 2018. URL http://arxiv.org/abs/1811.01444.
- Kurakin et al. (2016) Kurakin et al., A. Adversarial examples in the physical world. CoRR, abs/1607.02533, 2016.
- Li et al. (2018) Li, W., Yu, J., Ning, X., Wang, P., Wei, Q., Wang, Y., and Yang, H. Hu-fu: Hardware and software collaborative attack framework against neural networks. arXiv preprint arXiv:1805.05098, 2018.
- Papernot et al. (2015) Papernot et al., N. The limitations of deep learning in adversarial settings. CoRR, abs/1511.07528, 2015.
- Rauber et al. (2017) Rauber, J., Brendel, W., and Bethge, M. Foolbox v0.8.0: A python toolbox to benchmark the robustness of machine learning models. CoRR, 2017. URL http://arxiv.org/abs/1707.04131.
- Stilgoe (2018) Stilgoe, J. Machine learning, social learning and the governance of self-driving cars. Social studies of science, 48(1):25–56, 2018.
- Zou et al. (2018) Zou, M., Shi, Y., Wang, C., Li, F., Song, W., and Wang, Y. Potrojan: powerful neural-level trojan designs in deep learning models. arXiv preprint arXiv:1802.03043, 2018.