1 Introduction
Deep Neural Networks (DNNs) have revolutionized computing paradigms, but due to their inherent security vulnerabilities, i.e., training or inference data poisoning (Khalid et al., 2018), backdoors (Biggio & Roli, 2018; Gu et al., 2017), Trojans (Li et al., 2018; Zou et al., 2018; Chen et al., 2018), etc., they can lead to catastrophic effects in safety-critical applications, e.g., autonomous driving (Stilgoe, 2018). Several security attacks have been proposed that exploit these vulnerabilities; however, adversarial attacks, defined as carefully-crafted imperceptible data corruptions that fool DNNs into misclassification (Goodfellow et al., 2014), have emerged as one of the most common and successful classes of security attacks against DNNs. The implementation and effectiveness of these attacks depend upon the underlying assumption about the attacker's access to the DNN, i.e., the white-box and black-box scenarios.
White-Box Attacks: Most of the state-of-the-art attacks assume the white-box scenario, in which the adversary has full knowledge of the DNN architecture and the corresponding parameters (weights and biases), e.g., the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2014), iterative FGSM (I-FGSM) (Kurakin et al., 2016), the Jacobian Saliency Map Attack (JSMA) (Papernot et al., 2015), the Carlini-Wagner (C&W) Attack (Carlini & Wagner, 2017), the One-Pixel Attack, Universal Adversarial Perturbations, DeepFool, the PGD Attack, etc. Although these attacks can generate imperceptible adversarial examples efficiently, the underlying assumption of access to the complete DNN model is impractical in most scenarios.
Black-Box Attacks: Unlike white-box attacks, black-box attacks do not rely on the aforementioned assumption, which limits the capability of the adversary because of the limited knowledge about the target system. Based on the information available in black-box scenarios, the following three threat models are possible:

Threat Model I: The adversary has access to the output probability vector/distribution but not to the model parameters, as shown in Fig. 1(a).

Threat Model II: The adversary has access only to the probability of the most probable class detected by the DNN classifier (in the case of a classification problem), as shown in Fig. 1(b).
Threat Model III: The adversary has access only to the final output of the system, i.e., the final class label in the case of classification, as shown in Fig. 1(c).
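Threat Model III is the most restrictive interface an attacker can face: every query returns nothing but a class label. As a minimal sketch (the `model` object and its `predict` method are hypothetical placeholders, not from the paper), a label-only oracle can be modeled as:

```python
import numpy as np

class LabelOnlyOracle:
    """Threat Model III: the attacker observes only the final class label."""
    def __init__(self, model):
        self.model = model          # black-box classifier (hypothetical interface)
        self.query_count = 0        # decision-based attacks must budget queries

    def label(self, x):
        self.query_count += 1
        probs = self.model.predict(x)   # probabilities stay hidden from the attacker
        return int(np.argmax(probs))    # only the top-1 label is exposed
```

Because the oracle exposes no probabilities, gradient-free attacks under this model must infer the boundary purely from label flips, which is why the query count becomes the critical resource.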
Most of the state-of-the-art black-box attacks assume Threat Model I or II, e.g., the Zeroth-Order Optimization-based Attack (Chen et al., 2017) and GenAttack (Alzantot et al., 2018). However, these attacks can be mitigated by concealing the information about the classification probability, i.e., by considering Threat Model III (see Fig. 1(c)). To address this limitation, the Decision-based Attack (Brendel et al., 2017) has been proposed, which utilizes a random search algorithm to estimate the classification boundary. Though the Decision-based Attack can nullify probability-concealing defenses, it possesses the following limitations:

It requires multiple reference samples to generate a single adversarial example and thereby requires high memory/bandwidth resources.

The technique requires a large number of inferences/queries to generate a single adversarial example, e.g., on average around 10,000 for the examples illustrated in Fig. 2.

If the number of allowed iterations is restricted (where each iteration can comprise multiple queries), the performance of the Decision-based Attack reduces significantly, as illustrated in Fig. 3.
Associated Research Challenge: Based on the above-mentioned limitations, we can safely conclude that the Decision-based Attack cannot be applied to resource- and energy-constrained systems, e.g., autonomous vehicles. This raises a key research question: How can the number of queries required to generate an adversarial example be reduced while maintaining imperceptibility (maximizing the correlation coefficient and the structural similarity index)?
1.1 Novel Contributions
To address the above-mentioned research challenge, in this paper we propose a Resource-Efficient Decision-based methodology to generate an imperceptible adversarial attack under Threat Model III. The underlying assumption of the proposed methodology is that it takes the pre-processed image and its corresponding class label from the black-box model. It then iteratively generates the adversarial image multiple times, computes the corresponding classification label for each iteration, and optimizes it by finding the closest adversarial image (on the classification boundary) to the input image, as shown in Fig. 4. In a nutshell, this paper makes the following contributions:

To reduce the number of queries, we propose a single-reference-sample-based half-interval search algorithm to estimate the classification boundary.

To maximize imperceptibility, we propose an optimization algorithm that combines the half-interval search algorithm with gradient estimation to identify the adversarial example (on the boundary) closest to the input image.
To illustrate the effectiveness of the RED-Attack, we evaluate it on the CIFAR-10 and GTSRB datasets using state-of-the-art networks. The comparative analysis shows that the proposed approach successfully generates adversarial examples with much lower perceptibility than the Decision-based Attack. We empirically show that, on average, the perturbation norm of adversarial images with respect to their corresponding source images decreases by 96.1%, while their SSI and CC with respect to the corresponding clean images increase by 71.7% and 203%, respectively.
2 Proposed Methodology
The proposed resource-efficient decision-based attack methodology consists of two major steps: classification boundary estimation and optimization of the attack image with respect to the maximum allowed perturbation.
2.1 Boundary Estimation
The first step is to estimate the boundary, for which we propose a half-interval-search-based algorithm that requires a target image and a reference image from any other class, as shown in Algorithm 1 (see Step 1 of Fig. 6).
Definition 1
Let x_s, x_r, and ε be the source image (class: A), the reference image (class: other than A), and the maximum allowed estimation error, respectively. The goal of this algorithm is to find a sample x_b that lies within a tolerable distance (less than ε) of the classification boundary and has a label different from that of the source image. Mathematically, it can be defined as: find x_b such that f(x_b) ≠ f(x_s) and d(x_b, B) < ε, where f(·) denotes the classifier and B the classification boundary.
To generate the appropriate x_b, the proposed algorithm performs a half-interval search using the source image (x_s) and the reference image (x_r). It first finds the halfway point x_h between x_s and x_r by computing the average of the two, and then replaces x_s or x_r with x_h, depending upon the class into which x_h falls. For example, if the label of the halfway point is class A, the algorithm replaces x_s with x_h; if its label is not A, the algorithm replaces x_r with x_h. The algorithm repeats this process until the maximum distance of x_r from x_s is less than ε, while ensuring f(x_r) ≠ f(x_s). The proposed boundary estimation can be used for a targeted attack if we choose the reference image from the target class.
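The half-interval search described above can be sketched as follows (a simplified illustration under our own naming; `query_label` stands in for one black-box query to the classifier, and the sketch is not Algorithm 1 verbatim):

```python
import numpy as np

def boundary_estimate(x_src, x_ref, query_label, eps=1.0):
    """Half-interval (binary) search between a source image and a reference
    image from another class. Returns a point within eps of the boundary
    that is still classified differently from the source image."""
    src_label = query_label(x_src)
    assert query_label(x_ref) != src_label, "reference must lie in another class"
    while np.max(np.abs(x_ref - x_src)) > eps:
        x_half = (x_src + x_ref) / 2.0       # halfway point between the two
        if query_label(x_half) == src_label:
            x_src = x_half                   # still on the source side of the boundary
        else:
            x_ref = x_half                   # crossed the boundary; tighten from above
    return x_ref                             # adversarial side, within eps of the boundary
```

Each loop iteration costs exactly one query, so the number of queries grows only logarithmically with the initial distance between the source and reference images; this is the key source of the query savings over random search.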
2.2 Optimization of Attack Image
The second step of the proposed methodology is to optimize the noise on the sample x_b (the output of the boundary estimation). We propose to incorporate an adaptive update into the zeroth-order stochastic algorithm to optimize it efficiently.
Definition 2
Let x_s, x_r, ε, and x_adv be the source image (class: A), the reference image (class: other than A), the maximum allowed estimation error, and the perturbed image, respectively. The goal of this algorithm is to minimize the distance of x_adv from x_s while ensuring that its label differs from that of the source image. Mathematically, it can be defined as: minimize d(x_adv, x_s) subject to f(x_adv) ≠ f(x_s).
To achieve this goal, we first identify the sign of the gradient to estimate the boundary behavior (minima). For example, if the sign is positive, we continue moving in the same direction; otherwise, we switch the direction, as shown in Algorithm 2. Once the direction is identified, the next key challenge is to select an appropriate hop size. To address this challenge, we propose an algorithm (Algorithm 3) that introduces an adaptive hop size by applying a half-interval update: it starts with the maximum hop size and then reduces it by half. The algorithm repeats until it finds the local minimum, as illustrated by Steps 3-5 of Fig. 11.
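The two steps above, gradient-sign estimation and the adaptive half-interval hop, can be sketched as follows. This is a simplified illustration with names of our own choosing; unlike the paper's Algorithms 2-3, it optimizes only the distance term and omits the re-projection onto the classification boundary after every move:

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_sign(x_adv, x_src, distance, n_pixels=5, noise=5.0):
    """Zeroth-order probe: perturb a few random pixels and check whether
    the distance to the source image grows or shrinks along that direction."""
    direction = np.zeros_like(x_adv)
    idx = rng.choice(x_adv.size, size=n_pixels, replace=False)
    direction.flat[idx] = noise
    delta = distance(x_adv + direction, x_src) - distance(x_adv, x_src)
    return direction, -np.sign(delta)   # move opposite to the ascent direction

def adaptive_step(x_adv, x_src, distance, direction, sign,
                  max_hop=1.0, min_hop=1e-3):
    """Half-interval hop-size update: start with the maximum hop and halve it
    whenever a move no longer reduces the distance to the source image."""
    hop = max_hop
    while hop > min_hop:
        candidate = x_adv + sign * hop * direction
        if distance(candidate, x_src) < distance(x_adv, x_src):
            x_adv = candidate           # accept the move, keep the current hop
        else:
            hop /= 2.0                  # shrink the hop (half-interval update)
    return x_adv
```

Each probe in `estimate_sign` costs queries proportional to `n_pixels` evaluations of `distance`, while the hop-size loop converges geometrically, matching the query-efficiency argument made above.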
3 Experimental Results and Discussions
3.1 Experimental Setup
To demonstrate the effectiveness of the proposed RED-Attack, we evaluated several untargeted attacks on CIFAR-10 and GTSRB using state-of-the-art networks (Fig. 7).
3.2 Experimental Analysis
The imperceptibility of the adversarial image improves iteratively as its distance from the source image is minimized, as shown in Fig. 8. From the analysis of this figure, we identify the following key observation: the adversarial images generated in the first few iterations are not only perceptible but barely recognizable; over time, with the help of queries and the optimization algorithm, the attack achieves imperceptibility. Thus, in a black-box attack, if we limit the number of queries, imperceptibility decreases drastically.
3.3 Effects of Hyperparameters on Perceptibility (d)
We measure the distance of the adversarial image from its source image using the L2-norm of the perturbation, defined as the pixel-wise sum of squared differences. We monitor the effect of varying three hyperparameters, introduced in our attack, on the distance of the adversarial examples from their corresponding source examples.
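This distance can be computed directly; a minimal sketch (the function name is ours) of the squared L2-norm used as the perturbation norm throughout the evaluation:

```python
import numpy as np

def perturbation_norm(x_adv, x_src):
    """Perturbation norm d: pixel-wise sum of squared differences
    between the adversarial image and the source image."""
    diff = x_adv.astype(np.float64) - x_src.astype(np.float64)
    return float(np.sum(diff ** 2))
```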
3.3.1 Experimental Setup


The estimation error ε measures the maximum tolerable error, in terms of each pixel value, while computing the boundary point in Algorithm 1. Unless otherwise stated, the typical value we use for ε is 1.

The number of perturbed pixels defines how many pixels are randomly selected to be perturbed in each iteration in order to estimate the gradient of the distance of the adversarial example from the source example (Algorithm 2). Unless otherwise stated, its typical value is 5.

The noise magnitude defines the magnitude of the noise added to each of the randomly selected pixels, relative to the maximum value a pixel can have. Unless otherwise stated, its typical value is 5.
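The default values stated above can be gathered in one place for reference (a convenience sketch; the key names are ours, only the values come from the text):

```python
# Typical hyperparameter values used in the experiments (key names are ours).
DEFAULTS = {
    "eps": 1,        # maximum tolerable boundary-estimation error per pixel value
    "n_pixels": 5,   # pixels randomly perturbed per gradient-estimation iteration
    "noise_mag": 5,  # noise magnitude, relative to the maximum pixel value
}
```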
3.3.2 Experimental Analysis
Figures 9 and 10 show the source examples that we use for targeted and untargeted attacks, along with the initial target examples and the generated adversarial examples.


As the estimation error ε increases, the quality of the adversarial example at a given query count decreases, due to the increase in its distance from the source example. This is because a larger value of ε results in an imprecise boundary point, which in turn may result in an incorrect value of the estimated gradient, as shown in Figure 11. In contrast, a smaller value of ε results in a gradient direction closer to the required direction (see Figure 11b).

A larger number of perturbed pixels initially results in faster convergence. The reason is that we only need to estimate an overall trend of the boundary at the initial stages, and estimating the update direction for the adversarial example by perturbing a large number of values at once helps achieve better results. However, a large number of perturbed pixels is highly vulnerable to divergence as the attack progresses. This observation suggests that the attack can be significantly improved by adaptively changing the number of perturbed pixels as the algorithm progresses.

A similar trend is observed with changes in the noise magnitude, because estimating the gradients by introducing large perturbations in the image gives an overall trend of the boundary instead of the more precise, localized gradients given by small perturbations. This in turn helps the algorithm converge faster initially. However, small values of the noise magnitude give a more stable convergence towards the solution.
3.4 German Traffic Sign Recognition Benchmark (GTSRB) Dataset
To assess the generality of the trends shown in Figure 10, we perform a similar analysis on GTSRB. In this experiment, we randomly select an image from the test data and generate an adversarial example for a fixed number of queries. We compute the perturbation norm (d) from the corresponding source images and compare the different approaches based on these findings.
The results are shown in Figure 12. The complete set of adversarial examples along with the corresponding source images is shown in Figure 13.
3.4.1 Key Observations and Insights


Generally, the effect of changing the hyperparameters on the perturbation norm of the adversarial example is almost the same for the GTSRB and CIFAR-10 datasets.

We observe that the adversarial examples for the untargeted attack against the GTSRB classifier converge much faster than those for CIFAR-10. We attribute this to the much larger number of classes in the GTSRB dataset compared to CIFAR-10.

As was observed for CIFAR-10, the attack can be significantly improved by adaptively changing the hyperparameters as the attack progresses.
4 Comparison with the State-of-the-Art
We compare our results with the state-of-the-art Decision-based Attack, using the implementation provided in the open-source benchmark library Foolbox (Rauber et al., 2017). We limit the maximum number of queries to 1000 and evaluate our attack for different values of the estimation error, the number of perturbed pixels, and the noise magnitude. To compare our results with the Decision-based Attack, we use three different metrics, i.e., the correlation coefficient (CC), the squared L2-norm (Pert. Norm), and the Structural Similarity Index (SSI) of the adversarial image with respect to the source image. Our results are shown in Figure 14.
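The correlation coefficient above can be computed with NumPy as a quick sketch (the function name is ours; SSI would typically come from an image library such as scikit-image and is omitted here, and the perturbation norm is the pixel-wise sum of squared differences defined earlier):

```python
import numpy as np

def correlation_coefficient(x_adv, x_src):
    """Pearson correlation coefficient (CC) between two images, flattened.
    A value near 1 indicates the adversarial image closely tracks the source."""
    a = x_adv.ravel().astype(np.float64)
    b = x_src.ravel().astype(np.float64)
    return float(np.corrcoef(a, b)[0, 1])
```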
It can be seen from the figure that the adversarial examples produced by the RED-Attack in these settings are significantly superior to those produced by the Decision-based Attack. The reason is the binary stepping while searching for the boundary point and the efficient update process while computing a new instance of the adversarial example.
Here, we want to emphasize that in the long run the Decision-based Attack eventually surpasses the RED-Attack, e.g., if query efficiency is not much of a concern or the limit on the maximum number of queries is greatly relaxed, the adversarial examples found by the Decision-based Attack are better than those found by the RED-Attack.
5 Conclusion
In this paper, we proposed a novel Resource-Efficient Decision-based Imperceptible Attack (RED-Attack). It utilizes a half-interval-search-based algorithm to estimate the classification boundary and an efficient update mechanism to boost the convergence of an adversarial example for decision-based attacks in query-limited settings. To illustrate the effectiveness of the RED-Attack, we evaluated it on the CIFAR-10 and GTSRB datasets using multiple state-of-the-art networks. We limited the maximum number of queries to 1000 and showed that the state-of-the-art Decision-based Attack is unable to find an imperceptible adversarial example, while the RED-Attack arrives at a sufficiently imperceptible adversarial example within the predefined number of queries. Further, we empirically showed that, on average, the perturbation norm of adversarial images (with respect to their corresponding source images) decreased by 96.1%, while their Structural Similarity Index and Correlation Coefficient (with respect to the corresponding clean images) increased by 71.7% and 203%, respectively.
References
Alzantot, M., Sharma, Y., Chakraborty, S., and Srivastava, M. B. GenAttack: Practical black-box attacks with gradient-free optimization. CoRR, abs/1805.11090, 2018. URL http://arxiv.org/abs/1805.11090.

Biggio, B. and Roli, F. Wild patterns: Ten years after the rise of adversarial machine learning. Pattern Recognition, 84:317–331, 2018.

Brendel, W., Rauber, J., and Bethge, M. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. CoRR, abs/1712.04248, 2017. URL http://arxiv.org/abs/1712.04248.

Carlini, N. and Wagner, D. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE, 2017.

Chen, B., Carvalho, W., Baracaldo, N., Ludwig, H., Edwards, B., Lee, T., Molloy, I., and Srivastava, B. Detecting backdoor attacks on deep neural networks by activation clustering. arXiv preprint arXiv:1811.03728, 2018.

Chen, P., Zhang, H., Sharma, Y., Yi, J., and Hsieh, C. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, AISec@CCS 2017, Dallas, TX, USA, November 3, 2017, pp. 15–26, 2017. doi: 10.1145/3128572.3140448. URL https://doi.org/10.1145/3128572.3140448.

Goodfellow, I. J. et al. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

Gu, T., Dolan-Gavitt, B., and Garg, S. BadNets: Identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733, 2017.

Khalid, F. et al. FAdeML: Understanding the impact of pre-processing noise filtering on adversarial machine learning. CoRR, 2018. URL http://arxiv.org/abs/1811.01444.

Kurakin, A. et al. Adversarial examples in the physical world. CoRR, abs/1607.02533, 2016.

Li, W., Yu, J., Ning, X., Wang, P., Wei, Q., Wang, Y., and Yang, H. Hu-Fu: Hardware and software collaborative attack framework against neural networks. arXiv preprint arXiv:1805.05098, 2018.

Papernot, N. et al. The limitations of deep learning in adversarial settings. CoRR, abs/1511.07528, 2015.

Rauber, J., Brendel, W., and Bethge, M. Foolbox v0.8.0: A Python toolbox to benchmark the robustness of machine learning models. CoRR, 2017. URL http://arxiv.org/abs/1707.04131.

Stilgoe, J. Machine learning, social learning and the governance of self-driving cars. Social Studies of Science, 48(1):25–56, 2018.

Zou, M., Shi, Y., Wang, C., Li, F., Song, W., and Wang, Y. PoTrojan: Powerful neural-level trojan designs in deep learning models. arXiv preprint arXiv:1802.03043, 2018.