It has been widely shown that current machine learning models, including deep neural networks, are vulnerable to adversarial examples (Goodfellow et al., 2014; Szegedy et al., 2013; Chen et al., 2017a). Researchers have developed methods (Goodfellow et al., 2014; Moosavi-Dezfooli et al., 2016; Carlini & Wagner, 2017; Chen et al., 2017b) to generate such examples that can mislead even state-of-the-art models. These methods mainly compute the gradient of a loss function defined on the model's output and back-propagate it to the input, moving the input in the direction that changes the model's output most quickly. Such methods are effective at generating adversarial examples with a small perturbation of the original sample. However, a significant shortcoming of these gradient-based attacks is that they are very likely to converge to a local optimum. Recent work (Wang et al., 2018) shows that with different initializations, such as different starting points in the white-box setting or different directions in the black-box setting, the final converged value can be very different. However, these methods introduce a large extra computation cost and their improvement is relatively minor.
In this paper, we argue that gradient-based methods severely lack variance and can benefit significantly from more diversity. Specifically, we mainly try to boost the performance of the Sign-OPT attack (Cheng et al., 2018) in the hard-label black-box setting: since the goal is to find the direction with minimum distortion, the starting direction is very important. We later generalize our method to white-box attacks such as the C&W attack and also boost their performance. Generally speaking, we initialize the attack with different configurations and apply the currently best attack algorithms to them. To reduce computation cost and introduce more guided variance, we continually cut the worse part of the configurations and resample new configurations using Bayesian Optimization. In the white-box attack, we randomly sample points within a ball around the original example and perform the PGD attack (Kurakin et al., 2016) on them; in the hard-label black-box attack (Brendel et al., 2017; Cheng et al., 2018), we first sample directions and apply the Sign-OPT attack to them. At certain iterations, we conduct successive halving (Jamieson & Talwalkar, 2016), which abandons the worst half of these configurations to reduce computation cost, and use a Bayesian Optimization method called the Tree Parzen Estimator (TPE) (Bergstra et al., 2011) to resample from the search space. This procedure encourages the algorithm to search more promising areas and introduces guided variance in the intermediate steps. By adopting these methods, we can enhance the performance of current attacks without increasing the computation too much.
Our contributions are summarized below:
- We conduct thorough experiments to show that current gradient-based attack algorithms often converge to local minima or maxima and fail to find a good adversarial example, and thus require further improvement.
- We design a general algorithm to boost the performance of current attack algorithms and encourage them to find a more global optimum, which yields better adversarial examples.
- We conduct comprehensive experiments on several datasets and attack algorithms. We show that our method helps the current algorithms find better examples with 10%-20% lower distortion. By introducing the cutting mechanism, we reduce the computation cost by 5x-10x compared to a search without cutting.
2 Background and Related work
2.1 White-box attack
The first attack attempts were conducted in the white-box setting, which is relatively easy: the attacker has free access to the model's architecture and parameters. FGSM (Goodfellow et al., 2014) is one of the first algorithms in this setting. It performs a single back-propagation step to compute the gradient with respect to the input and then uses it to generate an adversarial example. Madry et al. (2017) and Kurakin et al. (2016) further improve FGSM by turning it into an iterative method called Projected Gradient Descent (PGD): at each iteration the attack takes only a small step along the computed gradient, then re-computes the gradient and steps again. The original white-box attacks compute the gradient on the softmax output and try to maximize the error on the original label, while the C&W attack (Carlini & Wagner, 2017) tries to minimize the distance between the original image and the adversarial image under a norm-based metric, and simultaneously maximize the error of the model's prediction for the original label.
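The difference between a single-step and an iterative gradient attack can be sketched on a toy model. The block below is a minimal illustration, not the setup used in this paper: the two-feature logistic model, its weights `W` and `B`, the helper names, and the L-infinity projection radius are all assumptions made for the example.

```python
import math

# Hypothetical toy model: logistic regression with fixed weights.
W = [2.0, -1.0]
B = 0.0

def predict_prob(x):
    """Probability of class 1 under the toy logistic model."""
    z = sum(w * xi for w, xi in zip(W, x)) + B
    return 1.0 / (1.0 + math.exp(-z))

def loss(x, y):
    """Cross-entropy loss of the toy model for a label y in {0, 1}."""
    p = predict_prob(x)
    return -math.log(p) if y == 1 else -math.log(1.0 - p)

def loss_grad(x, y):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = predict_prob(x)
    return [(p - y) * w for w in W]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, y, eps):
    """Single signed-gradient step of size eps (FGSM-style)."""
    g = loss_grad(x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def pgd(x, y, eps, step, iters):
    """Repeated small signed-gradient steps; after every step the point is
    projected back into the L-infinity ball of radius eps around x."""
    adv = list(x)
    for _ in range(iters):
        g = loss_grad(adv, y)
        adv = [ai + step * sign(gi) for ai, gi in zip(adv, g)]
        adv = [min(max(ai, xi - eps), xi + eps) for ai, xi in zip(adv, x)]
    return adv

x0, y0 = [1.0, 0.5], 1
x_fgsm = fgsm(x0, y0, eps=0.5)
x_pgd = pgd(x0, y0, eps=0.5, step=0.1, iters=10)
```

On a linear model both attacks end at the same corner of the ball; on a non-convex network the iterative variant re-evaluates the gradient along the way, which is where the dependence on the starting point enters.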
All of these white-box attack methods depend heavily on the gradient to find a direction that makes the model produce wrong predictions. However, Wang et al. (2018, 2019) show that attacks started from different points around the original one converge to very different local optima. The authors further use an interval attack to find a series of starting points, which increases diversity, and apply the PGD attack to them. The results show that, compared to the original PGD attack started from different points, the interval attack finds local optima whose final distortion is spread much wider, and that by increasing the number of starting points it can find a better optimum than PGD.
2.2 Black-box attack
In the black-box setting, the attacker has no direct access to the model's parameters or architecture; that is, the model remains a black box to the attacker, who must rely only on the model's output to conduct the attack. Depending on how much information the model exposes, the problem is split into two settings: soft-label and hard-label. In the soft-label setting, the model outputs the probability of each label for a given input, while in the hard-label setting, the model outputs only the top-1 label with the greatest probability. For the soft-label attack, Chen et al. (2017b) were the first to use Zeroth Order Optimization (ZOO) to approximately estimate the gradient and perform a PGD-based adversarial attack with it. Ilyas et al. (2018a) use a Natural Evolution Strategy (NES) to estimate the gradient. For the hard-label attack, Brendel et al. (2017) first formulate the problem and use the boundary attack to find the adversarial example with minimum distortion. Later, the OPT attack (Cheng et al., 2018) reformulates the hard-label attack as a continuous optimization problem and uses ZOO to search for the best direction. Sign-OPT (Cheng et al., 2018) further reduces the number of queries by computing only the sign of the ZOO update. These methods still use the gradient as the signal to find the adversarial example with minimum distortion, and therefore also suffer from local optima. In our experiments we find that, starting from different directions, the final converged results can be very different.
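The idea of estimating a gradient from queries alone can be sketched with symmetric finite differences. This is a simplified illustration in the spirit of zeroth-order methods, not the exact estimator of any cited attack; the function names and the toy objective are hypothetical.

```python
def zo_gradient(f, x, h=1e-4):
    """Estimate the gradient of f at x from function values only, using a
    symmetric difference quotient per coordinate (two queries each)."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_plus[i] += h
        x_minus = list(x)
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * h))
    return grad

def query_only_objective(x):
    """Toy objective that we pretend can only be queried, not differentiated."""
    return (x[0] - 1.0) ** 2 + 3.0 * x[1] ** 2

# The analytic gradient at [0, 1] is [-2, 6]; the estimate should be close to it.
estimate = zo_gradient(query_only_objective, [0.0, 1.0])
```

The query cost grows linearly with the input dimension, which is why practical attacks sample random coordinates or random directions instead of sweeping every coordinate.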
2.3 Bayesian Optimization and Successive Halving
Bayesian Optimization (BO) has been successfully applied to optimize functions that are non-differentiable or black-box, such as finding the hyper-parameters of neural networks in AutoML. Its main idea is to sample new points based on past knowledge. BO finds the optimum of a given function iteratively: at each iteration, it uses a probabilistic model to approximate the unknown function based on the data points observed in previous iterations. Specifically, it samples each new data point by maximizing an acquisition function conditioned on the samples queried from the objective so far. The most widely used acquisition function is the expected improvement (EI), which measures the expected gain of a candidate over the value of the best sample generated so far.
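In standard BO notation, the selection rule and the EI acquisition function described above are commonly written as below. This is a sketch in assumed notation: the symbols $u$ (acquisition function), $\mathcal{D}_{1:t-1}$ (samples queried so far), $f$ (objective), and $x^{+}$ (best location so far) are illustrative, and the minimization form is chosen here to match a distortion objective where lower is better.

```latex
x_t = \arg\max_{x} \; u\left(x \mid \mathcal{D}_{1:t-1}\right),
\qquad
\mathrm{EI}(x) = \mathbb{E}\left[\max\left(f(x^{+}) - f(x),\, 0\right)\right],
\qquad
x^{+} = \arg\min_{x_i \in x_{1:t-1}} f(x_i).
```

For a maximization objective, the difference inside the max changes sign and the argmin becomes an argmax.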
The Tree Parzen Estimator (TPE). TPE (Bergstra et al., 2011) is a Bayesian Optimization method proposed for hyper-parameter tuning. It uses kernel density estimators (KDEs) to approximate the distribution of configurations instead of modeling the objective function directly. Specifically, it models $p(x \mid y)$ and $p(y)$ instead of $p(y \mid x)$, and defines $p(x \mid y)$ using two separate KDEs $l(x)$ and $g(x)$:
$$p(x \mid y) = \begin{cases} l(x) & \text{if } y < y^{*}, \\ g(x) & \text{if } y \ge y^{*}, \end{cases}$$
where $y^{*}$ is a constant between the lowest and largest value of $y$ in the observed data. Bergstra et al. (2011) show that maximizing the ratio $l(x)/g(x)$ is equivalent to optimizing the EI acquisition function introduced above (see Appendix C for more detail). In this setting, the computational cost of generating a new data point with KDEs grows linearly with the number of data points already generated, while a traditional Gaussian Process (GP) requires cubic time.
Successive Halving. The idea behind Successive Halving (Jamieson & Talwalkar, 2016) is captured by its name: first initialize a set of configurations and run some computation on each of them, then evaluate the performance of all configurations and discard the worst half. This process repeats until only one configuration is left. BOHB (Falkner et al., 2018) combines HyperBand (Li et al., 2016), which is derived from Successive Halving, with TPE to solve the AutoML problem and achieves great success.
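The halving loop described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the callback names `advance` and `score`, the toy objective `f` with two local minima, and the choice to keep the floor of half the configurations are illustrative and are not taken from the cited papers.

```python
def successive_halving(configs, advance, score, budget_per_round):
    """Spend `budget_per_round` on every active configuration, then keep the
    better half (lower score is better) until a single one remains."""
    active = list(configs)
    while len(active) > 1:
        active = [advance(c, budget_per_round) for c in active]
        active.sort(key=score)
        active = active[: max(1, len(active) // 2)]
    return active[0]

def f(x):
    """Toy objective with a lower minimum near x = -1 and a higher one near x = 1."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def df(x):
    return 4.0 * x * (x * x - 1.0) + 0.3

def advance(x, budget, lr=0.05):
    """Run `budget` plain gradient-descent steps from x."""
    for _ in range(budget):
        x -= lr * df(x)
    return x

starts = [-2.0 + 0.5 * i for i in range(8)]
best = successive_halving(starts, advance, f, budget_per_round=20)
```

Configurations that settle in the worse basin are discarded after the first round, so the remaining budget is spent only on the promising ones.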
However, these methods were originally applied to hyper-parameter tuning, where the number of parameters to search is small (approximately 10-20). They suffer from the "curse of dimensionality" when the number of parameters grows, and the required computation cost becomes unacceptable. Some existing work (Moriconi et al., 2019; Wang et al., 2013) tries to use BO in high dimensions, but we still find in our experiments that BO alone does not converge as well as gradient-based methods.
3.1 Basic Intuition
Due to the high dimensionality of the image classification problem, current adversarial attack methods primarily adopt gradient-based algorithms to find an adversarial sample. These methods are efficient and are usually much quicker at finding a successful attack sample than probabilistic methods such as Bayesian Optimization. However, a gradient-based algorithm can easily get stuck in a local optimum, and its final result depends heavily on the starting direction. This is because at each iteration it always takes a step along the gradient, which is only guaranteed to reach a better point than the previous one. Such methods lack variance, which makes it almost impossible to find a global optimum, especially when the search space is high-dimensional and non-convex and may contain many local optima.
To better illustrate this phenomenon, Figure 1 shows the results of attacking an image from the MNIST dataset from 784 starting directions (equal to the dimension of MNIST images). The attack is performed with Sign-OPT, the current state-of-the-art algorithm for hard-label black-box attacks. To maximize the cosine difference between starting directions, we use the Gram–Schmidt process to force the directions to be orthogonal to each other; since MNIST images have 784 dimensions (28*28), the diversity should be sufficient. We can see that the best and worst curves converge to quite different values, suggesting that tuning the starting direction and introducing additional variance is necessary for the Sign-OPT method.
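Generating mutually orthogonal starting directions can be sketched as follows. This is an illustrative version using the modified Gram-Schmidt procedure on Gaussian random vectors; the dimension 5 (instead of 784), the seed, and the function names are assumptions made to keep the example small.

```python
import math
import random

def gram_schmidt(vectors):
    """Modified Gram-Schmidt: subtract from each vector its projections onto
    the directions accepted so far, then normalize the residual."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > 1e-12:  # skip vectors that are (numerically) dependent
            basis.append([wi / norm for wi in w])
    return basis

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

rng = random.Random(0)
dim = 5
raw = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]
directions = gram_schmidt(raw)
```

Each pair of returned directions has cosine similarity zero, so the set covers the space as evenly as a set of that size can.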
3.2 A General Boosting Mechanism
The general goal is to efficiently find a better adversarial example based on current attack algorithms without introducing too much computation cost. There are two commonly used families of methods for solving a black-box optimization problem: gradient-based and probabilistic algorithms. Gradient-based methods, which are commonly used in black-box attacks, estimate the gradient of the objective function and follow it iteratively until convergence. Probabilistic algorithms such as Bayesian Optimization (BO) (Pelikan et al., 1999; Snoek et al., 2012) approximate the objective function with a probabilistic model: they first sample uniformly across the search space, incorporate a prior belief about the objective function, and then query new points that are most likely to improve on the observations from past steps. Generally speaking, gradient-based methods converge fast but may get stuck in local optima. Probabilistic algorithms have a better chance of finding more global optima, but their computation cost grows exponentially with the dimension and quickly becomes unacceptable.
We argue that the performance of gradient-based methods naturally depends on the starting directions. As shown in Figure 2, gradient-based methods started from different points end in different local minima. The natural way to tackle this problem is to try more starting directions, but that adds a large computation cost and is less efficient. Our general goal is therefore to find better optima efficiently, by dropping unpromising configurations and adopting guided search techniques.
Our algorithm to boost the performance of current attack methods is presented in Algorithm 1. In the search phase, we maintain two sets: the step pool records all iterations performed, including those of configurations that have been cut, and the available search pool stores all configurations that are still active and need to be searched further. We record all iterations because fitting a KDE in high dimensions requires a lot of data to fill the search space; the record also reflects the changes between iterations and helps the model better understand the search space. We sample a set of starting directions from a Gaussian or uniform distribution as starting configurations. Then, in each interval, we first perform a fixed number of iterations on each configuration and cut the worst fraction of the search directions; in the meantime, we use TPE resampling to add new candidate directions to the search.
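The interplay of the step pool, the search pool, cutting, and resampling can be sketched as a short loop. This is a hypothetical illustration and not Algorithm 1 itself: the function `boost`, its parameters, the 1-D toy objective, and in particular the `perturb_elite` resampler (a simple Gaussian perturbation of the best recorded points, used here as a stand-in for TPE resampling) are all assumptions.

```python
import random

def boost(initial, step, critic, resample, interval, cut_ratio, rounds, resample_rounds):
    """Advance every active configuration for `interval` steps, cut the worst
    `cut_ratio` fraction, and add resampled configurations in the first
    `resample_rounds` rounds. Lower critic values are better."""
    step_pool = []                # every visited point, including cut configurations
    search_pool = list(initial)   # configurations that are still active
    for r in range(rounds):
        advanced = []
        for config in search_pool:
            for _ in range(interval):
                config = step(config)
                step_pool.append((config, critic(config)))
            advanced.append(config)
        advanced.sort(key=critic)
        keep = max(1, int(len(advanced) * (1.0 - cut_ratio)))
        search_pool = advanced[:keep]
        if r < resample_rounds:
            search_pool.extend(resample(step_pool))
    return min(search_pool, key=critic), step_pool

def f(x):
    """Toy objective with two local minima; the one near x = -1 is lower."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def df(x):
    return 4.0 * x * (x * x - 1.0) + 0.3

def gradient_step(x, lr=0.05):
    return x - lr * df(x)

rng = random.Random(0)

def perturb_elite(step_pool, n_new=2, sigma=0.1):
    """Stand-in resampler (not TPE): perturb points from the best quarter of the step pool."""
    elite = sorted(step_pool, key=lambda rec: rec[1])[: max(1, len(step_pool) // 4)]
    return [rng.choice(elite)[0] + rng.gauss(0.0, sigma) for _ in range(n_new)]

best, step_pool = boost(
    [-1.8, -1.2, -0.4, 0.6, 1.1, 1.7], gradient_step, f, perturb_elite,
    interval=10, cut_ratio=0.5, rounds=4, resample_rounds=2,
)
```

Because the step pool keeps the trajectories of cut configurations as well, the resampler sees far more observations than the number of configurations still alive.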
The purpose of Tree Parzen Estimator resampling is to encourage the algorithm to find directions with lower distortion. The reasons for and benefits of resampling in addition to cutting are:
- Depends not only on the starting directions. This procedure adds more variance in the middle of the search, instead of depending solely on the starting directions. In our experiments we find that, when started from the same direction, Sign-OPT often converges to a similar final distortion, even though the gradient estimation procedure itself introduces some randomness.
- Guided randomness. The starting directions are sampled randomly with no information about the objective function. However, after gaining some knowledge of the boundary from past searches, we can resample new directions guided by them, which makes resampling more efficient and more likely to find better directions.
- Parameter sharing. Past iterations performed by Sign-OPT already lead to relatively good local areas with small distortion. By sampling around these areas, new searches share the progress of prior ones instead of starting from the original point, which further reduces the computation cost.
As shown in Algorithm 2, we first divide the observed data into two parts by their distortion (lower is better) and fit two separate KDEs to these subsets: $l(x)$ for the lower-distortion subset and $g(x)$ for the remaining one. We then try to sample new data with the minimum value of $g(x)/l(x)$, which can be shown to have the maximum relative gain in distortion (see Appendix C for more detail). Since we cannot find such points directly, we draw a number of samples from $l(x)$ (100 in our experiments) and keep the one with the minimum $g(x)/l(x)$.
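The resampling step can be sketched with a small isotropic Gaussian KDE. This is a minimal sketch, not Algorithm 2 itself: the split fraction `gamma`, the fixed `bandwidth`, the helper names, and the 2-D toy distortion are assumptions, while the 100 candidates follow the number stated above. Densities are compared in log space, which also sidesteps underflow in higher dimensions.

```python
import math
import random

def kde_logpdf(x, data, bandwidth):
    """Log-density at x of an isotropic Gaussian KDE centered on `data`."""
    d = len(x)
    log_norm = -0.5 * d * math.log(2.0 * math.pi * bandwidth ** 2) - math.log(len(data))
    exps = [-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * bandwidth ** 2)
            for c in data]
    m = max(exps)  # log-sum-exp trick for numerical stability
    return log_norm + m + math.log(sum(math.exp(e - m) for e in exps))

def kde_sample(data, bandwidth, rng):
    """Draw one sample from the KDE: pick a center, add Gaussian noise."""
    center = rng.choice(data)
    return [ci + rng.gauss(0.0, bandwidth) for ci in center]

def tpe_resample(observations, gamma=0.25, bandwidth=0.3, n_candidates=100, rng=None):
    """observations: list of (point, distortion) pairs, lower distortion is better.
    Returns the candidate drawn from l(x) that minimizes g(x) / l(x)."""
    rng = rng or random.Random()
    ranked = sorted(observations, key=lambda o: o[1])
    n_good = max(1, int(gamma * len(ranked)))
    good = [p for p, _ in ranked[:n_good]]          # data behind l(x)
    bad = [p for p, _ in ranked[n_good:]] or good   # data behind g(x)
    candidates = [kde_sample(good, bandwidth, rng) for _ in range(n_candidates)]
    return min(
        candidates,
        key=lambda x: kde_logpdf(x, bad, bandwidth) - kde_logpdf(x, good, bandwidth),
    )

def distortion(p):
    """Toy distortion: squared distance to a hypothetical best point (0.8, 1.2)."""
    return (p[0] - 0.8) ** 2 + (p[1] - 1.2) ** 2

observations = []
for i in range(-3, 4):
    for j in range(-3, 4):
        point = [float(i), float(j)]
        observations.append((point, distortion(point)))

new_point = tpe_resample(observations, rng=random.Random(0))
```

The selected candidate lies where good observations are dense and bad ones are sparse, which is the region the next search should start from.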
3.3 Boosting Hard-Label Black-Box Attack
In the black-box attack, we mainly study how to boost performance in the hard-label setting. We choose the Sign-OPT attack (Cheng et al., 2018), the state-of-the-art attack algorithm in this area, as the base algorithm, and enhance its performance with successive halving and TPE resampling. To tackle the problem that a hard-label attack only has binary information describing each label as true or false, the OPT attack reformulates the hard-label attack as a continuous problem, so that gradient-based algorithms can be applied to optimize it. Specifically, it defines $g(\theta)$ as the minimum distortion along a direction $\theta$ and tries to find the $\theta$ with the smallest $g(\theta)$. By computing $g(\theta)$ with binary search, the authors transform the non-differentiable hard-label attack problem into a continuous one, and use Zeroth Order Optimization (Chen et al., 2017b) to estimate the directional derivative of $g(\theta)$ (Equation 4). Sign-OPT further improves this method by using just a single query to compute the sign of Equation 4. We argue that these methods still depend heavily on the estimated gradient and can lead to non-optimal local minima, and thus can benefit greatly from adding more variance to these algorithms.
We apply our boosting algorithm to the Sign-OPT attack to enhance its performance. In this setting, the critic metric is the distance between the original example and the adversarial example: a lower distance indicates a better and more promising configuration.
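The two ingredients of the base attack, the boundary distance along a direction found by binary search and the single-query sign estimate, can be sketched on a toy hard-label classifier. This is a simplified illustration under stated assumptions: the linear classifier `hard_label`, the search bound `hi`, the step sizes, and the renormalization of the direction after each update are choices made for the example and are not taken from the Sign-OPT implementation.

```python
import math
import random

def hard_label(x):
    """Toy hard-label classifier: only the predicted class is visible."""
    return 1 if x[0] + 2.0 * x[1] > 1.0 else 0

def normalize(v):
    n = math.sqrt(sum(vi * vi for vi in v))
    return [vi / n for vi in v]

def g(theta, x0, y0, hi=10.0, tol=1e-6):
    """Distance from x0 to the decision boundary along theta, by binary search.
    Returns infinity if the label does not change within distance `hi`."""
    d = normalize(theta)
    def point(lam):
        return [xi + lam * di for xi, di in zip(x0, d)]
    if hard_label(point(hi)) == y0:
        return math.inf
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if hard_label(point(mid)) != y0:
            hi = mid
        else:
            lo = mid
    return hi

def derivative_sign(theta, u, g_theta, x0, y0, eps=1e-3):
    """One query: if the perturbed direction is already adversarial at radius
    g(theta), its boundary is closer, so the directional derivative is negative."""
    d = normalize([ti + eps * ui for ti, ui in zip(theta, u)])
    probe = [xi + g_theta * di for xi, di in zip(x0, d)]
    return -1 if hard_label(probe) != y0 else 1

def sign_step(theta, x0, y0, rng, n_queries=20, lr=0.1):
    """Average sign-weighted random directions and take one descent step."""
    g_theta = g(theta, x0, y0)
    grad = [0.0] * len(theta)
    for _ in range(n_queries):
        u = [rng.gauss(0.0, 1.0) for _ in theta]
        s = derivative_sign(theta, u, g_theta, x0, y0)
        grad = [gi + s * ui / n_queries for gi, ui in zip(grad, u)]
    return normalize([ti - lr * gi for ti, gi in zip(theta, grad)])

x0, y0 = [0.0, 0.0], 0
rng = random.Random(0)
theta = [1.0, 0.0]
g_start = g(theta, x0, y0)
for _ in range(30):
    theta = sign_step(theta, x0, y0, rng)
g_end = g(theta, x0, y0)
```

In this convex toy problem every starting direction reaches the same boundary distance; the dependence on the starting direction discussed in this paper arises when the boundary is non-convex.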
3.4 Boosting White-Box Attack
Besides the Sign-OPT attack, we demonstrate that our algorithm can also be applied to white-box attacks. Specifically, we try to boost the C&W attack (Carlini & Wagner, 2017) by adding more variance to it. We argue that C&W is also a gradient-based method and that its performance is likewise restricted by local minima. We describe the exact setting in Appendix B.
To evaluate the effectiveness and generalization ability of our algorithm in helping current attack methods find better optima, we conduct experiments on both white-box and hard-label black-box attacks. We use several popular image classification datasets and test them in both the white-box and black-box settings. We also try to improve attacks on Gradient Boosting Decision Trees (GBDT). Since GBDT is naturally non-differentiable and current white-box attacks cannot be applied to it, we only try black-box attacks on GBDT.
4.1 Hard-Label Black-Box attack
4.1.1 Attack on image.
We attack image classifiers on MNIST, CIFAR-10, and ImageNet using the state-of-the-art attack algorithm Sign-OPT. The neural network architecture is the same as the one reported in Sign-OPT. In detail, both MNIST and CIFAR-10 use the same network structure, which contains four convolution layers, two max-pooling layers, and two fully-connected layers. As reported in Carlini & Wagner (2017) and Cheng et al. (2018), we achieve an accuracy of 99.5% on MNIST and 82.5% on CIFAR-10. For the ImageNet dataset, we use the pretrained ResNet-50 (He et al., 2016) network provided by torchvision (Marcel & Rodriguez, 2010), which achieves a Top-1 accuracy of 76.15%. We randomly select 100 examples from the test set of each dataset for evaluation.
We adopt the Sign-OPT attack as the base algorithm to be boosted, and also include the following two algorithms for comparison:
We study the effects of successive halving and TPE resampling separately. As shown in Figure 3, in successive halving we keep cutting the worst fraction of the search directions at a fixed interval until only one sample is left. When successive halving and TPE resampling are combined, we cut and resample in the first several phases, and only cut afterwards until all search directions are cut or the search reaches a local optimum and stops moving. Figure 3(b) shows that TPE resampling indeed finds some directions that are better than the original ones. We observe in our experiments that the final best direction mostly comes from TPE resampling rather than from the original starting directions, which demonstrates the importance of resampling in the middle of the optimization. The quantitative influence of successive halving and TPE resampling is reported in Appendix F.
A natural question arises: how many starting points are enough? To find the best number of starting points, we attack an image with different numbers of starting directions; for each number of starting directions, we run the attack several times and average the results to reduce variance. Figure 4 shows the attack on an MNIST image using the Sign-OPT method, illustrating the effect of the number of starting directions on the final converged distortion. Increasing the number of starting directions does not reduce the minimum distortion much beyond about 50 directions. We also find that the standard deviation is smaller and the final distortion is lower when TPE resampling is introduced. This is probably because TPE resampling also introduces variance in the intermediate steps, so that the algorithm does not depend completely on the starting directions, which increases the probability of finding a better optimum.
Another question is: what is the best cutting interval? The cutting interval decides how many iterations are applied before the next cutting and resampling step. It is an important factor to tune. If it is too small, configurations are evaluated and possibly cut before they have approximately converged, which makes the cutting unreliable. If it is too large, unpromising configurations run for more iterations, which wastes computation. Li et al. (2016) develop an algorithm called HyperBand to search for the best search interval and cutting rate. However, HyperBand is designed for hyper-parameter tuning in AutoML, which differs from our problem. In hyper-parameter tuning, only one best setting exists for a specific dataset, and different datasets can have very different best settings for the search interval and cutting rate. In our experiments, by contrast, we have thousands of images to attack, and the images in the same dataset share the same best search interval and cutting rate. As a result, we manually choose the best setting for each dataset during the experiments instead of searching for it as HyperBand does, which avoids a lot of unnecessary computation. The parameters for different datasets are given in Appendix G.
4.1.2 Attack on Gradient Boosting Decision Tree (GBDT).
We conduct an untargeted attack on a gradient boosting decision tree (GBDT). Since Sign-OPT does not include experiments on GBDT, we use the OPT-based attack (Cheng et al., 2018) and apply our boosting algorithm to it. In this experiment, we again use the MNIST dataset (LeCun et al., 1998) for multi-class classification. We select the popular GBDT framework LightGBM and use the parameters from https://github.com/Koziev/MNIST_Boosting, which achieve 98.09% accuracy on MNIST. The attack settings are almost the same as those described before. Specifically, we start from 30 directions and conduct successive halving every 2000 queries. The results of the GBDT attack on MNIST are shown in Table 3.
4.2 White-Box attack
In the white-box attack, we adopt the C&W attack (Carlini & Wagner, 2017) and perform the attack on both the MNIST and CIFAR-10 datasets. We select 100 examples from each dataset to evaluate the performance. To introduce variance and encourage the attack to find better, more global optima, we randomly sample 50 points inside a ball around the original example (whose size is fixed during the experiment) as starting points and apply the C&W attack to each of them. For simplicity, we fix the parameter described in Equation 6 to 0.2; the method can easily be applied to other settings of this parameter. Cutting and resampling are used during the search, and the detailed parameters can be found in Appendix G. The results before and after boosting are shown in Table 3.
In this paper, we propose a general boosting framework that can be applied to both white-box and black-box attacks to find a more global optimal adversarial sample without increasing the computational cost too much. Our method combines the variance provided by Bayesian Optimization with the efficiency of gradient-based optimization, and finds a better optimum than previous work. We also show experimentally that different starting directions significantly affect the final attack distortion, and study the number of directions needed to achieve both efficiency and a good optimum.
- Bergstra et al. (2011) James S Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. Algorithms for hyper-parameter optimization. In Advances in neural information processing systems, pp. 2546–2554, 2011.
- Brendel et al. (2017) Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248, 2017.
- Carlini & Wagner (2017) Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE, 2017.
- Chen et al. (2017a) Hongge Chen, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, and Cho-Jui Hsieh. Attacking visual language grounding with adversarial examples: A case study on neural image captioning. arXiv preprint arXiv:1712.02051, 2017a.
- Chen et al. (2017b) Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15–26. ACM, 2017b.
- Cheng et al. (2018) Minhao Cheng, Thong Le, Pin-Yu Chen, Jinfeng Yi, Huan Zhang, and Cho-Jui Hsieh. Query-efficient hard-label black-box attack: An optimization-based approach. arXiv preprint arXiv:1807.04457, 2018.
- Deng et al. (2009) Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In
- Falkner et al. (2018) Stefan Falkner, Aaron Klein, and Frank Hutter. Bohb: Robust and efficient hyperparameter optimization at scale. arXiv preprint arXiv:1807.01774, 2018.
- Goodfellow et al. (2014) Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
- He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, 2016.
- Ilyas et al. (2018a) Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adversarial attacks with limited queries and information. arXiv preprint arXiv:1804.08598, 2018a.
- Ilyas et al. (2018b) Andrew Ilyas, Logan Engstrom, and Aleksander Madry. Prior convictions: Black-box adversarial attacks with bandits and priors. arXiv preprint arXiv:1807.07978, 2018b.
- Jamieson & Talwalkar (2016) Kevin Jamieson and Ameet Talwalkar. Non-stochastic best arm identification and hyperparameter optimization. In Artificial Intelligence and Statistics, pp. 240–248, 2016.
- Krizhevsky et al. (2010) Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. Cifar-10 (canadian institute for advanced research). URL http://www. cs. toronto. edu/kriz/cifar. html, 8, 2010.
- Kurakin et al. (2016) Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.
- LeCun et al. (1998) Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
- Li et al. (2016) Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. Hyperband: A novel bandit-based approach to hyperparameter optimization. arXiv preprint arXiv:1603.06560, 2016.
- Madry et al. (2017) Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
- Marcel & Rodriguez (2010) Sébastien Marcel and Yann Rodriguez. Torchvision the machine-vision package of torch. In Proceedings of the 18th ACM international conference on Multimedia, pp. 1485–1488. ACM, 2010.
- Moosavi-Dezfooli et al. (2016) Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2574–2582, 2016.
- Moriconi et al. (2019) Riccardo Moriconi, Marc Peter Deisenroth, and K. S. Sesh Kumar. High-dimensional bayesian optimization using low-dimensional feature spaces. 2019.
- Pelikan et al. (1999) Martin Pelikan, David E Goldberg, and Erick Cantú-Paz. Boa: The bayesian optimization algorithm. In Proceedings of the 1st Annual Conference on Genetic and Evolutionary Computation-Volume 1, pp. 525–532. Morgan Kaufmann Publishers Inc., 1999.
- Snoek et al. (2012) Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical bayesian optimization of machine learning algorithms. In Advances in neural information processing systems, pp. 2951–2959, 2012.
- Szegedy et al. (2013) Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
- Wang et al. (2018) Shiqi Wang, Yizheng Chen, Ahmed Abdou, and Suman Jana. Mixtrain: Scalable training of formally robust neural networks. arXiv preprint arXiv:1811.02625, 2018.
- Wang et al. (2019) Shiqi Wang, Yizheng Chen, Ahmed Abdou, and Suman Jana. Enhancing gradient-based attacks with symbolic intervals. arXiv preprint arXiv:1906.02282, 2019.
- Wang et al. (2013) Ziyu Wang, Masrour Zoghi, Frank Hutter, David Matheson, and Nando De Freitas. Bayesian optimization in high dimensions via random embeddings. In Twenty-Third International Joint Conference on Artificial Intelligence, 2013.
Appendix A Introduction to adversarial attack
In a common attack setting, we try to expose the weakness of a well-trained machine learning model by generating examples that are correctly classified by humans but not by the model. Specifically, given a well-trained multi-class classification model, adversarial attack algorithms try to find an adversarial example that is close to the original input but receives a different prediction from the model.
A successful adversarial example has several key features worth mentioning:
- The new example should be near the original one.
- The output of the machine learning model changes.
- A human can still easily assign the new example its correct label, since the change is minor.
A possible explanation of why a well-trained model exhibits this phenomenon is illustrated in Figure 5. Assume we are attacking a classification model, such as an image recognition model. The problem is that the region the model predicts for a class may not match the true region. The regions with the deeper color are the true areas of the classes, and the two regions labeled a and b are clearly far apart. However, the model's predicted regions for these two classes may differ from the true regions, and their boundaries are close to each other, which makes adversarial attacks possible.
Appendix B Boosting White-Box Attack
Figure 6 shows a possible boundary distribution and the C&W attack (Carlini & Wagner, 2017) performed on it. The decision boundary of a neural network can be far from smooth and contain many local optima. Generally speaking, the cost of moving from the original point to the boundary (which means a successful attack) depends heavily on the direction. Traditional white-box attack algorithms such as FGSM (Goodfellow et al., 2014), PGD (Kurakin et al., 2016), and C&W (Carlini & Wagner, 2017) follow the gradient to reach the boundary, implicitly assuming that the gradient leads to the optimum, which may not be the case. We try to improve the PGD attack so that it finds a more global optimal adversarial example by encouraging it to search in ways that do not depend only on the gradient. Given the original sample with its label and the loss function of the neural network, the C&W attack conducts an iterative search that depends solely on the gradient.
Instead of starting from a single direction computed from the gradient at the original point, we randomly sample points within a ball around the original point (that is, the distance between each generated point and the original one is below a bound, which may differ slightly across datasets) as a set of candidate configurations. We then assign a fixed budget, i.e. a certain number of PGD iterations, to each configuration. After the configurations use up their budget, we evaluate them, cut the worst fraction, and assign no further budget to the cut ones. We evaluate the configurations by comparing their objective values at the same iteration; those with higher values are presumed better. This procedure continues until only one configuration is left.
Besides successive halving, we also resample from the search space guided by past searches. We use the TPE resampling algorithm described in Algorithm 2 to add new candidate configurations and search them. We describe it more specifically in the next section. Note that in the white-box setting, the good configurations are those whose objective value is larger than the threshold, whereas in the black-box setting, the good configurations are those whose value is lower than the threshold: in the black-box attack we search on the boundary, which already guarantees a successful attack, so the only goal is to find a lower distortion.
Appendix C Tree Parzen Estimator
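The Expected Improvement can also be written in terms of the two densities used by TPE: $l(x)$, fitted to the better observations, and $g(x)$, fitted to the remaining ones (Bergstra et al., 2011). The rewritten form implies that maximizing the ratio $l(x)/g(x)$ is equivalent to maximizing the EI function.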
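The derivation behind this equivalence can be sketched as follows, following the argument of Bergstra et al. (2011) for a minimization objective. The notation is the standard TPE notation and is assumed here rather than recovered from the original equations: $y^{*}$ is the threshold, $\gamma = p(y < y^{*})$, and $l(x)$ and $g(x)$ are the densities of configurations with objective value below and above the threshold.

```latex
\begin{aligned}
\mathrm{EI}_{y^{*}}(x)
  &= \int_{-\infty}^{y^{*}} (y^{*} - y)\, p(y \mid x)\, dy
   = \int_{-\infty}^{y^{*}} (y^{*} - y)\, \frac{p(x \mid y)\, p(y)}{p(x)}\, dy, \\
p(x) &= \int p(x \mid y)\, p(y)\, dy = \gamma\, l(x) + (1 - \gamma)\, g(x),
  \qquad \gamma = p(y < y^{*}), \\
\int_{-\infty}^{y^{*}} (y^{*} - y)\, p(x \mid y)\, p(y)\, dy
  &= l(x) \int_{-\infty}^{y^{*}} (y^{*} - y)\, p(y)\, dy
   = \gamma\, y^{*}\, l(x) - l(x) \int_{-\infty}^{y^{*}} y\, p(y)\, dy, \\
\mathrm{EI}_{y^{*}}(x)
  &= \frac{\gamma\, y^{*}\, l(x) - l(x) \int_{-\infty}^{y^{*}} y\, p(y)\, dy}
          {\gamma\, l(x) + (1 - \gamma)\, g(x)}
   \;\propto\; \left( \gamma + \frac{g(x)}{l(x)}\, (1 - \gamma) \right)^{-1}.
\end{aligned}
```

The last expression is decreasing in $g(x)/l(x)$, so EI is largest where $l(x)/g(x)$ is largest.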
Appendix D Experiments on CIFAR-10 and ImageNet
For both the MNIST and CIFAR-10 datasets, we keep the dimension unchanged during the attack. On the ImageNet-1000 dataset, the dimension is too high (224*224), so the probability density of the KDE is very close to zero at every point, which makes the predicted probability underflow and causes numerical errors. To tackle this problem, we follow Ilyas et al. (2018b), who suggest that images tend to exhibit spatially local similarity: pixels in a local region tend to be similar to each other. This phenomenon also applies to gradients. More specifically, if two points are close, their gradients are relatively similar; that paper calls this "data-dependent priors". This gives us the opportunity to reduce the dimension of ImageNet-1000 images and avoid the numerical problem in the KDE. In the experiment, we reduce the dimension by applying a mean pooling operation with a kernel size of 5, chosen to balance precision and query efficiency. Table 1 shows the results of applying successive halving and TPE resampling (starting from 50 random directions) to the Sign-OPT method. The performance gain is significant, and the attack success rate (ASR) reaches a new state of the art in the hard-label black-box setting.
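The dimension reduction can be sketched as non-overlapping mean pooling on a single 224 x 224 channel. This is an illustrative sketch: the function name and the synthetic image are hypothetical, and since 224 is not a multiple of 5, the handling of the last partial block (averaged over the pixels it contains) is an assumption rather than the paper's stated choice.

```python
def mean_pool(image, k):
    """Non-overlapping k x k mean pooling of a 2-D list of numbers. Blocks at
    the right and bottom edge may be smaller than k x k and are averaged over
    the pixels they actually contain."""
    h, w = len(image), len(image[0])
    pooled = []
    for top in range(0, h, k):
        row = []
        for left in range(0, w, k):
            block = [image[i][j]
                     for i in range(top, min(top + k, h))
                     for j in range(left, min(left + k, w))]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled

# Synthetic 224 x 224 single-channel image with pixel value i * 224 + j.
image = [[float(i * 224 + j) for j in range(224)] for i in range(224)]
pooled = mean_pool(image, 5)
```

With kernel size 5 the channel shrinks from 224 x 224 = 50176 values to 45 x 45 = 2025 values, which keeps the KDE densities in a numerically workable range.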
Appendix E Adversarial examples
Appendix F Ablation Study
We study the influence of introducing cutting and resampling into our method. We conduct a hard-label black-box attack on MNIST using the Sign-OPT attack as the base algorithm. For comparison, we use the Multi-Directional attack as the baseline. It samples 30 starting directions and attacks from each of them without cutting or resampling in the intermediate steps. Table 4 shows that introducing cutting in the intermediate steps reduces the computation cost without harming the overall performance, and that introducing resampling finds better adversarial examples without increasing the computation too much.
| | Avg | Relative Gain | ASR() | Queries | Relative Loss |
|---|---|---|---|---|---|
| Successive Halving attack | 0.99 | 5.7% | 55% | 257893 | 6.4x |
Appendix G Parameters for different datasets
G.1 Black-Box Attack
G.2 White-Box Attack