## I Introduction

The success of deep neural networks (DNNs) [Krizhevsky, lenet, googlenet, vgg, resnet] has led to their use in many real-world applications. However, these models are also known to be susceptible to adversarial attacks, i.e., minimal patterns crafted by attackers who try to fool learning machines [Goodfellow, Papernotb, szegedy, Nguyena, Eykholt, athalye3d]. Such adversarial patterns barely affect human perception, while they can manipulate learning machines, e.g., into giving wrong classification outputs. The complex interactions between a DNN's layers enable high accuracy in controlled settings, but they also make the outputs unpredictable in *untrained spots* where training samples are sparse. If attackers can find such a spot close to a normal data sample, they can manipulate DNNs by adding a very small (optimally invisible in computer vision applications) perturbation to the original sample, leading to fatal errors; manipulating an autonomous driving system, for example, can cause serious accidents. Two attacking scenarios are generally considered: whitebox and blackbox. The whitebox scenario assumes that the attacker has access to the complete target system, including the architecture and the weights of the DNN, as well as the defense strategy if the system is equipped with any. Typical whitebox attacks optimize the classification output with respect to the input by backpropagating through the defended classifier [Carlinib, chen2017ead, sharma2017ead, moosavi2016deepfool]. The blackbox scenario, on the other hand, assumes that the attacker has access only to the output. Under this scenario, the attacker has to rely on blackbox optimization, where the objective can be computed for arbitrary inputs but the gradient information is not directly accessible. Although the whitebox attack is more powerful, it is much less likely that attackers can obtain full knowledge of the target system in reality.
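As a concrete illustration of the whitebox setting, the sketch below applies a single gradient-sign step (in the spirit of the fast gradient sign method) to a toy logistic classifier. The classifier, its weights, and the perturbation budget `eps` are all hypothetical choices made for this example; real whitebox attacks backpropagate through the full DNN instead.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Toy whitebox target: a logistic classifier with known (hypothetical) weights.
w = np.array([1.0, 1.0])
x = np.array([0.10, 0.05])   # original sample, classified as 1
y = 1                        # true label

def predict(x):
    return int(w @ x > 0.0)

# Gradient of the logistic loss -log sigmoid(w.x) with respect to the input x.
grad_x = -sigmoid(-(w @ x)) * w

# Gradient-sign step: move the input along the sign of the loss gradient.
eps = 0.2                    # perturbation budget (hypothetical)
x_adv = x + eps * np.sign(grad_x)

print(predict(x), predict(x_adv))  # the small perturbation flips the label
```

Since the attacker sees the gradient, a single step suffices here; blackbox attackers, by contrast, must explore without this information.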
Accordingly, the blackbox scenario is considered the more realistic threat. Existing blackbox attacks can be classified into two types: the transfer attack and the decision-based attack. In the transfer attack, the attacker trains a student network that mimics the output of the target classifier. The trained student network is then used to obtain the gradient information for optimizing the adversarial input. In the decision-based attack, the attacker simply performs random walk exploration. In the

*boundary attack* [brendel2017decision], a state-of-the-art method in this category, the attacker first generates an initial adversarial sample from a given original sample by drawing a uniformly distributed random pattern multiple times until one happens to lead to misclassification. Initial patterns generated in this way typically have amplitudes too large to be hidden from human perception. The attacker therefore polishes the initial adversarial pattern with a Gaussian random walk in order to minimize the amplitude, keeping the classification output constant.¹
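The two stages just described can be sketched on a toy problem. Everything below is an illustrative stand-in for the actual method in [brendel2017decision]: the circular decision boundary, the Gaussian step scale, and the acceptance rule (keep a step only if it stays misclassified and moves closer to the original) are simplified assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy blackbox classifier: label 1 outside the unit circle, 0 inside.
def f(x):
    return int(np.linalg.norm(x) > 1.0)

x_orig = np.array([0.5, 0.0])   # original sample
y = f(x_orig)                   # its label: 0

# Stage 1: draw uniform random patterns until one is misclassified.
x_adv = rng.uniform(-2.0, 2.0, size=2)
while f(x_adv) == y:
    x_adv = rng.uniform(-2.0, 2.0, size=2)
d_init = np.linalg.norm(x_adv - x_orig)

# Stage 2: polish with a Gaussian random walk, accepting only steps that
# stay misclassified while moving closer to the original sample.
for _ in range(2000):
    cand = x_adv + rng.normal(scale=0.05, size=2)
    if f(cand) != y and np.linalg.norm(cand - x_orig) < np.linalg.norm(x_adv - x_orig):
        x_adv = cand

print(np.linalg.norm(x_adv - x_orig))  # amplitude shrinks toward the boundary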

¹ In the case of the untargeted attack, the classification output is kept *wrong*, i.e., the random walk can go through the areas of any label except the true one.

Here our question arises: is the Gaussian appropriate to drive the adversarial pattern toward minimal amplitude? It could be a reasonable choice if we only consider that the attacker minimizes the norm of the adversarial pattern. However, the classification output must also be kept constant through the whole random walk sequence. Provided that the decision boundary of the classifier has a complicated structure, reflecting the real-world data distribution, we expect that a more efficient random walk can exist. In this paper, we pursue this possibility and investigate how the statistics of the random variables affect the performance of attacking strategies. To this end, we generalize the boundary attack and propose the Lévy-Attack, where the random walk exploration is driven by symmetric α-stable random variables. We expect that the impulsive characteristic of the α-stable distribution induces sparsity in the random walk steps, which would drive adversarial patterns along the complicated decision boundary structure efficiently. Naturally, our expectation is reasonable only if the decision boundary has some structure aligned with the coordinate system defined in the data space, so that moving along canonical directions is more likely to keep the classification output than moving in isotropic directions. In our experiments on the MNIST and CIFAR10 datasets, the Lévy-Attack with a small characteristic exponent α shows significantly better performance than the original boundary attack with Gaussian random walk. This implies that our hypothesis on the decision boundary holds at least in those two popular image benchmark datasets. Our results also give an insight into the recent finding in the whitebox attacking scenario that the choice of the norm for measuring the amplitude of the adversarial patterns is essential.
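The impulsiveness argument can be illustrated by sampling. The generator below uses the Chambers–Mallows–Stuck transform, a standard textbook construction for symmetric α-stable variates (not code from this paper): with α = 1 the draws contain rare, very large spikes that a Gaussian (α = 2) sample lacks, which is exactly the sparsity-inducing behaviour the Lévy-Attack exploits.

```python
import numpy as np

def sas_sample(alpha, size, rng):
    """Symmetric alpha-stable variates via the Chambers-Mallows-Stuck method."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle
    w = rng.exponential(1.0, size)                 # unit exponential
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

rng = np.random.default_rng(0)
gauss = sas_sample(2.0, 10000, rng)   # alpha = 2: Gaussian
cauchy = sas_sample(1.0, 10000, rng)  # alpha = 1: Cauchy

# The Cauchy draw exhibits far heavier tails: a few huge impulsive steps.
print(np.abs(gauss).max(), np.abs(cauchy).max())
```

In a random walk, those few huge coordinates translate into steps that are large along a small number of axes and nearly zero elsewhere, i.e., sparse, axis-aligned moves.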

## II Proposed Method

In this section, we first introduce the α-stable distribution, and then propose the Lévy-Attack as a generalization of the boundary attack.

### II-A Symmetric α-stable Distribution

The symmetric α-stable distribution is a generalization of the Gaussian distribution which can model characteristics too impulsive for the Gaussian model. This family of distributions is most conveniently defined by its characteristic function [samorodnitsky94], due to the lack of an analytical expression for the probability density function. The characteristic function is given as

$$\varphi(t) = \exp\left( j \delta t - \sigma^{\alpha} |t|^{\alpha} \right),$$

where the characteristic exponent α ∈ (0, 2] controls the *impulsiveness* of the distribution: the smaller α is, the more impulsive the distribution is. The symmetric α-stable distribution reduces to the Gaussian distribution for α = 2, and to the Cauchy distribution for α = 1, respectively. δ is the location parameter, which corresponds to the mean in the Gaussian case, while σ is the scale parameter measuring the spread of the samples around the location, which corresponds to the variance in the Gaussian case. For more details on α-stable distributions, readers are referred to [samorodnitsky94].

**Algorithm 1: Lévy-Attack**

```
Input:  classifier f, original image x with label y,
        max. number of iterations T, termination threshold ε
Output: adversarial sample x̃
 1: repeat
 2:     x̃ ← u  for  u drawn uniformly over the input domain
 3: until f(x̃) ≠ y
 4: for t = 1 to T do
 5:     x̃′ ← x̃ + symmetric α-stable random walk step
 6:     if f(x̃′) ≠ y and ‖x̃′ − x‖ < ‖x̃ − x‖ then
 7:         x̃ ← x̃′
 8:     end if
 9:     if ‖x̃ − x‖ ≤ ε then
10:         break
11:     end if
12: end for
```
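Putting the pieces together, the following sketch implements the boundary-attack-style listing above on a toy problem. The circular decision boundary, the step scale, the termination threshold, and the acceptance rule (keep a step only if it stays adversarial and moves closer to the original sample) are illustrative assumptions, not the exact configuration used in the paper.

```python
import numpy as np

def sas_sample(alpha, size, rng):
    """Symmetric alpha-stable steps via the Chambers-Mallows-Stuck method."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

def levy_attack(f, x, y, alpha=1.0, scale=0.05, max_iter=2000, eps=0.55, seed=0):
    rng = np.random.default_rng(seed)
    # Initialization: draw uniform noise until it happens to be misclassified.
    x_adv = rng.uniform(-2.0, 2.0, size=x.shape)
    while f(x_adv) == y:
        x_adv = rng.uniform(-2.0, 2.0, size=x.shape)
    # Random walk driven by impulsive symmetric alpha-stable steps.
    for _ in range(max_iter):
        cand = x_adv + scale * sas_sample(alpha, x.shape, rng)
        if f(cand) != y and np.linalg.norm(cand - x) < np.linalg.norm(x_adv - x):
            x_adv = cand
        if np.linalg.norm(x_adv - x) <= eps:
            break
    return x_adv

# Toy blackbox classifier: label 1 outside the unit circle, 0 inside.
f = lambda x: int(np.linalg.norm(x) > 1.0)
x_orig = np.array([0.5, 0.0])
x_adv = levy_attack(f, x_orig, f(x_orig))
print(f(x_adv), np.linalg.norm(x_adv - x_orig))
```

Setting `alpha=2.0` recovers the Gaussian boundary attack, so the same routine can be used to compare the two walks.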

