Black-Box Decision based Adversarial Attack with Symmetric α-stable Distribution

04/11/2019 · Vignesh Srinivasan et al. · Berlin Institute of Technology (Technische Universität Berlin) · Fraunhofer · Consiglio Nazionale delle Ricerche

Developing techniques for adversarial attack and defense is an important research field for establishing reliable machine learning and its applications. Many existing methods employ Gaussian random variables for exploring the data space to find the most adversarial (for attacking) or least adversarial (for defense) point. However, the Gaussian distribution is not necessarily the optimal choice when the exploration is required to follow the complicated structure that most real-world data distributions exhibit. In this paper, we investigate how statistics of random variables affect such random walk exploration. Specifically, we generalize the Boundary Attack, a state-of-the-art black-box decision based attacking strategy, and propose the Lévy-Attack, where the random walk is driven by symmetric α-stable random variables. Our experiments on MNIST and CIFAR10 datasets show that the Lévy-Attack explores the image data space more efficiently, and significantly improves the performance. Our results also give insight into the recent finding in the whitebox attacking scenario that the choice of the norm for measuring the amplitude of adversarial patterns is essential.

I Introduction

The success of deep neural networks (DNNs) [Krizhevsky, lenet, googlenet, vgg, resnet] has led to them being used in many real-world applications. However, these models are also known to be susceptible to adversarial attacks, i.e., minimal patterns crafted by attackers who try to fool learning machines [Goodfellow, Papernotb, szegedy, Nguyena, Eykholt, athalye3d]. Such adversarial patterns barely affect human perception, yet they can manipulate learning machines, e.g., into giving wrong classification outputs. The complex interactions between a DNN's layers enable high accuracy in the controlled setting, but they also make the outputs unpredictable in untrained spots where training samples are sparse. If attackers can find such a spot close to a normal data sample, they can manipulate DNNs by adding a very small (ideally invisible in computer vision applications) perturbation to the original sample, leading to fatal errors; manipulating an autonomous driving system, for example, can cause serious accidents.

Two attacking scenarios are generally considered: whitebox and blackbox. The whitebox scenario assumes that the attacker has access to the complete target system, including the architecture and the weights of the DNN, as well as the defense strategy if the system is equipped with any. Typical whitebox attacks optimize the classification output with respect to the input by backpropagating through the defended classifier [Carlinib, chen2017ead, sharma2017ead, moosavi2016deepfool]. The blackbox scenario, on the other hand, assumes that the attacker has access only to the output. Under this scenario, the attacker has to rely on blackbox optimization, where the objective can be computed for arbitrary inputs, but the gradient information is not directly accessible. Although the whitebox attack is more powerful, it is much less likely that attackers can obtain full knowledge of the target system in reality. Accordingly, the blackbox scenario is considered the more realistic threat.

Existing blackbox attacks can be classified into two types: the transfer attack and the decision based attack. In the transfer attack, the attacker trains a student network which mimics the output of the target classifier. The trained student network is then used to obtain the gradient information for optimizing the adversarial input. In the decision based attack, the attacker simply performs random walk exploration. In the boundary attack [brendel2017decision], a state-of-the-art method in this category, the attacker first generates an initial adversarial sample from a given original sample by drawing a uniformly distributed random pattern multiple times until it happens to lead to misclassification. Initial patterns generated in this way typically have too large an amplitude to be hidden from human perception. The attacker therefore polishes the initial adversarial pattern by a Gaussian random walk in order to minimize its amplitude while keeping the classification output constant. (In the case of the untargeted attack, the classification output is merely kept wrong, i.e., the random walk can pass through the regions of any label except the true one.)

Here our question arises: is the Gaussian distribution appropriate for driving the adversarial pattern toward minimal amplitude? It could be a reasonable choice if we only considered that the attacker minimizes the norm of the adversarial pattern. However, it is also required to keep the classification output constant throughout the whole random walk sequence.
Provided that the decision boundary of the classifier has a complicated structure reflecting the real-world data distribution, we expect that a more efficient random walk can exist. In this paper, we pursue this possibility and investigate how the statistics of the random variables affect the performance of attacking strategies. To this end, we generalize the boundary attack and propose the Lévy-Attack, where the random walk exploration is driven by symmetric α-stable random variables. We expect that the impulsive characteristic of the α-stable distribution induces sparsity in the random walk steps, which would drive adversarial patterns along the complicated decision boundary structure efficiently. Naturally, our expectation is reasonable only if the decision boundary has some structure aligned with the coordinate system defined in the data space, so that moving along the canonical directions is more likely to preserve the classification output than moving in isotropic directions. In our experiments on the MNIST and CIFAR10 datasets, the Lévy-Attack with small values of α shows significantly better performance than the original boundary attack with Gaussian random walk. This implies that our hypothesis on the decision boundary holds at least in these two popular image benchmark datasets. Our results also give insight into the recent finding in the whitebox attacking scenario that the choice of the norm for measuring the amplitude of adversarial patterns is essential.

II Proposed Method

In this section, we first introduce the α-stable distribution, and propose the Lévy-Attack as a generalization of the boundary attack.

II-A Symmetric α-stable Distribution

The symmetric α-stable distribution is a generalization of the Gaussian distribution which can model characteristics too impulsive for the Gaussian model. This family of distributions is most conveniently defined by its characteristic function [samorodnitsky94] due to the lack of an analytical expression for the probability density function. The characteristic function is given as

\phi(\omega) = \exp\!\left( j\,\mu^\top \omega - \sigma^\alpha \sum_{k=1}^{d} |\omega_k|^\alpha \right),   (1)

where α, μ, and σ are parameters. We denote the d-dimensional symmetric α-stable distribution by S_α^d(μ, σ). α ∈ (0, 2] is the characteristic exponent expressing the degree of impulsiveness of the distribution: the smaller α is, the more impulsive the distribution is. The symmetric α-stable distribution reduces to the Gaussian distribution for α = 2, and to the Cauchy distribution for α = 1, respectively. μ is the location parameter, which corresponds to the mean in the Gaussian case, while σ is the scale parameter measuring the spread of the samples around the location, which corresponds to the variance in the Gaussian case. For more details on α-stable distributions, readers are referred to [samorodnitsky94].
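As a concrete illustration (not part of the original paper), such perturbations can be drawn with SciPy's levy_stable distribution, where β = 0 gives the symmetric case; the image shape and scale values below are arbitrary choices made only for this sketch.

```python
import numpy as np
from scipy.stats import levy_stable

def sample_sas_step(alpha, scale, shape):
    """Draw an i.i.d. symmetric alpha-stable (beta = 0) perturbation."""
    # alpha = 2 gives a Gaussian-like step and alpha = 1 a Cauchy step;
    # smaller alpha yields sparser, more impulsive steps in which a few
    # entries are very large while most stay close to zero.
    return levy_stable.rvs(alpha, beta=0.0, loc=0.0, scale=scale, size=shape)

gaussian_like = sample_sas_step(alpha=2.0, scale=0.1, shape=(28, 28))
impulsive = sample_sas_step(alpha=0.5, scale=0.1, shape=(28, 28))
print(np.abs(gaussian_like).max(), np.abs(impulsive).max())
```

Comparing the largest entries of the two draws already makes the impulsiveness visible: the α = 0.5 sample typically contains a handful of extremely large pixel steps, which is exactly the kind of sparse move the Lévy-Attack is designed to exploit.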
Algorithm 1: (Untargeted) Lévy-Attack
Input: classifier C, original image x with true label y, max. number of iterations T, termination threshold ε
Output: adversarial sample x̃
1:  repeat
2:      initialize each pixel of x̃_0 by drawing from the uniform distribution
3:  until C(x̃_0) ≠ y
4:  for t = 1 to T do
5:      propose a candidate x' by adding a symmetric α-stable perturbation to x̃_{t-1} and moving it slightly toward the original image x
6:      if C(x') ≠ y then
7:          x̃_t ← x'
8:      end if (otherwise keep x̃_t ← x̃_{t-1})
9:      if ‖x̃_t − x‖ ≤ ε then
10:         break
11:     end if
12: end for
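The following Python sketch shows one way the loop of Algorithm 1 could be implemented against a purely decision-based interface. It is a simplified rendering, not the authors' implementation: predict is a hypothetical black-box function that returns only the predicted label, the step scaling and the pull toward the original image are illustrative simplifications of the boundary attack's orthogonal and source steps, and all hyperparameter values are placeholders.

```python
import numpy as np
from scipy.stats import levy_stable

def levy_attack(predict, x, y, alpha=1.0, max_iter=5000,
                step_scale=0.01, source_step=0.01, eps=1e-2):
    """Untargeted decision-based random walk driven by symmetric
    alpha-stable steps; alpha = 2 yields a Gaussian-driven walk."""
    rng = np.random.default_rng(0)

    # Initialization: draw uniform random images until one is misclassified.
    x_adv = rng.uniform(0.0, 1.0, size=x.shape)
    while predict(x_adv) == y:
        x_adv = rng.uniform(0.0, 1.0, size=x.shape)

    for _ in range(max_iter):
        dist = np.linalg.norm(x_adv - x)

        # Propose a symmetric alpha-stable perturbation whose length is a
        # small fraction of the current distance to the original image.
        eta = levy_stable.rvs(alpha, beta=0.0, size=x.shape)
        eta *= step_scale * dist / max(np.linalg.norm(eta), 1e-12)

        # Pull the candidate slightly toward the original image so that the
        # adversarial pattern shrinks over the iterations.
        candidate = x_adv + eta
        candidate = candidate + source_step * (x - candidate)
        candidate = np.clip(candidate, 0.0, 1.0)

        # Accept the candidate only if the classifier output is still wrong.
        if predict(candidate) != y:
            x_adv = candidate

        # Terminate once the perturbation amplitude falls below the threshold.
        if np.linalg.norm(x_adv - x) <= eps:
            break

    return x_adv
```

With alpha=2.0 the proposal reduces to a Gaussian step and the walk behaves like the original boundary attack, whereas alpha ≤ 1 produces the sparse, impulsive proposals studied in this paper.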
