1 Introduction
It has been shown that machine learning models, especially deep neural networks, are vulnerable to small adversarial perturbations, i.e., a small carefully crafted perturbation added to the input may significantly change the prediction results
(Szegedy et al., 2014; Goodfellow et al., 2015; Biggio and Roli, 2018; Fawzi et al., 2018). Therefore, the problem of finding such perturbations, also known as adversarial attacks, has become an important way to evaluate model robustness: the more difficult it is to attack a given model, the more robust the model is.

Depending on the information an adversary can access, adversarial attacks can be classified into whitebox and blackbox settings. In the
whitebox setting, the target model is completely exposed to the attacker, and adversarial perturbations can be easily crafted by exploiting firstorder information, i.e., gradients with respect to the input (Carlini and Wagner, 2017; Madry et al., 2018). Despite its efficiency and effectiveness, the whitebox setting is an overly strong and pessimistic threat model, and whitebox attacks are usually not practical against realworld machine learning systems because the gradient information is not visible.

Instead, we focus on the problem of blackbox
attacks, where the model structure and parameters (weights) are not available to the attacker. The attacker can only gather necessary information by (iteratively) making input queries to the model and observing the corresponding outputs. The blackbox setting is a more realistic threat model; moreover, it is important in the sense that it can serve as a general way to evaluate the robustness of machine learning models beyond neural networks, even when the backpropagation algorithm is not applicable (e.g., evaluating the robustness of treebased models
(Chen et al., 2019a) and nearest neighbor models (Wang et al., 2019)).

Blackbox attacks have been extensively studied in the past few years. Depending on what kind of outputs the attacker can observe, blackbox attacks can be broadly grouped into two categories: softlabel attacks and hardlabel attacks. Softlabel attacks assume that the attacker has access to realvalued scores (logits or probabilities) for all labels, while hardlabel attacks assume that the attacker only has access to the final discrete decision (the predicted label).
However, blackbox attacks, especially hardlabel blackbox attacks, usually require a large number of queries to craft an adversarial perturbation. This queryinefficiency issue restricts the usage of blackbox attacks in real applications, since online machine learning systems often set a limit on the number of queries or charge according to it. More importantly, query inefficiency incurs a false sense of model robustness.
We suspect that this queryinefficiency issue is primarily owing to the high dimensionality of the input space, and therefore, to verify this conjecture, in this paper we seek to reduce the number of queries by reducing the dimensionality of the search space. To achieve this objective, we relax the conditions of the blackbox threat model by assuming that a small auxiliary unlabeled dataset is available to the attacker. The assumption is reasonable: before attacking an image classification model, the attacker only needs to “collect” some unlabeled images. Such images may even be available to the attacker without any explicit collection, because in a normal attack process the attacker is given the target images in advance, whose predicted labels are to be changed by crafting small perturbations.
With this small unlabeled dataset, we propose a technique named the spanning attack. Specifically, the unlabeled dataset spans a subspace of the input space, and we constrain the search for adversarial perturbations to this subspace so as to reduce the dimensionality of the search space. This method is motivated by the theoretical analysis that minimum adversarial perturbations for a variety of machine learning models prove to lie in the subspace spanned by the training data, so the auxiliary unlabeled dataset serves as a substitute for the original training data. We also show that this method is general enough to apply to a wide range of existing blackbox attack methods, including both softlabel attacks and hardlabel attacks. Our experiments verify that the spanning attack can significantly improve the query efficiency of blackbox attacks. Furthermore, we show that even a very small and biased unlabeled dataset, sampled from a distribution different from the training data, suffices to perform favorably in practice.
This paper makes the following contributions:

We present the random attack
framework, which captures most existing blackbox attacks in various settings, including both softlabel attacks and hardlabel attacks. It provides a novel and clear interpretation of the mechanism of blackbox attacks from the perspective of random vectors.

We propose a method that constrains the resulting adversarial perturbation of any random attack to a predefined subspace. This is a general method to reduce the dimensionality of the search space of blackbox attacks.

We present a preliminary theoretical analysis of the subspace in which the minimum adversarial perturbation must lie. Motivated by this analysis, we propose to reinforce blackbox attacks (random attacks) by constraining them to a subspace spanned by a small auxiliary unlabeled dataset. In our experiments across various blackbox attacks and target models, the reinforced attack typically requires substantially fewer queries while improving success rates at the same time.
The remainder of the paper is organized as follows. Section 2 discusses related work about blackbox attacks. Section 3 introduces the basic preliminaries and our motivation. Section 4 presents our framework for blackbox attacks and proposes our general method to improve query efficiency. Section 5 reports empirical evaluation results. Section 6 concludes this paper.
2 Related work
Softlabel blackbox attacks.
Chen et al. (2017) showed that softlabel blackbox attacks can be formulated as solving an optimization problem in the zeroth order scenario, where one can query the function itself but not its gradients. Since then many zeroth order optimization algorithms have been proposed for blackbox attacks, such as ZOAdam (Chen et al., 2017), NES (Ilyas et al., 2018), ZOSignSGD (Liu et al., 2019), AutoZOOM (Tu et al., 2019), and Banditattack (Ilyas et al., 2019).
Hardlabel blackbox attacks.
Hardlabel blackbox attacks are more challenging since it is nontrivial to define a smooth objective function based only on the hardlabel decisions. Brendel et al. (2018) proposed a method based on rejection sampling and random walks. Cheng et al. (2019a)
reformulated the attack as a realvalued optimization problem, where the objective function is evaluated via coarsegrained search followed by binary search.
Chen et al. (2019b) proposed an unbiased estimate of the gradient direction at the decision boundary, and an attack method with a convergence analysis.
Cheng et al. (2020) proposed a queryefficient sign estimator of the gradient.

Improving query efficiency of blackbox attacks.
Recently, the idea of relaxing the threat model to improve the query efficiency of blackbox attacks has attracted increasing attention. Some work builds on the idea of transferbased attacks (Papernot et al., 2016; Liu et al., 2017): adversarial examples of a surrogate model also tend to fool other models. Brunner et al. (2018) and Cheng et al. (2019b) both assumed that a surrogate model is available to the attacker, so the attacker can employ the gradients of the surrogate model as a prior for the true gradient of the target model. Another work (Yan et al., 2019) proposed a softlabel blackbox attack method which employs an auxiliary labeled dataset. Multiple reference models are trained with the labeled
datasets, and a subspace is spanned by perturbed gradients of these reference models. Then the true gradients of the target model are estimated in the subspace. The major difference from our work is that their auxiliary dataset has to be labeled, whereas ours is unlabeled. Moreover, their auxiliary dataset is much larger than ours owing to the need for training reference models. In the ImageNet case, we only need less than 1,000 unlabeled instances, while
Yan et al. (2019) require 75,000 labeled instances. Finally, our method is more general, and can be applied to both softlabel and hardlabel blackbox attacks.

3 Background and motivation
First, we introduce the notation for blackbox adversarial attacks. Let $\mathcal{X} \subseteq \mathbb{R}^d$ denote the input space, where $d$ is the dimension, and let $\mathcal{Y} = \{1, \dots, K\}$ denote the output space, where $K$ is the number of labels. The function $f: \mathcal{X} \to \mathcal{Y}$ is a classifier, which is the target model to attack, and makes decisions by
$$f(x) = \arg\max_{y \in \mathcal{Y}} g_y(x),$$
where $g: \mathcal{X} \to \mathbb{R}^K$ is the score function of the classifier, which outputs scores of all labels for the given instance $x$.
Untargeted adversarial attacks.
Given a radius $\epsilon > 0$ and a correctlyclassified labeled instance $(x, y)$ with $f(x) = y$, an untargeted attack aims to find an adversarial perturbation $\delta$ with norm $\|\delta\|_2 \le \epsilon$ such that the classifier predicts a different label for the perturbed instance $x + \delta$ than for the original instance $x$, i.e., $f(x + \delta) \ne y$.
Targeted adversarial attacks.
Given a radius $\epsilon > 0$, a correctlyclassified labeled instance $(x, y)$ with $f(x) = y$, and a target label $t \ne y$, a targeted attack aims to find an adversarial perturbation $\delta$ with norm $\|\delta\|_2 \le \epsilon$ such that the classifier predicts the label $t$ for the perturbed instance $x + \delta$, i.e., $f(x + \delta) = t$.
Our paper focuses on untargeted attacks, though it is easy to extend to targeted attacks. Besides, we focus on the $\ell_2$ norm (Euclidean norm), i.e., the magnitude of adversarial perturbations is measured by the $\ell_2$ norm; further research on general $\ell_p$ norms is left for future work.
Softlabel blackbox attacks.
The attacker has access to the score (logit or probability) output $g(x)$ for any input $x \in \mathcal{X}$. Therefore, any loss $\ell(g(x), y)$ defined on the pair of the score and the groundtruth label is also available to the attacker. We denote the loss function as $\ell$.

Hardlabel blackbox attacks.
The attacker only has access to the final decision (the predicted label) $f(x)$ for any input $x \in \mathcal{X}$. This setting is more challenging than softlabel attacks since the attacker has access to less information.

Softlabel attacks query $g$, while hardlabel attacks query $f$. The number of queries is the cost of blackbox attacks, and reducing the number of queries required is an important issue when deploying attack methods in real applications.
Criterion for blackbox attacks.
Given a budget $B$, a radius $\epsilon > 0$, and a correctlyclassified labeled instance $(x, y)$, an untargeted blackbox attack aims to find an adversarial perturbation $\delta$ with norm $\|\delta\|_2 \le \epsilon$ such that $f(x + \delta) \ne y$, only by means of querying $g$ up to $B$ times in the softlabel case, or $f$ in the hardlabel case. If the attack manages to find an adversarial perturbation within the query budget ($B$ queries), then we say the attack is successful; otherwise it fails. Therefore, we have two criteria for a blackbox attack: (i) whether it is successful and (ii) the number of queries it executes.
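The budget-limited criterion above can be made concrete with a small query-counting wrapper around the black-box oracle. This is an illustrative sketch (the names `QueryCounter` and `oracle` are our own, not from the paper):

```python
import numpy as np

class QueryCounter:
    """Wraps the black-box oracle (g or f) and enforces the budget B."""

    def __init__(self, oracle, budget):
        self.oracle = oracle
        self.budget = budget
        self.queries = 0

    def __call__(self, x):
        if self.queries >= self.budget:
            raise RuntimeError("query budget exhausted: attack failed")
        self.queries += 1
        return self.oracle(x)

# Toy hard-label oracle: predicts 1 iff the first coordinate is positive.
f = QueryCounter(lambda x: int(x[0] > 0), budget=3)
label = f(np.array([0.3, -1.0]))
```

An attack is then counted as successful only if it finds a valid perturbation before the wrapper raises.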
In practice, the input space is usually highdimensional: for instance, a typical input image for an ImageNet model has $224 \times 224 \times 3 = 150{,}528$ dimensions. We suspect that requiring such a large number of queries when searching for an adversarial perturbation in $\mathcal{X}$ is probably owing to the high dimensionality of $\mathcal{X}$. To verify this conjecture, a natural question for blackbox attacks is as below:
“Is it possible to reduce the number of queries
by reducing the dimensionality of the search space?”
In this paper, we provide a positive answer to this question by proposing a method that reinforces blackbox attacks with a small set of unlabeled data.
4 Proposed method
We first introduce the concept of the subspace attack, and then propose a method which utilizes an auxiliary unlabeled dataset to locate an appropriate subspace, namely the spanning attack.
4.1 Subspace attack
Definition 1 (subspace attack).
A subspace attack is an adversarial attack which returns adversarial perturbations in a predefined subspace $\mathcal{S} \subseteq \mathcal{X}$.
Intuitively, the predefined subspace $\mathcal{S}$ can be seen as a prior for adversarial perturbations. If the subspace is small enough while still capturing most small adversarial perturbations, then, due to the reduced dimensionality, it can significantly reduce the number of queries required for blackbox attacks.
We focus on one type of blackbox attack, the random attack, which captures a wide range of (nearly all) existing blackbox attacks, conveniently incorporates prior knowledge about the subspace, and is thus easy to transform into a subspace attack.
Definition 2 (random attack).
The resulting adversarial perturbation of a random attack is a linear combination of random vectors.
The following lemma highlights an intuition on how to transform a random attack into a subspace attack:
Lemma 1.
If all random vectors of a random attack are constrained to lie in a predefined subspace $\mathcal{S}$, then the random attack is a subspace attack with respect to $\mathcal{S}$.
The proof is straightforward: a linear combination of vectors in a subspace is also in the subspace.
In the literature, most existing blackbox attacks are random attacks; examples will be shown in Section 4.2 and Section 4.3.
Random vectors of random attacks are typically sampled from the Gaussian distribution: the entries of such Gaussian random vectors are independent and identically distributed random variables sampled from a standard Gaussian distribution. Let sample($d$) denote the sampling routine for a Gaussian random vector of dimension $d$; thus sample($d$) samples a Gaussian random vector in the original input space $\mathcal{X} \subseteq \mathbb{R}^d$. If we constrain the sampling routine to a subspace, then by Lemma 1 the resulting attack is a subspace attack. Algorithm 1 displays how to sample a Gaussian random vector in a subspace. The subspace is characterized by an orthonormal basis, and a scaling term guarantees that the returned random vector has the same expected length as the original random vector sampled by sample($d$).

Note that the returned random vector of Algorithm 1 is a linear combination of the orthonormal basis vectors. Therefore we have the following lemma:
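A minimal sketch of such a subspace sampling routine follows. We assume the scaling term is $\sqrt{d/k}$, since a vector built from $k$ i.i.d. standard normal coefficients has expected squared length $k$, while a standard Gaussian vector in $\mathbb{R}^d$ has expected squared length $d$; this reading of Algorithm 1 is our own:

```python
import numpy as np

def sample_subspace_gaussian(U):
    """Sample a Gaussian random vector constrained to span(u_1, ..., u_k).

    U: (d, k) matrix with orthonormal columns. The sqrt(d/k) rescaling
    (our assumption) matches the expected squared length, d, of a
    standard Gaussian vector in R^d.
    """
    d, k = U.shape
    c = np.random.randn(k)           # i.i.d. N(0, 1) coefficients
    return np.sqrt(d / k) * (U @ c)  # linear combination stays in span(U)

# Example: a 2-dimensional subspace of R^5, basis obtained via QR.
U, _ = np.linalg.qr(np.random.randn(5, 2))
v = sample_subspace_gaussian(U)
```

Projecting `v` onto the columns of `U` and back reproduces `v` exactly, confirming that the sample never leaves the subspace.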
Lemma 2.
The returned random vector of Algorithm 1 is constrained in the subspace $\mathrm{span}(u_1, u_2, \dots, u_k)$, where $\mathrm{span}(\cdot)$ returns the smallest space that contains all the input vectors $u_1, u_2, \dots, u_k$.
Therefore, applying Algorithm 1 to a random attack, we have a subspace attack as the following corollary implies.
Corollary 1.
Given a set of orthonormal vectors $u_1, u_2, \dots, u_k$, any random attack using Gaussian random vectors can be transformed into a subspace attack whose corresponding subspace is $\mathrm{span}(u_1, u_2, \dots, u_k)$, by replacing the sampling routine sample($d$) with Algorithm 1.
Corollary 1 introduces a particular method to transform a random attack (blackbox attack) into a subspace attack. It is noteworthy that the transformation is performed only by replacing sampling routines; it does not require explicitly projecting adversarial perturbations from the input space onto the subspace, and therefore interferes as little as possible with the original random attack.
4.2 Case study: softlabel blackbox attacks
We investigate a softlabel blackbox attack framework in which the attack combines a gradientbased optimization method with a backend zerothorder gradient estimation method to search for adversarial perturbations. This framework captures a wide range of softlabel blackbox methods (Ilyas et al., 2018; Liu et al., 2019; Ilyas et al., 2019; Tu et al., 2019; Cheng et al., 2019b). The framework is summarized in Algorithm 2.
In this framework, random vectors can be introduced when initializing the perturbation, which is either $0$ or a random vector, and when estimating gradients by the zerothorder method. A typical example of gradient estimation is the random gradientfree (RGF) method (Nesterov and Spokoiny, 2017), which returns the estimated gradient in the form
$$\hat{g} = \frac{1}{q} \sum_{i=1}^{q} \frac{\ell(x + \mu u_i) - \ell(x)}{\mu}\, u_i,$$
where we write $\ell(x)$ for $\ell(g(x), y)$ with slight abuse of notation, $\mu > 0$ is a small smoothing parameter, and the $u_i$'s are unit Gaussian random vectors (Gaussian random vectors of length 1). Therefore, $\hat{g}$ is a linear combination of random vectors.
Then the resulting adversarial perturbation is computed by a gradientbased optimization method such as projected gradient descent (Madry et al., 2018), all of which return linear combinations of the estimated gradients. It follows that these attacks are random attacks and can easily be transformed into subspace attacks via Algorithm 1.
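As a sketch, the RGF estimator with unit-length directions can be written as follows; the function and parameter names are our own, and the optional `sampler` hook shows how a subspace-constrained sampling routine slots in without any other change:

```python
import numpy as np

def rgf_gradient(loss, x, q=10, mu=1e-5, sampler=None):
    """RGF estimate: average of finite-difference terms along q unit
    Gaussian directions. A subspace-aware `sampler` (in the spirit of
    Algorithm 1) turns this into a subspace gradient estimate."""
    d = x.shape[0]
    if sampler is None:
        sampler = lambda: np.random.randn(d)
    base = loss(x)
    g = np.zeros(d)
    for _ in range(q):
        u = sampler()
        u = u / np.linalg.norm(u)                 # unit-length direction
        g += (loss(x + mu * u) - base) / mu * u   # finite difference
    return g / q

# Sanity check on a quadratic loss whose true gradient is 2x; with unit
# directions the estimate recovers the gradient direction (up to scale).
x = np.array([1.0, -2.0, 0.5])
g_hat = rgf_gradient(lambda z: float(z @ z), x, q=2000)
```

Note that with unit-length directions the estimate is a scaled version of the true gradient in expectation; for update rules that normalize the gradient, only the direction matters.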
4.3 Case study: hardlabel blackbox attacks
Hardlabel blackbox attacks can be separated into two categories: methods based on random walks (Brendel et al., 2018; Chen et al., 2019b) and methods based on direction estimation (Cheng et al., 2019a, 2020). In the first case, a random walk consists of a succession of random vectors, i.e., the perturbation is a sum of random vectors; in the second case, the gradient with respect to the direction towards the boundary is estimated by RGF or a variant based on the sign of the finite difference. As discussed above, these gradient estimation methods return linear combinations of random vectors. In both cases, the resulting adversarial perturbation is also a linear combination of random vectors, and consequently these attacks can likewise be transformed into subspace attacks.
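For the random-walk case, the fact that the perturbation is a sum of (accepted) random vectors can be sketched with a toy acceptance loop; this is in the spirit of boundary-style attacks but is not the full method, and all names are illustrative:

```python
import numpy as np

def random_walk_perturbation(is_adversarial, x, sampler, steps=100, eta=0.5):
    """Accumulate random steps that keep x + delta adversarial.

    Because delta is a sum of vectors drawn by `sampler`, constraining
    the sampler to a subspace constrains the whole walk (Lemma 1).
    """
    delta = np.zeros_like(x)
    for _ in range(steps):
        step = eta * sampler()
        if is_adversarial(x + delta + step):
            delta = delta + step    # accept only steps that stay adversarial
    return delta

# Toy setup: a subspace sampler over span(U) in R^4, with a trivial
# always-adversarial region just to exercise the subspace property.
U, _ = np.linalg.qr(np.random.randn(4, 2))
sampler = lambda: U @ np.random.randn(2)
delta = random_walk_perturbation(lambda z: True, np.zeros(4), sampler)
```

The final `delta` provably stays in `span(U)` regardless of which steps were accepted.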
4.4 Spanning attack
The subspace $\mathcal{S}$ is prior information for the subspace attack. For a subspace attack to perform well, it has to be easier to find an adversarial perturbation in the subspace than in the original input space $\mathcal{X}$. The crux of subspace attacks is thus how to locate an appropriate subspace $\mathcal{S}$. We propose to utilize an auxiliary unlabeled dataset to span the subspace, motivated by the following theoretical analysis of the minimum adversarial perturbation.
Minimum ($\ell_2$ norm) adversarial perturbation.
A minimum adversarial perturbation is the adversarial perturbation with the minimum norm. Formally, given a classifier $f$ and a labeled instance $(x, y)$, the minimum adversarial perturbation is defined as
$$\delta^* = \arg\min_{\delta} \|\delta\|_2 \quad \text{s.t.} \quad f(x + \delta) \ne y.$$
We first analyze the minimum adversarial perturbation of the nearest neighbor classifier. Given a labeled instance $(x, y)$ to attack, let $X_{+}$ denote the training instances whose labels are the same as $y$, $X_{-}$ the training instances whose labels are different from $y$, and $X$ the whole training data. Wang et al. (2019) derived a condition that the minimum adversarial perturbation of the nearest neighbor classifier has to satisfy:
Lemma 3 (Wang et al., 2019).
For any labeled instance $(x, y)$ to attack, there exist $x_i \in X_{+}$ and $x_j \in X_{-}$ such that the minimum adversarial perturbation of the nearest neighbor classifier is
$$\delta^* = \lambda (x_j - x_i) \quad \text{for some scalar } \lambda.$$
Then we have a straightforward corollary about the subspace in which the minimum adversarial perturbation lies.
Corollary 2.
Minimum adversarial perturbations of the nearest neighbor classifier are constrained in the space $\mathrm{span}(X)$, the span of the training data.
Proof.
By Lemma 3, the minimum adversarial perturbation is a linear combination of training instances, and hence lies in $\mathrm{span}(X)$. ∎
Similar results on the minimum adversarial perturbation also hold for support vector machine (SVM) classifiers (Cortes and Vapnik, 1995).

Lemma 4.
Minimum adversarial perturbations of SVM classifiers are of the form
$$\delta^* = \lambda \sum_i \alpha_i y_i x_i,$$
where $\lambda$ and the $\alpha_i$ are scalars.
Proof.
For simplicity, we only consider SVM for binary classification with labels $y_i \in \{-1, +1\}$; the argument extends easily via onevsone or onevsrest strategies. Let $w$ denote the optimal solution of SVM. Based on the primaldual relationship, we have
$$w = \sum_i \alpha_i y_i x_i,$$
where the $\alpha_i \ge 0$ are the dual variables and the $x_i$ are training instances. When predicting a perturbed example $x + \delta$, SVM calculates
$$w^\top (x + \delta) + b = w^\top x + b + \|w\|_2 \|\delta\|_2 \cos\theta,$$
where $\theta$ is the angle between $w$ and $\delta$. Therefore, the minimum adversarial perturbation that flips the sign of $w^\top x + b$ has to be in the direction of $w$ with $\cos\theta = 1$, or in the opposite direction of $w$ with $\cos\theta = -1$. ∎
Then we have the following corollary, which is a direct corollary of Lemma 4.
Corollary 3.
Minimum adversarial perturbations of SVM classifiers are constrained in the space $\mathrm{span}(X)$, the span of the training data.
It is inspiring that NN and SVM, though very different models, share the same property:
“Minimum adversarial perturbations prove to be
in the subspace spanned by the training data”.
On the contrary, computing minimum adversarial perturbations for neural networks and treebased ensemble models has been shown to be NPhard (Katz et al., 2017; Kantchelian et al., 2016), and it is an open problem under what conditions minimum adversarial perturbations for these models also lie in the space $\mathrm{span}(X)$. Nevertheless, Corollary 2 and Corollary 3 still motivate us to search for adversarial perturbations in the space $\mathrm{span}(X)$.
In practice, it is not realistic to assume the training data are available to attackers. As a relaxation, we assume that the attacker only has access to an auxiliary unlabeled dataset. With this assumption, subspace attackers search for adversarial perturbations in the span of the auxiliary unlabeled dataset; we call this the spanning attack, i.e., the subspace attack spanned by an auxiliary unlabeled dataset. For convenience, we simply term the auxiliary unlabeled dataset the subspace dataset. Moreover, since in the following only the subspace dataset is utilized for attacks, we use $D = \{x_1, x_2, \dots, x_n\}$ to denote the subspace dataset unambiguously.
By Corollary 1, given a subspace dataset $D$, the spanning attack requires a set of orthonormal vectors forming a basis for $\mathrm{span}(D)$ so as to transform a random attack into a subspace attack. This can be obtained by standard orthonormalization, e.g., the GramSchmidt process or the Householder transformation (Cheney and Kincaid, 2010). Therefore, the overall procedure of our spanning attack is as below:
Compute a basis of by orthonormalization;

Transform the random attack into a subspace attack by Algorithm 1;

Attack the target model with the resulting subspace attack.
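The first two steps can be sketched end to end; here we use a QR factorization as the orthonormalization routine (any Gram-Schmidt-style procedure works), with helper names of our own choosing:

```python
import numpy as np

def subspace_basis(D):
    """Step 1: orthonormal basis for span(D) via QR factorization.

    D: (n, d) matrix of n unlabeled instances. Returns a (d, k) matrix
    with orthonormal columns, where k is the rank of D.
    """
    Q, R = np.linalg.qr(D.T)              # columns of Q span the data
    keep = np.abs(np.diag(R)) > 1e-10     # drop linearly dependent directions
    return Q[:, keep]

def make_sampler(U):
    """Step 2: subspace-constrained Gaussian sampler replacing sample(d)."""
    d, k = U.shape
    return lambda: np.sqrt(d / k) * (U @ np.random.randn(k))

# Toy subspace dataset: 3 instances in R^6, one linearly dependent.
D = np.random.randn(2, 6)
D = np.vstack([D, D[0] + D[1]])
U = subspace_basis(D)
sampler = make_sampler(U)
v = sampler()
```

Step 3 then simply runs the chosen random attack with `sampler` plugged in as its sampling routine. The `sqrt(d/k)` rescaling is our reading of the expected-length condition in Algorithm 1.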
4.5 Selective spanning attack
The spanning attack searches for adversarial perturbations in the space $\mathrm{span}(D)$, which is a subspace of the input space $\mathcal{X}$. A natural question is whether it is possible to benefit more by explicitly selecting a subspace of $\mathrm{span}(D)$ instead of using $\mathrm{span}(D)$ directly. We term the method that searches for adversarial perturbations in a nontrivial subspace of $\mathrm{span}(D)$ the selective spanning attack, as it selects a subspace from $\mathrm{span}(D)$.
In the case of the selective spanning attack, the GramSchmidt process or the Householder transformation is not instructive enough to select a subspace of $\mathrm{span}(D)$, since there is no meaningful ordering among the derived orthonormal vectors.
Instead, we employ the singular value decomposition (SVD) to derive a set of orthonormal vectors forming a basis of $\mathrm{span}(D)$. In particular, assume the subspace dataset $D$ has $n$ different instances and $A \in \mathbb{R}^{n \times d}$ is the matrix of which the $i$th row is $x_i^\top$. By SVD, $A$ can be decomposed into the form
$$A = U \Sigma V^\top,$$
where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a (rectangular) diagonal matrix, of which the diagonal entries are the singular values.
It can be proved that the right singular vectors (columns of $V$) satisfy the following property:
Lemma 5.
Right singular vectors of which the corresponding singular values are larger than zero form an orthonormal basis for $\mathrm{span}(x_1, \dots, x_n) = \mathrm{span}(D)$.
Proof.
Let $r$ denote the number of nonzero singular values. Then we have the compact SVD
$$A = U_r \Sigma_r V_r^\top,$$
where $U_r$ and $V_r$ consist of the first $r$ columns of $U$ and $V$, and $\Sigma_r$ is the $r \times r$ diagonal matrix of nonzero singular values. Let $v_i$ denote the $i$th column of $V_r$. The objective is to prove
$$\mathrm{span}(v_1, \dots, v_r) = \mathrm{span}(x_1, \dots, x_n),$$
which is equivalent to

(i) $\mathrm{span}(v_1, \dots, v_r) \subseteq \mathrm{span}(x_1, \dots, x_n)$ and

(ii) $\mathrm{span}(x_1, \dots, x_n) \subseteq \mathrm{span}(v_1, \dots, v_r)$.

Since $V_r = A^\top U_r \Sigma_r^{-1}$, each $v_i$ is a linear combination of the columns of $A^\top$, i.e., of the instances $x_1, \dots, x_n$. Therefore, we have (i).

By the compact SVD, we also have $A = U_r \Sigma_r V_r^\top$, so each row of $A$, i.e., each $x_j^\top$, is a linear combination of the rows of $V_r^\top$, i.e., of $v_1, \dots, v_r$. Therefore, we have (ii).
∎
We denote these right singular vectors, whose corresponding singular values are larger than zero, as $v_1, v_2, \dots, v_r$, sorted in decreasing order of singular value: $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0$. Then we have roughly two options for the selective spanning attack: selecting the top singular vectors, or selecting the bottom singular vectors. We term these two options the top spanning attack and the bottom spanning attack, respectively.
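In code, the selection amounts to slicing the right singular vectors returned by an SVD; a sketch with numpy, where `m` (our parameter name) is the number of retained directions:

```python
import numpy as np

def select_singular_vectors(D, m, bottom=True):
    """Basis for the selective spanning attack.

    D: (n, d) subspace dataset matrix. Returns a (d, m) matrix with
    orthonormal columns: the bottom (or top) m right singular vectors
    among those with nonzero singular value.
    """
    _, S, Vt = np.linalg.svd(D, full_matrices=False)
    V = Vt[S > 1e-10].T              # right singular vectors, sigma > 0
    return V[:, -m:] if bottom else V[:, :m]   # S is sorted descending

# Toy subspace dataset: 10 instances in R^8.
D = np.random.randn(10, 8)
V_bot = select_singular_vectors(D, m=5, bottom=True)
```

The returned columns can be fed directly into the subspace sampling routine of Algorithm 1.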
Since the selective spanning attack does not require label information, the top spanning attack and the bottom spanning attack each have their own advantageous situations, depending on the labels of the underlying data distribution. We illustrate two toy cases in Figure 1. In the first case, the top spanning attack is favorable since adversarial perturbations can be found along the top singular direction, and the second case is the exact opposite. Roughly speaking, top singular vectors represent directions along the manifold of the dataset, and bottom singular vectors represent directions off the manifold. It is believed that for highdimensional datasets adversarial examples widely exist in directions off the manifold (Stutz et al., 2019). Therefore, the bottom spanning attack is expected to be the better choice in practice, which is validated in our experiments.
5 Experiments
In this section, we empirically validate the performance of the proposed spanning attack. Specifically, we select three representative blackbox (random) attacks as baselines and employ the spanning attack to reinforce them:

The RGF attack (Cheng et al., 2019b): a softlabel attack within the framework considered in the case study for softlabel attacks;

The boundary attack (Brendel et al., 2018): a pioneering widelyused hardlabel attack based on random walks;

The SignOPT attack (Cheng et al., 2020): a stateoftheart hardlabel attack based on direction estimation.
We perform untargeted blackbox attacks on the ImageNet dataset (Deng et al., 2009). Attacks are performed against the pretrained ResNet50 (He et al., 2016), VGG16 (Simonyan and Zisserman, 2015) and DenseNet121 (Huang et al., 2017)
from the PyTorch model zoo
(Steiner et al., 2019), since these architectures are diverse and representative. Correctlyclassified images are randomly sampled from the validation set as the evaluation dataset, of which every labeled image is an instance to attack. The size of the evaluation dataset for softlabel attacks is 1,000, and for hardlabel attacks it is 100, for computational efficiency. Another 1,000 unlabeled images are sampled from the validation set as the subspace dataset. The evaluation dataset and the subspace dataset do not overlap. We fix the perturbation radius $\epsilon$ and the query budget $B$. If an attack method finds an adversarial perturbation $\delta$ within $B$ queries such that $\|\delta\|_2 \le \epsilon$ and $f(x + \delta) \ne y$ hold, then the attack is successful; otherwise it fails. Therefore, we have two criteria for a blackbox attack: (i) whether it is successful and (ii) the number of queries it executes.

All hyperparameters of the spanning attacks are the same as those of the corresponding baselines; the only difference is the introduction of an appropriate subspace via our methods. We refer to Cheng et al. (2019b), Brendel et al. (2018) and Cheng et al. (2020) for details of the baseline blackbox methods.
5.1 Main results
Table 1: Results on ResNet50.

| Attack | Method | Success rate | Query mean | Query median |
|---|---|---|---|---|
| RGF (softlabel) | Baseline | 0.965 | 596.088 | 358.000 |
| RGF (softlabel) | Spanning attack | 0.988 | 315.879 | 205.000 |
| Boundary (hardlabel) | Baseline | 0.720 | 4133.903 | 3291.000 |
| Boundary (hardlabel) | Spanning attack | 0.880 | 3197.557 | 2569.500 |
| SignOPT (hardlabel) | Baseline | 0.970 | 2392.175 | 2143.000 |
| SignOPT (hardlabel) | Spanning attack | 1.000 | 1053.220 | 647.000 |
Table 2: Results on VGG16.

| Attack | Method | Success rate | Query mean | Query median |
|---|---|---|---|---|
| RGF (softlabel) | Baseline | 0.970 | 450.484 | 256.000 |
| RGF (softlabel) | Spanning attack | 0.984 | 259.006 | 154.000 |
| Boundary (hardlabel) | Baseline | 0.810 | 3467.086 | 2787.000 |
| Boundary (hardlabel) | Spanning attack | 0.940 | 2972.755 | 2263.000 |
| SignOPT (hardlabel) | Baseline | 1.000 | 1665.080 | 1450.000 |
| SignOPT (hardlabel) | Spanning attack | 1.000 | 840.900 | 572.500 |
Table 3: Results on DenseNet121.

| Attack | Method | Success rate | Query mean | Query median |
|---|---|---|---|---|
| RGF (softlabel) | Baseline | 0.977 | 515.855 | 358.000 |
| RGF (softlabel) | Spanning attack | 0.995 | 260.203 | 154.000 |
| Boundary (hardlabel) | Baseline | 0.670 | 3806.687 | 3389.000 |
| Boundary (hardlabel) | Spanning attack | 0.890 | 3063.449 | 2261.000 |
| SignOPT (hardlabel) | Baseline | 0.980 | 2407.398 | 1863.500 |
| SignOPT (hardlabel) | Spanning attack | 1.000 | 1014.280 | 688.500 |
Success rates, query means, and query medians on the evaluation dataset are reported in Table 1 for ResNet50, Table 2 for VGG16, and Table 3 for DenseNet121. By convention, only successful adversarial perturbations are counted for query means and query medians. On the one hand, this criterion favors a method with a lower success rate but fewer queries on its successful perturbations. On the other hand, if one method has both a higher success rate and a lower query number on successful perturbations than another, we can conclude with more confidence that the first method performs better.
Our results show that the spanning attack improves the baseline methods significantly in terms of both success rates and query numbers, consistently across all baseline methods and all pretrained target models.
In particular, in the case of the RGF attack and the SignOPT attack, the spanning attack needs only approximately half of the queries of the baseline for a successful attack, while increasing success rates at the same time. For example, in the SignOPT case for ResNet50, the spanning attack improves the success rate to 100%, and more crucially, it only requires 1053.2 queries in terms of the query mean and 647 queries in terms of the query median!
In the case of the boundary attack, while the success rate of the baseline boundary attack within the given budget is not satisfying, our spanning attack improves the success rates by a wide margin. For example, in the Boundary case for VGG16, the spanning attack improves the success rate from 81.0% to 94.0%.
Visualization of the subspace basis.
A sample of vectors of the resulting orthonormal basis is visualized in Figure 2. Note that these vectors reveal lowdimensional structures of the subspace rather than Gaussian noise.
Examples of adversarial images.
Several adversarial images, crafted by the baseline method and the spanning attack, are displayed in Figure 3. None of these adversarial images shows any significant difference from the original images, since they all satisfy the same smallnorm constraint.
Since the results are consistent across different baseline methods and target models, in the following we take the RGF attack and ResNet50 as illustration by default to avoid unnecessary repetition.
5.2 Investigation on the subspace
In this section, we study to what extent the subspace affects the performance of the spanning attack.
5.2.1 Size of the subspace dataset
We show attack performance with different sizes of the subspace dataset in Figure 4. In our experiments, the minimum size is 100 and the maximum size is 1,000. Within this range, the larger the subspace dataset, the better the performance of the spanning attack. The baseline method corresponds to the extreme case where the search space is the full input space. Therefore, it is expected that the performance of the spanning attack will peak and then decline as the subspace size keeps increasing. It is noteworthy that even a small subspace dataset, as shown in Figure 4, helps the spanning attack defeat the baseline methods.
5.2.2 Distribution of the subspace dataset
Table 4: Results of different subspace choices (RGF attack, ResNet50).

| Method | Success rate | Query mean | Query median |
|---|---|---|---|
| Baseline | 0.965 | 596.088 | 358.0 |
| Spanning attack | 0.988 | 315.879 | 205.0 |
| Spanning attack (Flickr8k) | 0.988 | 322.228 | 205.0 |
| Totally random subspace | 0.957 | 714.893 | 409.0 |
| Bottom spanning attack | 0.988 | 299.205 | 154.0 |
| Top spanning attack | 0.988 | 334.485 | 205.0 |
We investigate whether it is necessary to sample the subspace dataset from the same distribution as the training data. We sample 1,000 unlabeled images from the Flickr8k dataset (Hodosh et al., 2015), which is quite different from ImageNet, as the subspace dataset. For comparison, as an extreme case, we also show the results of the spanning attack with a totally random subspace, spanned by an unlabeled dataset in which every image is sampled from a uniform distribution; in other words, the subspace is chosen totally at random without any prior knowledge.
The results are displayed in the middle area of Table 4. On the one hand, the results of the Flickr8k spanning attack are still better than the baseline, and competitive with the spanning attack in Section 5.1, where the subspace dataset is sampled from the ImageNet validation set. This suggests that the conditions the subspace dataset has to satisfy are not too strict in practice: even a biased subspace dataset suffices to work, which extends the application range of the spanning attack. The subspace dataset does not necessarily have to be sampled from the same distribution as the training data.
On the other hand, the spanning attack with a totally random subspace performs even worse than the baseline, which confirms that the prior knowledge provided by the subspace dataset is necessary: an arbitrary low-dimensional subspace does not suffice.
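The contrast above can be sketched concretely. The spanning attack searches for perturbations inside the subspace spanned by the unlabeled images, whereas the "totally random" variant spans the subspace with noise images instead. A minimal NumPy sketch, assuming images are flattened into rows of a matrix (the function names `subspace_basis` and `sample_direction` are ours, not from the paper):

```python
import numpy as np

def subspace_basis(images, k):
    """Orthonormal basis for the subspace spanned by the unlabeled images.

    images: (n, d) array, one flattened image per row.
    Returns the first k right singular vectors as a (k, d) array.
    """
    # Rows of vt are orthonormal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(images, full_matrices=False)
    return vt[:k]

def sample_direction(basis, rng):
    """Draw a random unit search direction restricted to the subspace."""
    coeffs = rng.standard_normal(basis.shape[0])
    u = coeffs @ basis
    return u / np.linalg.norm(u)

rng = np.random.default_rng(0)
# Stand-in for 1,000 unlabeled images: real images for the spanning attack,
# uniform-noise images for the "totally random subspace" variant.
data = rng.standard_normal((1000, 3 * 32 * 32))
basis = subspace_basis(data, k=1000)
u = sample_direction(basis, rng)   # fed to the underlying black-box attack
```

The same code produces both variants; only the matrix `data` changes, which is why the quality of the subspace dataset, not the mechanism, explains the gap in Table 4.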
5.2.3 The bottom and top spanning attack
We investigate whether a selective spanning attack could further improve performance. In our experiments, the bottom 800 singular vectors (recall that the total number is 1,000) are used for the bottom spanning attack, and the top 800 singular vectors are used for the top spanning attack. The comparison among the original, bottom, and top spanning attacks is shown in the lower rows of Table 4. The results show that the bottom spanning attack further improves performance, whereas the top spanning attack has a negative impact. This empirically validates that adversarial perturbations are more likely to appear in directions off the data manifold than along it, as discussed in Section 4.5.
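The selection step above amounts to keeping different slices of the SVD basis. A short sketch under the same flattened-image assumption (the helper name `spanning_bases` is ours): since `numpy.linalg.svd` returns singular vectors in decreasing order of singular value, the "top" vectors align with the data manifold and the "bottom" vectors point away from it.

```python
import numpy as np

def spanning_bases(images, m):
    """Split the spanning basis into top-m and bottom-m singular directions.

    images: (n, d) matrix of flattened unlabeled images. n is the total
    number of singular vectors (1,000 in the paper's experiments, m = 800).
    """
    # Rows of vt are orthonormal, sorted by singular value in decreasing order.
    _, _, vt = np.linalg.svd(images, full_matrices=False)
    top = vt[:m]       # directions mostly along the data manifold
    bottom = vt[-m:]   # directions mostly off the data manifold
    return top, bottom
```

Feeding `bottom` to the attack gives the bottom spanning attack; feeding `top` gives the top spanning attack, which Table 4 shows to be harmful.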
5.3 Further discussion of related work
Although the work of Yan et al. (2019) considers a different setting from ours, as discussed in Section 2, for completeness we adapt their method for comparison. Their approach requires an auxiliary labeled dataset and considers only the soft-label black-box attack under a fixed perturbation norm. To enable a comparison, we label our subspace dataset of size 1,000. We observe that with such a small dataset in our setting, the method of Yan et al. (2019) does not perform well. For instance, when attacking ResNet50, it achieves a success rate of 58.7% with a query mean of 641.283. The results for attacking VGG16 and DenseNet121 are similar (VGG16: success rate 68.6%, query mean 558.044; DenseNet121: success rate 59.0%, query mean 623.603). This is primarily because their method trains substitute models with labeled data; when the dataset is too small, it is difficult to train reliable substitute models.
6 Conclusion
We propose a general technique, named the spanning attack, to improve the efficiency of black-box attacks. The spanning attack is motivated by our theoretical analysis showing that minimum adversarial perturbations of machine learning models tend to lie in the subspace spanned by the training data. In practice, the spanning attack only requires a small auxiliary unlabeled dataset, and is applicable to a wide range of black-box attacks, including both soft-label and hard-label black-box attacks. Our experiments show that the spanning attack significantly improves the query efficiency and success rates of black-box attacks.
References

Wild patterns: ten years after the rise of adversarial machine learning. Pattern Recognition 84, pp. 317–331.
Decision-based adversarial attacks: reliable attacks against black-box machine learning models. In International Conference on Learning Representations (ICLR).
Guessing smart: biased sampling for efficient black-box adversarial attacks. In IEEE International Conference on Computer Vision (ICCV), pp. 4958–4966.
Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy (SP), pp. 39–57.
Robust decision trees against adversarial examples. In International Conference on Machine Learning (ICML), pp. 1122–1131.
HopSkipJumpAttack: a query-efficient decision-based attack. CoRR abs/1904.02144.
ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In ACM Conference on Computer and Communications Security (CCS) Workshop on Artificial Intelligence and Security (AISec), pp. 15–26.
Linear algebra: theory and applications. The Saylor Foundation.
Query-efficient hard-label black-box attack: an optimization-based approach. In International Conference on Learning Representations (ICLR).
Sign-OPT: a query-efficient hard-label adversarial attack. In International Conference on Learning Representations (ICLR).
Improving black-box adversarial attacks with a transfer-based prior. In Advances in Neural Information Processing Systems (NeurIPS).
Support-vector networks. Machine Learning 20 (3), pp. 273–297.
ImageNet: a large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248–255.
Analysis of classifiers' robustness to adversarial perturbations. Machine Learning 107 (3), pp. 481–508.
Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR).
Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778.
Framing image description as a ranking task: data, models and evaluation metrics. In International Joint Conference on Artificial Intelligence (IJCAI), pp. 4188–4192.
Densely connected convolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2261–2269.
Black-box adversarial attacks with limited queries and information. In International Conference on Machine Learning (ICML), pp. 2142–2151.
Prior convictions: black-box adversarial attacks with bandits and priors. In International Conference on Learning Representations (ICLR).
Evasion and hardening of tree ensemble classifiers. In International Conference on Machine Learning (ICML), pp. 2387–2396.
Reluplex: an efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification (CAV), pp. 97–117.
signSGD via zeroth-order oracle. In International Conference on Learning Representations (ICLR).
Delving into transferable adversarial examples and black-box attacks. In International Conference on Learning Representations (ICLR).
Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations (ICLR).
Random gradient-free minimization of convex functions. Foundations of Computational Mathematics 17 (2), pp. 527–566.
Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. CoRR abs/1605.07277.
Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations (ICLR).
PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems (NeurIPS), pp. 8024–8035.
Disentangling adversarial robustness and generalization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6976–6987.
Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR).
AutoZOOM: autoencoder-based zeroth order optimization method for attacking black-box neural networks. In AAAI, Vol. 33, pp. 742–749.
Evaluating the robustness of nearest neighbor classifiers: a primal-dual perspective. CoRR abs/1906.03972.
Subspace attack: exploiting promising subspaces for query-efficient black-box attacks. In Advances in Neural Information Processing Systems (NeurIPS).