1 Introduction
Deep neural networks (DNNs) have been successfully applied in many mission-critical tasks, such as autonomous driving and face recognition. However, it has been shown that DNN models are vulnerable to adversarial examples (Szegedy et al., 2013; Goodfellow et al., 2014), which are indistinguishable from natural examples but make a model produce erroneous predictions. If the attacker can access the parameters of the attacked model, the attack is called a white-box attack. If the attacker can only obtain query feedback from the attacked model, without any information about the model parameters or architecture, it is called a black-box attack. In real scenarios, the DNN model behind a product, as well as its training dataset, is often hidden from users. Instead, only the output for each query (e.g., labels or scores) is accessible. In this case, the product provider faces severe threats from black-box attacks, which have progressed rapidly over roughly the last three years. The main challenge is that the provider has to return good feedback for normal queries, but does not know whether a query is normal or malicious. Moreover, the provider has no information about which black-box attack strategy the attacker adopts. Meanwhile, considerable efforts have been devoted to improving the adversarial robustness of DNNs (Madry et al., 2018; Papernot et al., 2016; Tramèr et al., 2017; Tramèr et al., 2020; Cohen et al., 2019). Among them, adversarial training (AT) is considered one of the most effective defense techniques (Athalye et al., 2018a). However, the improved robustness from AT is often accompanied by significant degradation of the normal accuracy. Besides, the training cost of AT is much higher than that of standard training, and AT models also often generalize poorly to new samples and new attack methods. Thus, we think that AT-based defense is not a very suitable choice for DNN models deployed in real scenarios.
In contrast, a good defense technique should satisfy the following requirements: no significant degradation of normal accuracy, lightweight, plug-and-play, and good generalization to diverse black-box attacks.
In real scenarios, the DNN model, as well as the training dataset, is often hidden from users. Instead, only the model feedback for each query (e.g., classification labels or scores) is accessible. In this case, the product provider mainly faces severe threats from black-box attacks, which have progressed rapidly over roughly the last three years. The main challenges of black-box defense are: 1) the defender should not significantly influence the model’s feedback to normal queries, but it is difficult to know whether a query is normal or malicious; 2) the defender has no information about which black-box attack strategy the attacker adopts. Recently, considerable efforts have been devoted to improving the adversarial robustness of DNNs (Madry et al., 2018; Papernot et al., 2016; Tramèr et al., 2017; Tramèr et al., 2020; Cohen et al., 2019). Among them, adversarial training (AT) is considered one of the most effective defense techniques (Athalye et al., 2018a). However, the improved robustness from AT is often accompanied by significant degradation of the normal accuracy. Besides, the training cost of AT is much higher than that of standard training, and AT models also often generalize poorly to new samples and new attack methods (Sokolic et al., 2017; Zhang et al., 2017; Geirhos et al., 2019). Thus, we think that AT-based defense is not a very suitable choice for black-box defense. In contrast, we expect that a good defense technique should satisfy the following requirements: no significant degradation of normal accuracy, lightweight, plug-and-play, and good generalization to diverse black-box attacks.
In this work, we study a lightweight defense strategy in which each query is perturbed by random noise, so that the returned feedback contains some randomness. It has been shown in (Dong et al., 2020) that random noise defense (RND) defends well against many black-box attack methods. This work provides the first theoretical analysis of the effectiveness of RND, and demonstrates that its effectiveness depends significantly on the magnitude ratio between the random noise added by the defender (i.e., RND) and the random noise added by the attacker for gradient estimation. Moreover, we also analyze the attack performance of the adaptive attack, which is considered an effective strategy to mitigate the defense effect of the randomness. Our theoretical result shows that the adaptive attack has limited ability to evade the RND defense, especially when the input dimension is high. These two analyses imply that inserting random noise of larger magnitude into each query yields better defense performance against both standard and adaptive black-box attacks. On the other hand, larger random noise leads to larger degradation of normal accuracy. To obtain a better tradeoff between the defense effect and the normal accuracy, we propose to train the model using Gaussian augmentation training (GT), which adds Gaussian noise to each training sample, to improve the model’s robustness to random noise in each individual query. Consequently, the combination of the GT model and the RND defense is a practical and effective strategy to defend models deployed in real scenarios.
In this work, we study a lightweight defense strategy against query-based black-box attacks, dubbed Random Noise Defense (RND). The key idea of RND is to introduce test-time randomness by adding small random noise to each query. The returned feedback then contains some randomness, which misleads the attack process of query-based attacks. More importantly, we provide a theoretical analysis of the effectiveness of RND, and demonstrate that its effectiveness depends significantly on the magnitude ratio between the random noise added by the defender (i.e., RND) and the random noise added by the attacker for gradient estimation. We also analyze the attack performance of the adaptive attack, which is considered an effective strategy to mitigate the defense effect of the randomness. Our theoretical analysis shows that the adaptive attack has limited ability to evade the RND defense. These two analyses imply that inserting random noise of larger magnitude into each query yields better defense performance against both standard and adaptive black-box attacks. On the other hand, larger random noise may cause unacceptable degradation of normal accuracy. To achieve a better tradeoff between the defense effect and the normal accuracy, we propose to train the model using Gaussian augmentation training (GT), to improve the model’s robustness to random noise in each query. Consequently, the combination of GT and RND is a practical and effective strategy to defend against black-box attacks in real scenarios.
The main contributions of this work are three-fold. 1) To the best of our knowledge, this work is the first to theoretically analyze the effect of random noise defense against both standard and adaptive query-based black-box attacks. 2) Inspired by the theoretical analysis, we propose a practical and effective defense strategy for real scenarios, by combining Gaussian augmentation training and random noise defense. 3) Extensive experiments further verify the presented theoretical analysis, and demonstrate the effectiveness of the proposed defense strategy against several state-of-the-art query-based black-box attacks.
Recent breakthroughs in deep neural networks (DNNs) have led to substantial success in a wide range of fields. However, DNN models are vulnerable to adversarial examples (Szegedy et al., 2013; Goodfellow et al., 2014), which are indistinguishable from natural examples but make a model produce erroneous predictions. Considerable efforts have been devoted to improving adversarial robustness against white-box attacks (Madry et al., 2018; Papernot et al., 2016; Athalye et al., 2018a; Tramèr et al., 2017; Tramèr et al., 2020; Cohen et al., 2019). Among them, adversarial training (AT) is one of the most effective techniques (Athalye et al., 2018a).
For black-box attacks, the adversaries have no knowledge of the model and can only obtain the predicted probabilities or the predicted class within a limited number of queries. Black-box attacks, especially query-based attacks, are therefore more practical and threatening in various security-sensitive applications, such as public APIs, finance, and autonomous driving. However, methods that target defending against black-box attacks are much scarcer (Bhambri et al., 2019). Also, existing defenses developed for white-box attacks may not be effective against query-based attacks (Dong et al., 2020). Besides, existing defense methods such as AT are difficult to scale to large models and datasets and also significantly decrease natural accuracy, so they are difficult to deploy in real scenarios. Therefore, it is essential to develop black-box defense strategies that satisfy practical demands: not sacrificing natural accuracy, plug-and-play, lightweight, and easy to scale to large-scale models and datasets. Random Noise Defense (RND) is a promising defense mechanism which adds random perturbations to each query to disturb the attack process. Proper randomness can effectively delay the attack process of query-based methods without affecting normal users. Moreover, it is a lightweight defense and can be directly combined with any off-the-shelf model and with other defense methods, so RND can be easily adopted in practical applications. However, we still lack theoretical guarantees for random noise defense, and its actual effectiveness is not yet fully understood. In this work, we analyze this defense mechanism against query-based attacks under the score-based setting, through theoretical analysis and detailed experiments, and answer the following three questions: 1) Is RND really effective against query-based attacks, and under what conditions is it valid? 2) Is it still effective against adaptive attacks? 3) Can we achieve a better tradeoff between natural accuracy and defense performance based on RND?
Our contributions are summarized as follows:

We give theoretical guarantees for the validity of the RND defense based on zero-order optimization. Detailed experiments on CIFAR-10 and ImageNet validate our theoretical analysis.

Assuming stronger attackers who are aware of the details of the defense mechanism, we give theoretical guarantees for the effectiveness of RND against the corresponding adaptive attack.

Based on the above analysis of RND, we also propose a stronger defense that combines RND with Gaussian augmentation training (RND-GT). Experimental results validate its better defense performance.
2 Related Work
Query-based methods
Here we mainly review the query-based black-box attack methods, which can be categorized into two classes: gradient estimation methods and search-based methods. Gradient estimation methods are based on zero-order (ZO) optimization algorithms. In search-based methods, the attacker uses a direct search strategy to find a search direction that decreases the function value, instead of explicitly estimating the gradient. Furthermore, we focus on score-based queries, where a continuous score (e.g., the posterior probability or the logit) is returned for each query, in contrast to decision-based queries, which return hard labels.
Specifically, (Ilyas et al., 2018a) proposed the first limited query-based attack method by utilizing Natural Evolutionary Strategies (NES) to estimate the gradient. (Liu et al., 2018a) proposed the ZO-signSGD algorithm, which is similar to NES, and gave a detailed convergence analysis of the attack algorithm. Based on bandit optimization, (Ilyas et al., 2018b) proposed to combine time- and data-dependent gradient priors with gradient estimation, which dramatically reduced the number of queries. For the norm constraint, SimBA (Guo et al., 2019a) randomly sampled a perturbation from an orthonormal basis and also utilized the discrete cosine transform (DCT) to reduce the dimension of the search space. SignHunter (Al-Dujaili and O’Reilly, 2020) focused on estimating the sign of the gradient and flipped the sign of the perturbation to improve query efficiency. Square attack (Andriushchenko et al., 2020) is the state-of-the-art query-based attack method, which selects localized square-shaped updates at random positions of images.
The query-based black-box attack methods can be divided into two classes: gradient estimation methods and search-based methods. The gradient estimation methods are based on zero-order (ZO) optimization algorithms. For search-based methods, the attackers use a direct search strategy to find a feasible point that decreases the function value, instead of explicitly estimating the gradient. The score-based setting is a stronger attack setting, which allows the adversaries to obtain the probability for each class, so we focus on score-based attacks. In score-based black-box attacks, the attackers can only access the score predicted by the classifier for each class for a given input.
Black-Box Defense
Compared with defenses against white-box attacks, defenses specially designed for black-box attacks have not been well studied. Two recent works (Chen et al., 2020; Li et al., 2020) proposed to detect malicious queries by comparing the current query with the history of queries, utilizing the fact that a malicious query is similar to the previous malicious queries used to attack the same benign example, while the similarity between different benign examples is much smaller. AdvMind (Pang et al., 2020) proposed to infer the intent of the adversary, and it also needs to store the query history. However, if the attacker adopts a strategy of long-interval malicious queries, the defender has to store a long history, with a very high cost of storage and comparison. (Salman et al., 2020) proposed the first black-box certified defense method, dubbed denoised smoothing. (Byun et al., 2021) also showed through experimental evaluations that adding random noise can defend against query-based attacks, without a theoretical analysis of the defense effect. There are also a few randomization-based defenses. RP (Xie et al., 2018) proposed a random input-transform-based method. RSE (Liu et al., 2018b) added large Gaussian noise to both the input and the activation of each layer. PNI (He et al., 2019) combined adversarial training with adding Gaussian noise to the input or weights of each layer, which achieved better defense performance. However, these methods significantly sacrificed the accuracy on benign examples. PixelDP (Lecuyer et al., 2019) and randomized smoothing (Cohen et al., 2019; Salman et al., 2019) proposed to turn any classifier that classifies well under Gaussian noise into a new classifier that is certifiably robust to adversarial perturbations under the norm. However, they need to corrupt each query multiple times with Gaussian noise to obtain a majority prediction for this query.
Obviously, such a defense places a huge burden on using the model in practice. In contrast, the defense mechanism of RND that is mainly studied in this work perturbs each query only once, without any extra burden. (Rusak et al., 2020) showed that a model trained with Gaussian augmentation achieves state-of-the-art defense against common corruptions and good defense against white-box attacks, while its defense against black-box attacks was not evaluated. Apart from the aforementioned defenses against query-based black-box attacks, ensemble adversarial training (EAT) (Tramèr et al., 2017) is considered a good defense against transfer-based black-box attacks. It trains the model with multi-step attacks generated on an ensemble of several surrogate models.
Black-Box Defense
The defense against black-box attacks has not been well studied. For defense against transfer-based attacks, Ensemble Adversarial Training (EAT) (Tramèr et al., 2017) is the state-of-the-art defense, which trains the model with multi-step attacks generated on an ensemble of several surrogate models. Studies that mainly target defending against query-based black-box attacks are much scarcer. However, history-based detection techniques for query-based attacks have been proposed recently (Chen et al., 2020; Li et al., 2020). They rely on the observation that, compared with queries from normal users, queries from adversaries are much more similar to each other, and they store the past query images and compare them to detect this unusual property. AdvMind (Pang et al., 2020) proposes to infer the intent of the adversary, and it also needs to store the past query images. (Salman et al., 2020) proposed the first black-box certified defense method, Denoised Smoothing. (Byun et al., 2021) also show that adding random noise can defend against query-based attacks, but they do not give a theoretical analysis of the defense mechanism. There are also several randomization-based defenses. RP (Xie et al., 2018) proposed a random input-transform-based method. RSE (Liu et al., 2018b) adds large Gaussian noise to the input and the activation of each layer. PNI (He et al., 2019) combines adversarial training with adding Gaussian noise to the input or weights of each layer, which achieved better defense performance. However, they all significantly degrade the accuracy on clean images and are very difficult to scale to large-scale datasets. PixelDP (Lecuyer et al., 2019) and Randomized Smoothing (Cohen et al., 2019; Salman et al., 2019) turn any classifier that classifies well under Gaussian noise into a new classifier that is certifiably robust to adversarial perturbations under the norm.
Unlike our method, to guarantee robustness they need a majority vote for the correct class over multiple Gaussian-noise-corrupted copies of each sample, which places a huge burden on using the model in practice. In contrast, we only need to add noise to each query once. (Rusak et al., 2020) show that a model trained with Gaussian augmentation achieves state-of-the-art defense against common corruptions and good defense against white-box attacks.
We stress that RND cannot be compared with RS directly, because the working principles and purposes of the two methods are totally different.
3 Preliminaries
3.1 Score-based Black-Box Attack
We denote the attacked model as with being the input space and being the output space. Given a benign example , the goal of an adversarial attack is to generate an adversarial example that is similar to , but forces the prediction of to differ from the ground-truth label (i.e., untargeted attack) or to equal a target label (i.e., targeted attack). It can be generally formulated as follows:
(1)  
(2) 
where for untargeted attack, while for targeted attack. denotes the logit or posterior probability w.r.t. class . indicates a ball around ( is often specified as or ), with . Consequently, the objective function is non-differentiable, and it cannot be optimized by gradient-based algorithms.
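As an illustration of the kind of score-based objective described above, the following is a minimal sketch of a margin-style loss computed from the returned scores. The exact form of the objective in Eqs. (1)-(2) is not reproduced here; the function name `margin_objective` and the hinge at zero are assumptions made for this example only.

```python
import numpy as np

def margin_objective(scores, label, targeted=False):
    """Margin-style attack objective on a score vector (logits or probabilities).

    Assumed form (not taken from the source): the value reaches 0 exactly
    when the attack has succeeded, so the attacker minimizes it.
    """
    scores = np.asarray(scores, dtype=float)
    others = np.delete(scores, label)  # scores of all classes except `label`
    if targeted:
        # `label` is the target class: push its score above every other class.
        return max(0.0, others.max() - scores[label])
    # `label` is the ground-truth class: push some other class above it.
    return max(0.0, scores[label] - others.max())
```

The attacker only needs the score vector returned by a query to evaluate this quantity, which is why it fits the score-based black-box setting.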
3.2 Zero-Order Optimization for Black-Box Attack
Since the derivative of cannot be obtained directly, we have to resort to derivative-free, also called zero-order (ZO), optimization algorithms. Some black-box attack methods have been developed based on ZO optimization, dubbed ZO attacks, and show very promising attack performance. The general idea of ZO attack methods is to estimate the gradient direction according to the objective values returned by queries. A widely used gradient estimator (Nesterov and Spokoiny, 2017; Duchi et al., 2015) is
(3) 
where , and . Based on this estimated gradient, an attack procedure similar to a white-box attack can be conducted (e.g., projected gradient descent), which is briefly summarized in Algorithm 2. Note that here we only focus on gradient estimators based on queries to the attacked (target) model; estimators that exploit transferability from surrogate models to the target model are not covered and will be further discussed in Section 7.8.
(4) 
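The query-based estimator referred to in Eq. (3) can be sketched as follows. This is a minimal illustration of the standard one-sided Gaussian-smoothing estimator in the style of Nesterov and Spokoiny, averaged over a batch of random directions; the names `zo_gradient`, `mu` (smoothing parameter) and `q` (number of directions) are assumptions for this example and may differ from the notation of the original equation.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, q=20, rng=None):
    """One-sided zero-order gradient estimate of f at x.

    Averages (f(x + mu*u) - f(x)) / mu * u over q Gaussian directions u.
    Only function values (query feedback) are used, never true gradients.
    """
    rng = np.random.default_rng() if rng is None else rng
    fx = f(x)                                # one query at the current point
    g = np.zeros_like(x, dtype=float)
    for _ in range(q):
        u = rng.standard_normal(x.shape)     # random direction u ~ N(0, I)
        g += (f(x + mu * u) - fx) / mu * u   # finite difference along u
    return g / q
```

Each call costs q + 1 queries, so the attacker trades query budget against the variance of the estimate.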
4 Query-based Black-Box Attack Methods
4.1 Gradient Estimation Attack Methods
We mainly analyze the gradient estimation attack methods because they are much easier to implement in practice and also have good theoretical convergence guarantees. We first give the notation that we will use. denotes the data distribution. denotes a data sample, is the corresponding label, and is the output label space. We denote the attacked black-box model as , and is the logit output of the model. is the attack algorithm and is the perturbed image. represents the loss function. Besides, we always have the perturbation constraint . represents the norm, and we usually use the and norms. We denote this norm ball as , which is also a convex set.
Similar to the white-box attack, the black-box attacker’s goal is to maximize for untargeted attack and maximize the for targeted attack under the perturbation constraint, where is the targeted label. We define to equal or , so the attacker’s goal is to find to minimize the objective function under the perturbation constraint. The black-box attack problem is therefore formulated as
(5)  
where for untargeted attack, , for targeted attack, , or .
Under the above setting, the design of the attack algorithm becomes a ZO optimization problem, as in mainstream black-box attack methods (Ilyas et al., 2018a; Cheng et al., 2019; Liu et al., 2018a; Tashiro et al., 2020; Guo et al., 2019b; Huang and Zhang, 2019; Zhang et al., 2020; Ilyas et al., 2018b; Li et al., 2019). The random gradient-free method (Nesterov and Spokoiny, 2017; Ghadimi and Lan, 2013; Duchi et al., 2015; Ghadimi et al., 2016) is the one most commonly adopted in attack methods. To estimate the gradient, the attackers use Gaussian smoothing to obtain the gradient estimator
(6) 
where , smoothing parameter . Eq. (6) is the one-sided estimator; a two-sided estimator can be defined analogously. After obtaining the estimated gradient, the attackers perform projected gradient descent as in white-box attacks (Goodfellow et al., 2014; Madry et al., 2018). The basic attack algorithm is detailed in Algorithm 2. The black-box attacker’s goal is thus to generate as quickly as possible, using as few queries as possible, which means designing attack algorithms with faster convergence.
the random vector sampled from , the batch size of , smoothing parameter, the constraint norm ball.
(7)
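The attack procedure described above (estimate the gradient from queries, take a descent step, project back onto the norm ball) can be sketched as follows. This is a hedged illustration rather than a reproduction of Algorithm 2: the function names, the fixed step size `step`, and the choice of an Euclidean ball for the projection are assumptions for this example.

```python
import numpy as np

def project_l2(x, center, radius):
    """Project x onto the Euclidean ball of the given radius around center."""
    delta = x - center
    norm = np.linalg.norm(delta)
    if norm > radius:
        delta *= radius / norm
    return center + delta

def zo_attack(f, x0, radius, mu=1e-3, q=10, step=0.1, iters=100, seed=0):
    """Zero-order projected gradient descent on objective f, starting at x0.

    f is evaluated only through queries; q random directions per iteration.
    """
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    for _ in range(iters):
        fx = f(x)
        g = np.zeros_like(x)
        for _ in range(q):
            u = rng.standard_normal(x.shape)
            g += (f(x + mu * u) - fx) / mu * u   # one-sided estimate along u
        # Descent step followed by projection onto the perturbation ball.
        x = project_l2(x - step * g / q, x0, radius)
    return x
```

The total query cost is iters * (q + 1), which is the quantity a defense aims to inflate.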
5 RND: Random Noise Defense for Zero-Order Attack Methods
5.1 Random Noise Defense
In the query-based gradient estimator (e.g., Eq. 6), the attacker adds a small random perturbation to obtain the objective difference between two queries. This random perturbation plays the key role in obtaining a good estimate of the gradient direction. Thus, if the defender can further disturb this random perturbation, the gradient estimation is expected to be misled, so that the attack efficiency decreases. However, the defender cannot identify whether a query is normal or malicious. Thus, random noise defense (RND) adds random noise to every query. Consequently, the feedback for one query is , where is random noise generated by the defender and the factor controls the magnitude of the random noise. For the task of defending against query-based black-box attacks, we have two requirements for RND:

The output of each individual query will not be changed significantly, i.e., ;

The estimated gradient will be perturbed, such that the iterative attack procedure will be misled to achieve lower attack efficiency. Specifically, the gradient estimator under RND becomes
(8) where are both random noises generated by the defender.
To satisfy the first requirement, the magnitude factor should not be too large. However, to satisfy the second requirement, should be large enough to significantly change the gradient estimation. We need to choose a proper to achieve a good tradeoff between these two requirements.
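The defense described in this subsection amounts to a thin wrapper around the deployed model. The sketch below is a minimal illustration, assuming Gaussian noise and a score function that accepts a NumPy array; the class name `RNDModel`, the parameter name `nu` for the magnitude factor, and its default value are hypothetical and not taken from the source.

```python
import numpy as np

class RNDModel:
    """Wraps a score function so every query is answered on a noisy input.

    score_fn: callable mapping an input array to a score (or score vector).
    nu: magnitude factor of the defender's noise (assumed name).
    """

    def __init__(self, score_fn, nu=0.02, seed=None):
        self.score_fn = score_fn
        self.nu = nu
        self.rng = np.random.default_rng(seed)

    def __call__(self, x):
        # Fresh, independent noise is drawn for each individual query.
        v = self.rng.standard_normal(np.shape(x))
        return self.score_fn(np.asarray(x, dtype=float) + self.nu * v)
```

Because the noise is redrawn on every call, the two queries inside an attacker's finite difference receive different defender noise, which is the situation Eq. (8) describes. No retraining or query history is required, which is what makes the mechanism plug-and-play.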
6 RND: Random Noise Defense for Query-based Black-Box Attack Methods
6.1 Random Noise Defense Mechanism
In this section, we first describe the defense mechanism of RND. From the previous introduction of query-based methods, we can see that all query-based attacks rely on a common element: they add a small random perturbation to each query to estimate gradients (gradient estimation methods) or to find feasible attack directions (search-based methods). So, we can introduce additional randomness to disturb the attack process and slow down the convergence of attack algorithms.
RND disturbs the gradient estimation and the search for attack directions by adding extra random noise to each query sent to the attacked model, as in Eq. (9). Since RND does not detect whether a query is malicious or not, the added noise cannot be very large, so that it does not affect the model accuracy for normal queries. The RND mechanism should therefore satisfy the following two conditions:

Disturb the attacker’s gradient estimation and search for a descent direction.

Do not affect the model accuracy for normal queries.
So the gradient estimator in Algorithm 2 becomes
(9) 
where and could also be sampled from the same standard normal distribution and is the defense parameter. From Eq. (9), we can see that the random noise could interfere with the attacker’s search for a descent direction and lead to a wrong gradient estimate.
6.2 Theoretical Analysis of Random Noise Defense against ZO Attacks
In this section, we present the theoretical analysis of the effect of RND on query-based ZO attacks, i.e., the convergence of Algorithm 2 with (i.e., Eq. (9)) being the gradient estimator. Throughout our analysis, the adversarial perturbation is measured by the norm, corresponding to . To facilitate subsequent analyses, we first introduce some assumptions and definitions.
Assumption 1.
is a convex function w.r.t. .
Assumption 2.
is Lipschitz-continuous, i.e., .
Assumption 3.
is continuous and differentiable, and is Lipschitz-continuous, i.e., .
Notations. We denote the sequence of random noises added by the attacker as , with being the iteration index in Algorithm 2. The sequence of random noises added by the defender is denoted as . The sequential solutions generated by Algorithm 2 are denoted as , and the benign example is used as the initial solution, i.e., . denotes the input dimension.
Definition 1.
The Gaussian-smoothing function corresponding to with is
(10) 
Due to the noise inserted by the defender, becomes the objective function for the attacker. We also define the minimum of , .
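For reference, the standard form of a Gaussian-smoothing function, and the reason it stays close to the original objective when the noise magnitude is small, can be written as follows. This is a sketch in assumed notation ($f$ for the objective, $\nu$ for the defender's noise magnitude, $v$ for the defender's noise, $L_0$ for the Lipschitz constant of $f$, $d$ for the input dimension); the symbols of Eq. (10) may differ.

```latex
% Gaussian smoothing of f with magnitude \nu (assumed notation)
f_{\nu}(x) \;=\; \mathbb{E}_{v \sim \mathcal{N}(0, I_d)}\big[\, f(x + \nu v) \,\big]

% If f is L_0-Lipschitz, the smoothed and original objectives differ by at most
\big| f_{\nu}(x) - f(x) \big|
  \;\le\; \mathbb{E}_{v}\big| f(x + \nu v) - f(x) \big|
  \;\le\; \nu L_0 \, \mathbb{E}_{v}\|v\|_2
  \;\le\; \nu L_0 \sqrt{d}
```

The last step uses Jensen's inequality, $\mathbb{E}\|v\|_2 \le \sqrt{\mathbb{E}\|v\|_2^2} = \sqrt{d}$. The bound grows linearly in $\nu$, which is one way to see why a larger defender noise perturbs individual query outputs more.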
Definition 2.
Definition 3.
We denote the set of random noises added by the attacker as , with being the iteration index in Algorithm 2. The set of random noises added by the defender is denoted as . Then, we define the expectation of the perturbed objective function with RND at the iteration as follows
(12) 
where , and is generated via Eq. (7), and it depends on and , due to the gradient estimator (9). For clarity, hereafter we use to represent .
6.2.1 Analysis under General Convex Case
Here we study the case in which the objective function satisfies Assumptions 4 and 5. The corresponding convergence of Algorithm 2 with is presented in Theorem 1.
Theorem 1.
Given Assumptions 4 and 5, for any in Algorithm 2 with the gradient estimator (i.e., Eq. (9)), we have
(13) 
where , , . To minimize the upper bound of convergence error , stepsize can be chosen as constant stepsize . Then, in order to guarantee , the minimum of upper bound is set as . So, the query complexity for attackers is .
Theorem 1 states that the defense effect of RND is closely related to the ratio . A larger leads to a higher upper bound on the convergence error and a slower convergence rate, corresponding to better defense performance of RND against query-based attacks in practice. Specifically, if , then the query complexity is equivalent to the constant . Only when , the query complexity is really improved over . This agrees with the intuition that the defender should insert larger random noise (i.e., ) than that added by the attacker (i.e., ) to achieve a satisfactory defense effect. On the other hand, the gap between and the true objective increases with , leading to a larger influence on each individual query. Thus, RND should find a good balance between not harming the accuracy of individual normal queries and achieving good defense performance against iterative query-based attacks.
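The dependence on the ratio between the defender's and the attacker's noise magnitudes can be illustrated numerically. The sketch below is a toy experiment, not a verification of the convergence bound discussed above: it measures how well a query-based gradient estimate aligns with the true gradient of a linear (hence convex and Lipschitz) objective as the ratio grows. The estimator form with two independent defender noises, and the names `mu`, `nu`, `q`, are assumptions for this example.

```python
import numpy as np

def rnd_zo_gradient(f, x, mu, nu, q, rng):
    """Attacker's gradient estimate when every query gets defender noise."""
    g = np.zeros_like(x, dtype=float)
    for _ in range(q):
        u = rng.standard_normal(x.shape)    # attacker's direction
        v1 = rng.standard_normal(x.shape)   # defender noise on first query
        v2 = rng.standard_normal(x.shape)   # defender noise on second query
        g += (f(x + mu * u + nu * v1) - f(x + nu * v2)) / mu * u
    return g / q

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def alignment_vs_ratio(ratios, d=20, mu=0.01, q=500, seed=0):
    """Cosine similarity between estimated and true gradient per nu/mu ratio."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(d)
    f = lambda x: float(a @ x)   # linear objective whose true gradient is a
    x = np.zeros(d)
    return {r: cosine(rnd_zo_gradient(f, x, mu, r * mu, q, rng), a)
            for r in ratios}
```

Calling `alignment_vs_ratio([0.0, 1.0, 10.0])` returns one cosine similarity per ratio. In this toy setting the estimate remains unbiased, but its variance grows with the squared ratio, so the attacker needs more queries to recover a useful direction.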
Remark 1.
From Theorem 2, the upper bound of the convergence error (i.e., the right-hand side of Eq. (15)) depends on the ratio . A larger ratio contributes to a higher upper bound on the convergence error and a slower convergence rate, corresponding to better defense performance against query-based attacks in practice.
However, RND cannot always guarantee an effective defense against query-based attacks, especially when the size of the noise from the defender is smaller than, or almost the same as, the size of the random vectors sampled by the attackers. So, the defender needs to adopt a larger noise size than to each query, while still ensuring that normal predictions are not affected by this extra randomness. If the random noise added by the defender is very large, the gap between and the true model prediction increases. RND therefore creates a tradeoff between the natural accuracy on normal queries and effective defense against malicious ones.
6.3 Theoretical Analysis of Random Noise Defense against ZO Attacks
We first state the assumptions used in the following sections.
Assumption 4.
is a convex function w.r.t. .
Assumption 5.
is Lipschitz-continuous, i.e., .
Assumption 6.
is continuous and differentiable, and is Lipschitz-continuous, i.e., .
Then, we give some important definitions:
(14) 
For the attacker, since the defender adds noise to each query, the objective function becomes .
The attacker’s goal is to generate as quickly as possible, i.e., to make the attack algorithm converge faster. In the following sections, we show that the randomness introduced by RND leads to slower convergence, corresponding to better defense performance.
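As a rough illustration of this setup, the sketch below wraps a toy objective in an RND-style noisy query and measures how far a zeroth-order gradient estimate lands from the true gradient. It is a minimal sketch under stated assumptions, not the paper's implementation: the two-point finite-difference form is assumed as a stand-in for the estimator in Eq. (9), and `rnd_query`, `zo_gradient`, `mean_squared_error`, and `toy_objective` are hypothetical names introduced only for this example.

```python
import random


def rnd_query(f, x, nu, rng):
    """Defender side: evaluate f on the query plus fresh Gaussian noise of scale nu."""
    return f([xi + nu * rng.gauss(0.0, 1.0) for xi in x])


def zo_gradient(query, x, mu, rng):
    """Attacker side: two-point finite-difference estimate along one Gaussian direction.

    This form is an assumption standing in for the estimator of Eq. (9).
    """
    u = [rng.gauss(0.0, 1.0) for _ in x]
    x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
    slope = (query(x_plus) - query(x)) / mu
    return [slope * ui for ui in u]


def mean_squared_error(f, true_grad, x, mu, nu, trials=2000, seed=0):
    """Average squared distance between the estimate and the true gradient."""
    rng = random.Random(seed)
    query = lambda z: rnd_query(f, z, nu, rng)
    total = 0.0
    for _ in range(trials):
        g = zo_gradient(query, x, mu, rng)
        total += sum((gi - ti) ** 2 for gi, ti in zip(g, true_grad))
    return total / trials


def toy_objective(z):
    """Hypothetical convex stand-in for the attack objective; its gradient at z is z."""
    return 0.5 * sum(v * v for v in z)
```

Calling `mean_squared_error(toy_objective, x, x, mu=0.01, nu=0.0)` and then again with `nu=0.1` on the same point `x` shows the estimation error growing once the defender's noise scale exceeds the attacker's smoothing parameter, which is the qualitative behavior the analysis attributes to the ratio between the two.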
6.3.1 General Convex Case
In this subsection, based on a convergence analysis of the attack algorithm against RND, we show that random noise with larger increases the upper bound on the convergence error, which can lead to a slower convergence rate and a better defense.
We assume the function satisfies Assumptions 4 and 5. We use the Euclidean norm throughout our theoretical analysis. We denote by a random vector composed of i.i.d. variables attached to each iteration of the scheme, and also denote . We define , where is generated by Eq. (7) with the gradient estimator (9), depending on and .
Theorem 2.
Remark 2.
From Theorem 2, the upper bound of the convergence error (the right-hand side of Eq. (15)) depends on the ratio . So the ratio is the key factor in the effectiveness of RND. A larger ratio contributes to a higher upper bound on the convergence error and a slower convergence rate, corresponding to better defense performance against query-based attacks in practice.
Based on the above analysis, RND cannot always guarantee an effective defense against query-based attacks, especially when the size of the noise from the defender is smaller than, or almost the same as, the size of the random vectors sampled by the attackers. For example, when and the ratio equals , the query complexity is unchanged and RND cannot improve the defense against query-based attacks. The experimental results in Section 7.3 also verify our theorem. So, the defender needs to adopt a larger noise size than to each query, while still ensuring that normal predictions are not affected by this extra randomness. If the random noise added by the defender is very large, the gap between and the true model prediction increases. RND therefore creates a tradeoff between the natural accuracy on normal queries and effective defense against malicious ones.
6.3.2 Analysis under General Non-Convex Case
Here we study convergence in a more challenging case, where only satisfies Assumption 5. We first define
(16) 
which is the smoothed version of .
Theorem 3.
Given Assumption 5, for any in Algorithm 2 with , we have
(17)  
where , . To bound the gap between and , i.e., , we can choose (Nesterov and Spokoiny, 2017). Due to the non-convexity assumption, we only guarantee convergence to a stationary point of the function , which is a smoothed approximation of . To minimize the upper bound (i.e., the right-hand side), we can choose a constant step size . Denoting the minimum of the upper bound by , the upper bound on the expected number of queries is .
6.3.3 Analysis under General Non-Convex Case
In this subsection, based on a convergence analysis of the attack algorithm in the non-convex case, we show that random noise with larger increases the upper bound on the squared norm of the gradients, which can lead to slower convergence to a stationary point and a better defense.
We assume the function satisfies Assumption 5, and we define
(18) 
which is a smoothed version of the attackers’ objective function.
Theorem 4.
Assume satisfies Assumption 5. Let the sequence be generated by Algorithm 2 with the estimator (9). To bound the gap between and , we can choose (Nesterov and Spokoiny, 2017). Then for any we have
(19)  
where , , and . To minimize the upper bound, we can choose a constant stepsize . We only guarantee convergence of Algorithm 2 with estimator (9) to a stationary point of the function , which is a smooth approximation of . Then, to guarantee that the expected norm of is less than , the minimum of the upper bound is set as , and the upper bound on the expected number of queries is .
From Theorem 4, the upper bound on the expected number of queries still depends on the ratio . As in the convex case, a larger ratio contributes to a higher upper bound on the convergence error and a slower convergence rate.
6.4 Theoretical Analysis of Random Noise Defense against Adaptive Attacks
As suggested in recent studies of robust defense (Athalye et al., 2018a; Carlini et al., 2019; Tramer et al., 2020), the defender should conduct a robust evaluation against the corresponding adaptive attack, in which the attacker is aware of the defense mechanism. Here we study the defense effect of RND against adaptive black-box attacks. Since the idea of RND is to insert random noise into each query to disturb the gradient estimation, an adaptive attacker could utilize Expectation Over Transformation (EOT) (Athalye et al., 2018b) to obtain a more accurate estimate, i.e., querying one sample multiple times and averaging the results. Then, the gradient estimator used in Algorithm 2 becomes
(20) 
where , with . Note that here the definition of the sequential random noises added by the defender (see Section 6.2) should be updated to . The convergence analysis of Algorithm 2 with (23) against RND is presented in Theorem 6. Due to the space limit, we only present the analysis for the convex case that satisfies Assumptions 4, 5 and 6. However, the analysis can also be extended to the non-convex case, as shown in the supplementary material.
Theorem 5.
Corollary 1.
With larger , the upper bound is reduced. So, EOT can mitigate the defense effect due to the randomness of RND. However, as goes to infinity, the upper bound of the expected convergence error (i.e., Eq. (24)) becomes
(22)  
where the max term is still determined by the ratio . This implies that the attack improvement from EOT is limited, especially with a larger ratio .
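The averaging idea behind EOT can be sketched as follows. This is a minimal illustration under assumptions, not the estimator of the paper: it only shows that repeating one noisy query and averaging the answers shrinks the variance of the returned value, and `rnd_query`, `eot_query`, `empirical_variance`, and `toy_objective` are hypothetical names chosen for the example.

```python
import random


def rnd_query(f, x, nu, rng):
    """Defender side: evaluate f on the query plus fresh Gaussian noise of scale nu."""
    return f([xi + nu * rng.gauss(0.0, 1.0) for xi in x])


def eot_query(f, x, nu, m, rng):
    """Adaptive attacker: repeat the same query m times and average the noisy answers."""
    return sum(rnd_query(f, x, nu, rng) for _ in range(m)) / m


def empirical_variance(f, x, nu, m, samples=2000, seed=0):
    """Variance of the averaged query value over many independent repetitions."""
    rng = random.Random(seed)
    values = [eot_query(f, x, nu, m, rng) for _ in range(samples)]
    mean = sum(values) / samples
    return sum((v - mean) ** 2 for v in values) / samples


def toy_objective(z):
    """Hypothetical linear stand-in for the attack objective."""
    return sum(z)
```

Comparing `empirical_variance(toy_objective, x, nu, m=1)` with the same call at `m=16` shows the variance of the averaged answer dropping as the number of repetitions grows. Each averaged value costs that many real queries, which matters under a fixed query budget.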
In this subsection, we discuss the adaptive attack corresponding to RND. According to recent work on evaluating robust defenses (Athalye et al., 2018a; Carlini et al., 2019; Tramer et al., 2020), the defender should conduct a robust evaluation against the corresponding adaptive attack (i.e., a stronger attacker aware of the details of the defense mechanism). As discussed in (Athalye et al., 2018a), RND obfuscates the gradients estimated and utilized by attackers in gradient estimation methods. RND belongs to the stochastic gradient case, which depends on test-time randomness, since it adds noise to each query. Because the attackers know that they only obtain noisy function values for each query, they could query the same sample many times and average the results, which mitigates the randomness. To design an adaptive attack, the attackers can estimate the gradients of randomized defenses by applying Expectation Over Transformation (EOT) (Athalye et al., 2018b). So, the gradient estimator in Algorithm 2 becomes
(23) 
where and are random vectors sampled from the standard normal distribution . Next, we give the convergence analysis of the basic ZO black-box attack with EOT against RND.
We assume the function satisfies Assumptions 4, 5, and 6. The following analysis also applies to non-convex functions, as shown in the supplementary materials.
Theorem 6.
Corollary 2.
With larger , the upper bound is reduced. So, EOT can alleviate the effect of the randomness of RND on the convergence of the attack algorithm. However, as goes to infinity, the upper bound of the expected convergence error (Eq. (24)) becomes
(25)  
where the max term is still determined by the ratio . This also illustrates that the attack improvement from EOT is limited, especially with a larger ratio .
We can also conclude that the effect of EOT depends on the dimension of the data. Given a fixed from the defender, when is satisfied, which means the data samples are low-dimensional, the upper bound (the right-hand side of Eq. (24)) is determined by , rather than . So, EOT can effectively reduce the effect of the random noise.
In the image classification experiments, for example, the dimensions of ImageNet and CIFAR10 data are 150,528 and 3,072, respectively, and the noise adopted in the experiments is 0.01 or 0.02. It can be clearly seen that ImageNet does not satisfy , whereas CIFAR10 does. The experimental results in Section 7.5 validate these claims.
6.5 RND with Gaussian Augmentation Training
The theoretical analyses above show that RND should choose a proper noise magnitude to achieve a good balance between keeping the accuracy on normal queries and the defense effectiveness against query-based black-box attacks. To achieve a high-quality balance, we can reduce the sensitivity of the target model to random noise, so that the influence of the noise on the accuracy of each individual query is reduced. One straightforward method is Gaussian Augmentation Training (GT), which adds random noise to each training sample as a pre-processing step in the training process. Consequently, the model trained with GT is expected to maintain good accuracy on each individual query, even though RND adds a relatively large random noise to each query.
Based on the theoretical analysis of the above two subsections, we know that a larger ratio contributes to slower convergence and better RND defense performance, but it also affects the black-box model’s prediction accuracy on normal queries. We thus face a tradeoff between the prediction accuracy on normal queries and the defense against malicious ones. We should reduce the sensitivity of the black-box model to random noise, so that even if we add larger noise to each query, the black-box model still maintains excellent performance on normal queries.
The straightforward method is Gaussian Augmentation Training (GT), which adds noise to each training sample in the training stage and thereby increases the stability of the black-box model to noise. Therefore, even though the defender adds larger noise to each query at inference time, the black-box model still maintains good clean accuracy.
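The pre-processing step described here could look roughly like the sketch below, which perturbs every feature of every training sample with Gaussian noise before the batch reaches the model. It is only an illustration: the function name `gaussian_augment`, the clipping to a `[0, 1]` pixel range, and the list-of-lists batch layout are all assumptions made for this example, not details taken from the paper.

```python
import random


def gaussian_augment(batch, sigma, rng, low=0.0, high=1.0):
    """Add i.i.d. Gaussian noise of scale sigma to every feature of every sample.

    Clipping to [low, high] is an assumption (pixel values scaled to [0, 1]).
    """
    return [
        [min(high, max(low, v + sigma * rng.gauss(0.0, 1.0))) for v in sample]
        for sample in batch
    ]
```

In a training loop, such a function would be applied to each mini-batch with fresh noise at every step, so the model rarely sees an unperturbed input.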
7 Experiments
In this section, we conduct various experiments on benchmark datasets to verify our theoretical results and evaluate the effectiveness of RND against mainstream query-based black-box attack methods.
7.1 Experimental Settings
The evaluation setting
In this section, we verify our theoretical analysis and evaluate the defense performance of RND against mainstream query-based black-box attack methods. We choose NES, ZO-signSGD (ZS), Bandit, Dual-Path Distillation (DPD), SimBA, SignHunter and Square attack. Square attack (Andriushchenko et al., 2020) and SignHunter (AlDujaili and O’Reilly, 2020) are state-of-the-art query-based methods, and the other methods are the most commonly compared attack methods. The first four methods belong to gradient estimation methods and the remaining three are search-based methods. We only consider the stronger untargeted attack (because a model robust to untargeted attacks is also robust to targeted attacks). Following previous attack methods (Ilyas et al., 2018a), we evaluate all the attack methods on all test images of CIFAR10 and 1000 images randomly sampled from the validation set of ImageNet. We evaluate all attack methods with both and . The perturbation budget of is for two datasets. For norm, the perturbation budgets are and for CIFAR10 and ImageNet. The query budget is limited to 10000.
Following previous black-box attack methods, for the attacked model, we use the VGG16 model (Simonyan and Zisserman, 2014) and the WideResNet16 model (Zagoruyko and Komodakis, 2016) for the CIFAR10 dataset. We train them with the standard training settings. Their test accuracies are and . For the ImageNet dataset, we use the pretrained Inception v3 model (Szegedy et al., 2016) provided by the torchvision package, and the clean accuracy on 1000 images is .
The details of the hyperparameters are given in the supplementary materials. According to the analysis of the above experiments, we consider two main factors of the adversaries, the smoothing parameter and the different of EOT, to evaluate the robust defensive performance of RND. By trying combinations of and , we choose the attack parameters with the best attack effect and evaluate the two defense mechanisms on them in Sections 7.5 and 7.7.
The setting of comparing defense methods
We compare our method with AT, RSE and the pure GT model (Rusak et al., 2020). For adversarial training, we also adopt the WideResNet16 model as the black-box model. We set the maximum distortion of the adversarial image to in scale, following the experimental protocol in (Madry et al., 2018). We run 10 iterations of PGD with a constant step size of 2.0, as done in (Madry et al., 2018). For RSE, we use the pretrained VGG16 model with provided by the authors. We also train WideResNet16 using their code. The size of the init noise is and the size of the inner noise is . For GT, we train the VGG16 model with added Gaussian noise sampled from and WideResNet16 with added Gaussian noise sampled from . Their natural accuracies are and . For the ImageNet dataset, we choose the fine-tuned ResNet50 GT model provided by (Rusak et al., 2020) with Gaussian noise sampled from with clean accuracy.
Datasets and Classification Models.
We conduct experiments on two widely used benchmark datasets in adversarial machine learning: CIFAR10 (Krizhevsky and others, 2009) and ImageNet (Deng et al., 2009). CIFAR10 includes 50k training images and 10k test images with 10 classes. ImageNet contains 1,000 classes with 1.28 million images for training and 50k images for validation. For classification models, we use VGG16 (Simonyan and Zisserman, 2014) and WideResNet16 (Zagoruyko and Komodakis, 2016) on CIFAR10. We conduct standard training, and their clean accuracies on the test set are and , respectively. For ImageNet, we adopt the pretrained Inception v3 model (Szegedy et al., 2016) and ResNet50 model (He et al., 2016) provided by the torchvision package, and the clean accuracies are and .
Black-box Attack Methods.
We consider several mainstream query-based black-box attack methods, including NES (Ilyas et al., 2018a), ZO-signSGD (ZS) (Liu et al., 2018a), Bandit (Ilyas et al., 2018b), DPD (Zhang et al., 2020), SimBA (Guo et al., 2019a), SignHunter (AlDujaili and O’Reilly, 2020) and Square (Andriushchenko et al., 2020). Note that NES, ZS and Bandit are gradient estimation based attack methods and the other four methods are search-based methods. SimBA is only designed for attack. For DPD, the authors only provide the code for attack on ImageNet. Following (Ilyas et al., 2018a), we evaluate all the attack methods on the whole test set of CIFAR10 and 1,000 randomly sampled images from the validation set of ImageNet. We present the evaluation performance against the untargeted attack in this section and evaluate the performance under both and attack. The perturbation budget of is set to for both datasets. For attack, the perturbation budget is set to and on CIFAR10 and ImageNet, respectively. The maximal number of queries is set to 10,000. We adopt the attack failure rate as the evaluation metric. The higher the attack failure rate, the better the adversarial defense performance.
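For concreteness, the metric can be computed along the lines of the sketch below. It assumes a hypothetical record format in which each attacked image stores the number of queries needed for a successful attack, or `None` if the attack never succeeded. Whether images misclassified before any attack are excluded from the count is not specified here, so that filtering step is left to the caller.

```python
def attack_failure_rate(queries_used, max_queries=10_000):
    """Fraction of attacked images on which the attack failed within the query budget.

    queries_used[i] is the number of queries the attack needed on image i,
    or None if it never produced an adversarial example (assumed record format).
    """
    if not queries_used:
        raise ValueError("need at least one attacked image")
    failures = sum(1 for q in queries_used if q is None or q > max_queries)
    return failures / len(queries_used)
```

For example, with records `[100, None, 20000, 5000]` and the budget of 10,000 queries, two of the four attacks count as failures.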
7.2 The Evaluation of RND against Querybased Attacks
Following the theoretical analysis of Section 6.3, the key factor of the RND defense against query-based attacks is the ratio : a larger ratio contributes to a slower attack process and better defense performance under the limited query setting. We first evaluate the RND defense performance against several attack methods with different ratios. To guarantee good natural accuracy, we choose . We choose the smoothing parameter from . The drop in natural accuracy caused by RND on CIFAR10 is not significant at , and for VGG16 and WideResNet16. It becomes large at , and respectively. So, a sufficiently small is important for maintaining the clean accuracy of the natural model. Inception v3 fares better, with natural accuracy of , , and