1 Introduction
Adversarial attack has been a well-recognized threat to existing deep neural network based applications. It injects a small amount of noise into a sample (e.g., image, speech, language) but degrades the model performance drastically [3, 11, 15]. According to the information that an adversary has about the target network, existing attacks fall into two categories: white-box attacks, which know all the parameters of the target network, and black-box attacks, which only have access to the output of the target network. However, it is sometimes difficult or even impossible to have full access to certain networks, which makes the black-box attack practical and draws more and more attention to it.
A black-box attack has very limited or no information about the target network and is thus more challenging to perform. In the bounded setting, a black-box attack is usually evaluated on two aspects: the number of queries and the success rate. In addition, recent work [10] shows that the visual distortion in the adversarial examples is also an important criterion in practice. Even under a small bound, perturbing pixels in the image without considering the visual impact could make the distorted image very annoying. As shown in Fig. 1, an attack [9] under a small noise level causes relatively large visual distortion, and the perturbed image is easily distinguishable from the original one. Therefore, under the assumption that the visual distortion caused by the noise is related to the spatial distribution of the perturbed pixels in a bounded attack, we take a different view from previous work and focus on explicitly learning a noise distribution based on its corresponding visual distortion.
In this paper, we propose a novel black-box attack that directly minimizes the induced visual distortion by learning the noise distribution of the adversarial example, assuming only loss-oracle access to the black-box network. The quantified visual distortion, which measures the perceptual distance between the adversarial example and the original image, is introduced into our loss, where the gradient of the corresponding non-differentiable loss function is approximated by sampling noise from the learned noise distribution. The proposed attack can achieve a trade-off between visual distortion and query efficiency by introducing the weighted perceptual distance metric in addition to the original loss. Theoretically, we prove the convergence of our model under the assumption that the loss function is convex. The experiments demonstrate the effectiveness of our attack on ImageNet: our attack results in much lower distortion than the other attacks and achieves a 100% success rate on ResNet50 and VGG16bn. In addition, it is shown that our attack is valid even when it is only allowed to perturb pixels outside the target object in a given image.
2 Related Work
Although adversarial attacks pose a big threat to existing networks, performing attacks can evaluate the robustness of a network, and further helps improve its robustness by augmenting training with adversarial examples [24]. Recent research on adversarial attacks has made advanced progress in developing stronger and more computationally efficient adversaries. Since our method operates in the black-box setting, we briefly introduce recent attack techniques in that setting.
A black-box attack considers the target network as a black box, and only assumes access to its output scores. Existing methods for the black-box attack roughly fall into three categories: 1) Methods that estimate the gradient of the black box. Some methods estimate the gradient by sampling around a certain point, which formulates the task as a problem of continuous optimization. Tu et al. [25] searched for perturbations in the latent space of an autoencoder. Ilyas et al. [9] exploited prior information about the gradient. Al-Dujaili and O'Reilly [1] reduced query complexity by estimating just the sign of the gradient. NAttack [12] shares similarity with our method as it also explicitly defines a noise distribution. However, the distribution in [12] is assumed to be an isotropic normal distribution without considering visual distortion, whilst our method does not assume the distribution to be of a specific form and learns a noise distribution that causes less visual distortion. Other approaches developed a substitute model [15, 4, 16] to approximate the behavior of the black box. By exploiting the transferability of adversarial attacks [6], white-box attack techniques applied to the substitute model can be transferred to the black box. These approaches assume only label-oracle access to the black box, whereas training the substitute model requires either access to the black-box training dataset or the collection of a new dataset. 2) Methods based on discrete optimization. In [14, 1], an image is divided into regular grids and the attack is performed and refined on each grid. Meunier et al. [13] adopted the tiling trick by adding the same noise to small square tiles in the image. 3) Methods that leverage evolutionary strategies or random search [13, 2]. In [2], the noise value is updated using a square-shaped random search at each query. Meunier et al. [13] developed a set of attacks using evolutionary algorithms with both continuous and discrete optimization.
Previous methods did not consider the visual impact of the induced noise, so their adversarial examples could suffer from significant visual distortion. This motivates us to consider the visual quality degradation in the attack model. Under the assumption that the visual distortion caused by the noise is related to the spatial distribution of the perturbed pixels in a bounded attack, we explicitly define a noise distribution, which is learned to minimize the visual distortion.
3 Method
3.1 Learning Noise Distribution Based on Visual Distortion
An attack model is an adversary that constructs adversarial examples against certain networks. Let f be the target network that accepts an input x and produces an output f(x). f(x) is a vector and f_c(x) represents its c-th entry, denoting the score of the c-th class. y = argmax_c f_c(x) is the predicted class. Given a valid input x and the corresponding predicted class y, an adversarial example x' [22] is similar to x yet results in an incorrect prediction f(x') ≠ y. In an additive attack, an adversarial example is a perturbed input x' = x + δ with additive noise δ, where δ is bounded by an ℓ∞ ball. Although there are several choices of the ℓp norm, we discuss ℓ∞ in this paper since our method defines a sample space with a fixed range for each pixel independently. As for other norms, please refer to Section 4.5 for further discussion. The problem of generating an adversarial example is equivalent to producing noise δ that causes a wrong prediction for the perturbed input. Thus a successful attack is to find δ such that (1) f(x + δ) ≠ y and (2) ||δ||∞ ≤ ε. Since the constraint (1) is highly nonlinear, the problem is usually rephrased in a different form [3]:

(1) min_δ L(x + δ) s.t. ||δ||∞ ≤ ε

where L is the loss function, which is defined as L(x') = f_y(x') − max_{c≠y} f_c(x'). The attack is successful when L(x') < 0. It is noted that such a loss does not take the visual impact into consideration, for which the adversarial example could suffer from significant visual distortion. In order to constrain the visual distortion caused by the difference between x and x', we adopt a perceptual distance metric D(x, x') into the loss function with a predefined hyperparameter λ:

(2) L(x') + λ · D(x, x')

where smaller D(x, x') indicates less visual distortion. D can be any form of metric that measures the perceptual distance between x and x', such as the well-established SSIM [27] or LPIPS [26]. λ manages the trade-off between a successful attack and the visual distortion caused by the attack. The effects of λ will be further discussed in Section 4.1.
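As a concrete illustration of this objective, the following sketch assumes a margin-style loss of the kind used in [3]; the function names and the exact margin form are illustrative, not the authors' code:

```python
def margin_loss(scores, true_class):
    # L(x') = f_y(x') - max_{c != y} f_c(x'); negative once x' is misclassified
    other = max(s for c, s in enumerate(scores) if c != true_class)
    return scores[true_class] - other

def total_loss(scores, true_class, perceptual_dist, lam):
    # Eq. (2): margin loss plus lambda-weighted perceptual distance D(x, x')
    return margin_loss(scores, true_class) + lam * perceptual_dist
```

With λ = 0 this reduces to the plain attack loss; larger λ trades extra queries for lower perceptual distortion, as studied in Section 4.1.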
Minimizing the above loss function faces a challenge: it is not differentiable, since the black-box adversary does not have access to the gradients of f and the predefined D might be calculated in a non-differentiable way. To address this problem, we explicitly assume a noise distribution of δ and approximate the gradient by sampling from the distribution. Suppose that δ follows a distribution π parameterized by θ, i.e., δ ∼ π_θ. For the i-th pixel in an image, we make its noise distribution π_θi, where θi is the i-th component of θ. The noise value of the i-th pixel is sampled by following π_θi. By sampling noise from the distribution, θ can be learned to minimize the expectation of the above loss such that the attack is successful (i.e., alters the predicted label) and the produced adversarial example is less distorted (i.e., small D). The expectation is minimized by sampling from π_θi for each pixel:

(3) min_θ E_{δ∼π_θ}[L(x + δ) + λ · D(x, x + δ)]

To ensure the ℓ∞ constraint is satisfied, we define the sample space of noise for the i-th pixel to be a set of discrete values in the range of −ε and ε, where q is the sampling frequency and ε/q is the sampling interval. The noise value of the i-th pixel is sampled from this sample space by following π_θi.
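Under one plausible reading of this construction (interval ε/q, hence 2q + 1 candidate values per pixel), the sample space and the per-pixel categorical sampling can be sketched as:

```python
import random

def sample_space(eps, q):
    # Discrete noise candidates {i * eps / q : i = -q, ..., q} in [-eps, eps];
    # 2q + 1 values per pixel, so the l-inf bound holds by construction.
    return [i * eps / q for i in range(-q, q + 1)]

def sample_noise(pixel_probs, values):
    # Draw one noise value for a pixel from its categorical distribution pi_theta_i
    return random.choices(values, weights=pixel_probs, k=1)[0]
```

For example, sample_space(0.05, 1) gives the three candidate values [-0.05, 0.0, 0.05].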
Given W and H, the width and height of an image, respectively: since each pixel has its own noise distribution of length 2q + 1, the length of θ for the entire image is W × H × (2q + 1). Note that we do not consider the difference of color channels; thus, the same noise value is sampled for each color channel of a pixel. To estimate θ, we adopt policy gradient [20] to make the above expectation differentiable with respect to θ. Using REINFORCE, we have the differentiable loss function J(θ):
(4) J(θ) = E_{δ∼π_θ}[(ℓ(δ) − b) · log π_θ(δ)]

(5) ∇_θ J(θ) ≈ (ℓ(δ) − b) · ∇_θ log π_θ(δ), δ ∼ π_θ

where ℓ(δ) = L(x + δ) + λ · D(x, x + δ) is the objective in Eq. (3), and b is introduced as a baseline in the expectation with a specific meaning: 1) when ℓ < b, the sampled δ returns a low loss, and its probability π_θ(δ) increases through gradient descent; 2) when ℓ = b, θ remains unchanged; 3) when ℓ > b, the sampled δ returns a high loss, and its probability π_θ(δ) decreases through gradient descent. To sum up, π_θ is forced to improve over b. At the iteration t, we choose b such that the loss improves over the obtained minimal loss. The above expectation is estimated using a single Monte Carlo sampling at each iteration, and the sampling of noise is critical. Simply resampling the entire image at each iteration might cause a large variance on the ℓ2 norm of the noise, i.e., ||δ_t − δ_{t−1}||. Therefore, to ensure a small variance, only a small proportion of the noise is randomly resampled at iteration t while the rest remains unchanged from iteration t − 1. Let p be the proportion of the resampled noise at each iteration; the updated noise δ_t at iteration t is

(6) δ_t = resample_p(δ_{t−1}, π_θ)

where resample_p denotes randomly resampling a proportion p of the noise from π_θ. As shown in Fig. 2, at the iteration t, a proportion p of the noise is resampled by following the corresponding distribution π_θ. Then, the feedback from the black box and the perceptual distance metric decide the update of the distribution π_θ. The iteration stops when the attack is successful, i.e., L < 0.
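The per-pixel REINFORCE update and the partial resampling of Eq. (6) can be sketched as follows; the softmax-parameterized logits, the learning rate, and the exact update form are assumptions for illustration:

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def reinforce_step(theta_i, sampled_idx, loss, baseline, lr):
    # One gradient-descent step on (loss - baseline) * log pi(sampled_idx)
    # for a single pixel's logits theta_i (Eqs. 4-5). When loss < baseline,
    # the probability of the sampled value increases; otherwise it decreases.
    probs = softmax(theta_i)
    advantage = loss - baseline
    # d log pi(sampled_idx) / d logits = one_hot(sampled_idx) - probs
    return [t - lr * advantage * ((1.0 if i == sampled_idx else 0.0) - p)
            for i, (t, p) in enumerate(zip(theta_i, probs))]

def resample(noise, probs, values, prop):
    # Eq. (6): resample only a proportion `prop` of the pixels from pi_theta;
    # the rest keep their previous noise, limiting the variance of
    # ||delta_t - delta_{t-1}|| across iterations.
    picked = random.sample(range(len(noise)), max(1, int(prop * len(noise))))
    out = list(noise)
    for i in picked:
        out[i] = random.choices(values, weights=probs[i], k=1)[0]
    return out
```

A full iteration would then query the black box with the resampled noise, compute the loss of Eq. (2), and apply reinforce_step to each resampled pixel.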
3.2 Proof of Convergence
Ruan et al. [17] show that feed-forward DNNs (Deep Neural Networks) are Lipschitz continuous with a Lipschitz constant K. Therefore, we have

(7) ||f(x_1) − f(x_2)|| ≤ K ||x_1 − x_2||
Let x_1 = x + δ_t and x_2 = x + δ_{t−1}, where δ_t and δ_{t−1} are the noise at iterations t and t − 1; we have

(8) |L(x + δ_t) − L(x + δ_{t−1})| ≤ K ||δ_t − δ_{t−1}||
At an iteration t, since only a small proportion of the noise is randomly resampled from iteration t − 1, it can be assumed that

(9) |D(x, x + δ_t) − D(x, x + δ_{t−1})| ≤ c ||δ_t − δ_{t−1}||

where c is a constant. Note that the learning stops when the attack is successful, i.e., L < 0. Therefore, L ≥ 0 until the learning stops. Suppose that the perceptual distance metric D is normalized to [0, 1]. Substituting the inequalities (8) and (9) into our definition of the loss in Eq. (2) gets the following inequality:
(10) |ℓ(δ_t) − ℓ(δ_{t−1})| ≤ (K + λc) ||δ_t − δ_{t−1}||

where ℓ(δ) = L(x + δ) + λ · D(x, x + δ) denotes the loss in Eq. (2).
Note that ||δ_t − δ_{t−1}|| is bounded by the amount of resampled noise, since each resampled entry changes by at most 2ε. Given width W, height H, channel C of the image, and the resampled proportion p of the noise from iteration t − 1, we have

(11) ||δ_t − δ_{t−1}|| ≤ 2ε √(pWHC)

Thus, the inequality (10) becomes

(12) |ℓ(δ_t) − ℓ(δ_{t−1})| ≤ 2ε (K + λc) √(pWHC)
Ideally, D accurately quantifies the difference of the perturbed image even when only one noise value for just a single pixel at the iteration t is sampled differently from that at t − 1. Let x'_{i,j} represent the perturbed image with the j-th noise value of the i-th pixel being sampled. Note that θ_i is a vector of length 2q + 1, denoting that there are 2q + 1 noise values that could be sampled for each pixel. Similarly, π_θ(i, j) denotes the probability of the j-th noise value of the i-th pixel being sampled. By sampling every noise value for the i-th pixel, we define L_i and D_i to be vectors:

(13) L_i = [L(x'_{i,1}), …, L(x'_{i,2q+1})]

(14) D_i = [D(x, x'_{i,1}), …, D(x, x'_{i,2q+1})]
Although the above equations are only meaningful under the ideal situation where D can quantify the difference of just one perturbed pixel, we use these equations for a theoretical proof of convergence. In the ideal situation, instead of using a single Monte Carlo sampling to estimate the gradient as in Eq. (5), the i-th component of the gradient can be calculated exactly as

(15) ∇_{θ_i} J(θ) = Σ_j (L_i[j] + λ D_i[j] − b) ∇_{θ_i} π_θ(i, j)

where θ_i is the i-th component of θ. According to Eq. (12), when the number of resampled pixels is 1, we have

(16) |ℓ(δ_t) − ℓ(δ_{t−1})| ≤ 2ε (K + λc) √C

where ℓ denotes the loss in Eq. (2). Note that the perturbed images x'_{i,j} that share the same i differ in just a single pixel, so the bound applies to each of them. Thus, substituting the inequality (16) into Eq. (15) gets

(17)

In practice, we adopt a single Monte Carlo sampling instead of sampling every noise value for every pixel, for which the exact gradient should be replaced by its one-sample estimate in the above inequality. The inequality (17) thus becomes:

(18)
Since the standard softmax function is Lipschitz continuous with a Lipschitz constant of 1 [5], we have

(19)

Finally, the inequality for the change of the gradient across iterations becomes

(20)
The above inequality proves that the expected loss is smooth, i.e., its gradient is Lipschitz continuous. Assuming that the loss is convex, according to the convergence theorem for gradient descent [23], it follows that

(21) L(θ_T) − L(θ*) ≤ ||θ_0 − θ*||² / (2ηT)

where θ* is the optimal solution and η is the step size. When T is large enough, L(θ_T) approximates L(θ*) up to a small enough epsilon and the learning converges.
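The rate in (21) is the standard O(1/T) bound for gradient descent on a smooth convex function. A quick numeric sanity check of this bound (illustrative, not part of the paper's experiments) on f(θ) = θ²:

```python
def gd(grad, theta0, lr, steps):
    # Plain gradient descent, the optimizer assumed by the convergence bound
    theta = theta0
    for _ in range(steps):
        theta = theta - lr * grad(theta)
    return theta

# f(theta) = theta^2 is convex and smooth (gradient Lipschitz constant 2);
# the optimum is theta* = 0 with f(theta*) = 0.
theta0, lr, T = 5.0, 0.25, 100
theta_T = gd(lambda t: 2.0 * t, theta0, lr, T)
# Bound of Eq. (21): f(theta_T) - f(theta*) <= ||theta0 - theta*||^2 / (2 * lr * T)
assert theta_T ** 2 <= theta0 ** 2 / (2 * lr * T)
```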
Table 1: Ablation results on λ, the perceptual distance metric, and the sampling frequency.

| Sampling Frequency | Perceptual Metric | λ | Success Rate | SSIM | LPIPS | Avg. Queries |
|---|---|---|---|---|---|---|
| — | — | 0 | 100% | 0.091 | 0.099 | 356 |
| — | SSIM | 10 | 100% | 0.076 | 0.081 | 401 |
| — | SSIM | 100 | 97.4% | 0.036 | 0.051 | 1395 |
| — | SSIM | 200 | 92.2% | 0.025 | 0.040 | 2534 |
| — | LPIPS | 10 | 100% | 0.080 | 0.078 | 450 |
| — | LPIPS | 100 | 98.1% | 0.049 | 0.052 | 1174 |
| — | LPIPS | 200 | 95.1% | 0.038 | 0.045 | 1928 |
| — | SSIM | 10 | 99.7% | 0.071 | 0.074 | 520 |
| — | SSIM | 10 | 99.5% | 0.069 | 0.070 | 665 |
| — | SSIM | 10 | 98.7% | 0.062 | 0.075 | 669 |
| — | SSIM | 10 | 98.7% | 0.071 | 0.075 | 673 |
4 Experiments
Following previous work [13, 9], we validate the effectiveness of our model on the large-scale ImageNet [18] dataset. We use three pretrained classification networks from PyTorch as the black-box networks: InceptionV3 [21], ResNet50 [7] and VGG16bn [19]. The attack is performed on images that were correctly classified by the pretrained network. We randomly select images from the validation set for testing, and all images are normalized to [0, 1]. We quantify our success in terms of the perceptual distance (SSIM and LPIPS) as we address the visual distortion caused by the attack. Of these two metrics, SSIM [27] measures the degradation of structural information in the adversarial examples; a smaller value indicates a closer perceptual distance. LPIPS [26] evaluates the perceptual similarity of two images as the normalized distance between their deep features; a smaller value of LPIPS denotes less visual distortion. Besides SSIM and LPIPS, the success rate and the average number of queries are also reported, as in most frameworks. The average number of queries refers to the average number of requests to the output of the black-box network.

We initialize the noise distribution π_θ to be a uniform distribution and draw the initial noise from it. The learning rate and the resampling proportion are fixed across experiments. In addition, we specify the shape of the resampled noise at each iteration to be a square [13, 14, 2], and adopt the tiling trick [9, 13] with a fixed tile size. The upper bound ε of our attack is set as in previous work.

4.1 Ablation Studies

In the ablation studies, a maximum number of queries is imposed, and the results are averaged over the test images. In the following, we discuss the trade-off between visual distortion and query efficiency, the effects of using different perceptual distance metrics in the loss function, and the results for different sampling frequencies.
Trade-off between visual distortion and query efficiency.
Under the same ℓ∞ ball, a query-efficient way to produce an adversarial example is to perturb most pixels with the maximum noise values [14, 2]. However, such an attack introduces large visual distortion, which could make the distorted image very annoying. To constrain the visual distortion, the perturbed pixels should be those that cause a smaller visual difference while performing a valid attack, and finding them takes extra queries. This brings the trade-off between visual distortion and query efficiency. Different from previous work, this trade-off can be controlled by λ in our loss function. As shown in Table 1, when λ = 0, the adversary does not consider visual distortion at all, and perturbs each pixel that is helpful for misclassification until the attack is successful. Thus, it causes the largest perceptual distance (0.091 SSIM and 0.099 LPIPS) with the least number of queries (356). As λ increases, both SSIM and LPIPS decrease at the cost of more queries and a lower success rate. The maximum λ in Table 1 is 200, since further increasing it causes the success rate to drop further. Fig. 3 gives several visualized examples for different λ, where adversarial examples with larger λ suffer from less visual distortion.
Ablation studies on the perceptual distance metric.
The perceptual distance metric in the loss function is predefined to measure the visual distortion between the adversarial example and the original image. We adopt SSIM and LPIPS as the perceptual distance metric to optimize, respectively, and report their results in Table 1. When λ = 10, optimizing SSIM shows a better score on SSIM (0.076 vs. 0.080) whilst optimizing LPIPS has better performance on LPIPS (0.078 vs. 0.081). However, when λ increases to 100 and 200, optimizing SSIM gives better scores on both SSIM and LPIPS. Therefore, we set the perceptual distance metric to be SSIM in the following experiments.
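For reference, a heavily simplified, single-window variant of SSIM can be written as below; the real SSIM [27] averages this statistic over local sliding windows, so this global version is only an illustrative sketch of how 1 − SSIM can serve as the distance D:

```python
import math

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Single-window SSIM over whole flattened images in [0, 1].
    # The standard SSIM averages this over local windows; this global
    # variant is an illustrative simplification only.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))

def perceptual_distance(x, y):
    # Smaller D means less visual distortion, as required by the loss
    return 1.0 - ssim_global(x, y)
```

In practice, the windowed SSIM from an image library (or LPIPS from its reference implementation) would replace this toy function.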
Sampling frequency.
The sampling frequency q decides the size of the sample space of the noise. Setting a higher frequency means there are more noise values to explore through sampling. In Table 1, increasing the sampling frequency reduces the perceptual distance to some extent at the cost of a lower success rate. On the other hand, increasing it further does not essentially reduce the distortion yet lowers the success rate. To ensure a high success rate of the attack, we fix the sampling frequency in the following experiments. Note that the sampling frequency has a natural maximum, because beyond it the sampling interval in the RGB color space would be less than one intensity level. See Fig. 4 for a few adversarial examples.
Table 2: Whole-image vs. out-of-object attacks on InceptionV3 (I), ResNet50 (R) and VGG16bn (V).

| Attacked Range | Success Rate I / R / V | SSIM I / R / V | LPIPS I / R / V | Avg. Queries I / R / V |
|---|---|---|---|---|
| Image | 100% / 100% / 100% | 0.078 / 0.076 / 0.072 | 0.096 / 0.081 / 0.079 | 845 / 401 / 251 |
| Out-of-object | 90.1% / 93.8% / 94.7% | 0.071 / 0.069 / 0.074 | 0.081 / 0.065 / 0.070 | 4275 / 3775 / 3104 |
4.2 Out-of-Object Attack
Most existing classification networks [7, 8] are based on CNNs (Convolutional Neural Networks), which gradually aggregate contextual information in deeper layers. Therefore, it is possible to fool the classifier by attacking just the "context", i.e., the background that is out of the target object. Attacking just the out-of-object pixels constrains the number and the position of the pixels that can be perturbed, which might further reduce the visual distortion caused by the noise. To locate the object in a given image, we exploit the object bounding box provided by ImageNet. An out-of-object mask is then created according to the bounding box such that the model is only allowed to attack pixels that are out of the object, as shown in Fig. 5. In Table 2, we report results for InceptionV3, ResNet50 and VGG16bn under a maximum query budget. The attack is performed on images whose masks cover a sufficiently large proportion of the image area. The results show that attacking just the out-of-object pixels can also cause misclassification of the object, with over 90% success rate. Compared with the whole-image attack, the out-of-object attack is more difficult for the adversary in that it requires more queries yet has a lower success rate. On the other hand, the out-of-object attack indeed reduces the visual distortion of the adversarial examples on all three networks.

4.3 Attack Effectiveness on Defended Network
Table 3: Attack on adversarially trained InceptionV3 networks.

| Network | Clean Accuracy | After Attack | SSIM | LPIPS | Avg. Queries |
|---|---|---|---|---|---|
| — | 75.8% | 0.8% | 0.096 | 0.149 | 531 |
| — | 73.4% | 1.8% | 0.103 | 0.154 | 777 |
Table 4: Comparison with other black-box attacks on InceptionV3 (I), ResNet50 (R) and VGG16bn (V).

| Attack | Success Rate I / R / V | SSIM I / R / V | LPIPS I / R / V | Avg. Queries I / R / V |
|---|---|---|---|---|
| SignHunter [1] | 98.4% / — / — | 0.157 / — / — | 0.117 / — / — | 450 / — / — |
| NAttack [12] | 99.5% / — / — | 0.133 / — / — | 0.212 / — / — | 524 / — / — |
| Bandits [9] | 96.5% / 98.8% / 98.2% | 0.343 / 0.307 / 0.282 | 0.201 / 0.157 / 0.140 | 935 / 705 / 388 |
| Square Attack [2] | 99.7% / 100% / 100% | 0.280 / 0.279 / 0.299 | 0.265 / 0.243 / 0.247 | 237 / 62 / 30 |
| SignHunter-SSIM | 97.6% / — / — | 0.220 / — / — | 0.157 / — / — | 642 / — / — |
| NAttack-SSIM | 97.3% / — / — | 0.128 / — / — | 0.210 / — / — | 666 / — / — |
| Bandits-SSIM | 80.0% / 89.3% / 89.7% | 0.333 / 0.303 / 0.275 | 0.200 / 0.163 / 0.135 | 1318 / 1020 / 793 |
| Square Attack-SSIM | 99.2% / 100% / 100% | 0.260 / 0.268 / 0.292 | 0.256 / 0.238 / 0.245 | 278 / 65 / 30 |
| Ours | 98.7% / 100% / 100% | 0.075 / 0.076 / 0.072 | 0.094 / 0.081 / 0.079 | 731 / 401 / 251 |
In the above experiments, we showed that our black-box model can attack undefended networks with a high success rate. To evaluate the strength of the proposed attack against a defended network, we further attack an InceptionV3 network that adopts ensemble adversarial training. Following [24], we set the noise bound accordingly and randomly select images from the ImageNet validation set for testing, with a limited maximum number of queries. The performance of the attacked network is reported in Table 3, where the clean accuracy is the classification accuracy before the attack. Note that the adversarially trained model is slightly different from the InceptionV3 in Table 1 in that its pretrained weights come from TensorFlow rather than PyTorch. Compared with the undefended network, attacking the defended one causes larger visual distortion. However, the proposed attack can still reduce the classification accuracy from 75.8% to 0.8%, which demonstrates its effectiveness against defenses.

4.4 Comparison with Other Attacks
Different from previous work, which focuses on query efficiency, our model addresses improving the visual similarity between the adversarial example and the original image. Therefore, the proposed method might cost more queries to construct a less distorted adversarial example. To show that such costs are affordable, we compare our attack to recently proposed query-efficient black-box attacks: SignHunter [1], NAttack [12], Bandits [9] and Square Attack [2]. Since these attacks do not consider visual distortion, for a fair comparison we add SSIM to their objective functions accordingly with the same λ as in our method; these variants are denoted by the suffix -SSIM in Table 4.

Table 5: Results of other ℓp attacks under different sampling frequencies.

| Distance Metric | Sampling Frequency | Success Rate | SSIM | LPIPS | ℓ1 (norm.) | ℓ2 (norm.) | ℓ0 | Avg. Queries |
|---|---|---|---|---|---|---|---|---|
| — | 1 | 99.5% | 0.077 | 0.083 | 0.133 | 0.130 | 6.75 | 536 |
| — | 2 | 99.2% | 0.065 | 0.069 | 0.159 | 0.118 | 5.88 | 679 |
| — | 5 | 97.9% | 0.058 | 0.065 | 0.177 | 0.118 | 5.19 | 960 |
| — | 1 | 99.5% | 0.077 | 0.083 | 0.133 | 0.130 | 6.75 | 536 |
| — | 2 | 99.5% | 0.070 | 0.076 | 0.176 | 0.130 | 6.14 | 658 |
| — | 5 | 99.2% | 0.066 | 0.070 | 0.218 | 0.129 | 5.74 | 800 |
| — | 1 | 99.5% | 0.110 | 0.112 | 0.215 | 0.211 | 8.21 | 392 |
| — | 2 | 99.5% | 0.092 | 0.100 | 0.259 | 0.191 | 7.44 | 431 |
| — | 5 | 99.5% | 0.087 | 0.094 | 0.312 | 0.185 | 6.89 | 579 |
The results of the above methods are reproduced using the official code provided by the authors. In NAttack, we reduce the sample size since the original large sample size in the paper is computationally expensive. The maximum number of queries is set as in previous work. In our model, considering the trade-off between visual distortion and query efficiency, we set the perceptual distance metric to be SSIM with the λ selected in the ablation studies. In Table 4, the proposed attack reduces SSIM and LPIPS approximately by half while maintaining a high success rate (98.7%–100%) within a limited number of iterations. Except for SignHunter, introducing SSIM in the objective function helps reduce visual distortion in the other attacks. However, our method still outperforms these attacks since the perceptual distance metric is directly minimized in our method. In addition, the number of queries of our attack is comparable to that of Bandits. Note that the success rates drop sharply for Bandits-SSIM compared with Bandits. This is because the Bandits attack uses the estimated gradient of the black-box classifier as its prior, and simply adding SSIM to the loss makes this gradient inaccurate. The visualized adversarial examples from different attacks are given in Fig. 6, which shows that our model produces less distorted adversarial examples. More examples can be found in Fig. 7.
We noticed that SignHunter produces adversarial examples with horizontally striped noise and Square Attack generates adversarial examples with vertically striped noise. Striped noise is helpful in improving query efficiency since the classification network is quite sensitive to such noise [2]. However, from the perspective of visual distortion, such noise greatly degrades the image quality. The adversarial examples of Bandits are relatively perceptually friendly, but the perturbation affects most pixels in the image, which causes visually "noisy" effects, especially on a mono-color background. The noise produced by NAttack appears as regular color patches all over the image due to the large tile size used in the method.
4.5 Other ℓp Attacks
Although our method in this paper is based on the ℓ∞ attack, other ℓp distances can be regarded as the perceptual distance metric in the loss function, which is minimized with the trade-off parameter λ. We did not discuss them in the main experiments because these distance metrics are less accurate in measuring the perceptual distance between images compared to specifically designed metrics, such as the well-established SSIM and LPIPS. In Table 5, the results of other ℓp attacks are shown, where the ℓp distance is normalized to [0, 1] to serve as the perceptual distance metric in the loss function; specifically, the ℓp distance between the original image x and the perturbed image x' is divided by its maximum possible value. As before, we limit the maximum number of queries. We find that the raw ℓ1 and ℓ2 scores have a much higher order of magnitude than the other metrics, and thus the normalized scores of the ℓ1 and ℓ2 distances are reported in Table 5. Note that when the sampling frequency is 1, the ℓ2 distance is equivalent to the ℓ0 distance in that

(22) ||δ||_2 = ε √(kC) = ε √(||δ||_0)

where k is the number of perturbed pixels, and W, H and C are the width, height and number of channels of a given image, respectively (the normalization divides by the maximum value ε √(WHC)). Table 5 shows that directly optimizing an ℓp distance gives better performance on both the perceptual distance metrics and the ℓp distance metrics.
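The equivalence in Eq. (22) is easy to verify numerically: with sampling frequency 1, every nonzero noise entry is ±ε, so the ℓ2 norm is determined by the ℓ0 norm. The check below works per channel, ignoring the channel factor C:

```python
import math

def l2(noise):
    return math.sqrt(sum(v * v for v in noise))

def l0(noise):
    return sum(1 for v in noise if v != 0)

eps = 0.05
# With sampling frequency 1, sampled values lie in {-eps, 0, +eps}:
noise = [eps, -eps, 0.0, eps]
# Eq. (22) per channel: ||delta||_2 = eps * sqrt(||delta||_0)
assert abs(l2(noise) - eps * math.sqrt(l0(noise))) < 1e-12
```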
5 Conclusion
We introduce a novel black-box attack based on the induced visual distortion in the adversarial example. The quantified visual distortion, which measures the perceptual distance between the adversarial example and the original image, is introduced into our loss, where the gradient of the corresponding non-differentiable loss function is approximated by sampling from a learned noise distribution. The proposed attack can achieve a trade-off between visual distortion and query efficiency by introducing the weighted perceptual distance metric in addition to the original loss. The experiments demonstrate the effectiveness of our attack on ImageNet, as our model achieves much lower distortion than existing attacks. In addition, it is shown that our attack is valid even when it is only allowed to perturb pixels outside the target object in a given image.
References
[1] (2020) Sign bits are all you need for black-box attacks. In Proc. International Conference on Learning Representations.
[2] (2019) Square attack: a query-efficient black-box adversarial attack via random search. arXiv preprint arXiv:1912.00049.
[3] (2017) Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy.
[4] (2019) Improving black-box adversarial attacks with a transfer-based prior. In Proc. International Conference on Neural Information Processing Systems.
[5] (2017) On the properties of the softmax function with application in game theory and reinforcement learning. arXiv preprint arXiv:1704.00805.
[6] (2015) Explaining and harnessing adversarial examples. In Proc. International Conference on Learning Representations.
[7] (2016) Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition.
[8] (2018) Squeeze-and-excitation networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition.
[9] (2019) Prior convictions: black-box adversarial attacks with bandits and priors. In Proc. International Conference on Learning Representations.
[10] (2019) Quantifying perceptual distortion of adversarial examples. arXiv preprint arXiv:1902.08265.
[11] (2017) Adversarial machine learning at scale. In Proc. International Conference on Learning Representations.
[12] (2019) NATTACK: learning the distributions of adversarial examples for an improved black-box attack on deep neural networks. In Proc. International Conference on Machine Learning.
[13] (2019) Yet another but more efficient black-box adversarial attack: tiling and evolution strategies. arXiv preprint arXiv:1910.02244.
[14] (2019) Parsimonious black-box adversarial attacks via efficient combinatorial optimization. In Proc. International Conference on Machine Learning.
[15] (2017) Practical black-box attacks against machine learning. In Proc. ACM on Asia Conference on Computer and Communications Security.
[16] (2016) Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277.
[17] (2018) Reachability analysis of deep neural networks with provable guarantees. In Proc. International Joint Conference on Artificial Intelligence.
[18] (2015) ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision 115 (3), pp. 211–252.
[19] (2015) Very deep convolutional networks for large-scale image recognition. In Proc. International Conference on Learning Representations.
[20] (1998) Reinforcement learning: an introduction. MIT Press, Cambridge.
[21] (2016) Rethinking the inception architecture for computer vision. In Proc. IEEE Conference on Computer Vision and Pattern Recognition.
[22] (2014) Intriguing properties of neural networks. In Proc. International Conference on Learning Representations.
[23] (2013) Gradient descent: convergence analysis. Lecture notes: https://www.stat.cmu.edu/~ryantibs/convexoptF13/scribes/lec6.pdf
[24] (2018) Ensemble adversarial training: attacks and defenses. In Proc. International Conference on Learning Representations.
[25] (2019) AutoZOOM: autoencoder-based zeroth order optimization method for attacking black-box neural networks. In Proc. AAAI Conference on Artificial Intelligence.
[26] (2018) The unreasonable effectiveness of deep features as a perceptual metric. In Proc. IEEE Conference on Computer Vision and Pattern Recognition.
[27] (2004) Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13 (4), pp. 600–612.