Robustness vs Accuracy Survey on ImageNet
Prediction accuracy has long been the sole standard for comparing image classification models, including in the ImageNet competition. However, recent studies have highlighted the lack of robustness of well-trained deep neural networks to adversarial examples: visually imperceptible perturbations to natural images can easily be crafted to mislead image classifiers. To demystify the trade-offs between robustness and accuracy, in this paper we thoroughly benchmark 18 ImageNet models using multiple robustness metrics, including the distortion, success rate, and transferability of adversarial examples between 306 pairs of models. Our extensive experimental results reveal several new insights: (1) linear scaling law: the empirical ℓ_2 and ℓ_∞ distortion metrics scale linearly with the logarithm of classification error; (2) model architecture is a more critical factor for robustness than model size, and the disclosed accuracy-robustness Pareto frontier can be used as an evaluation criterion for ImageNet model designers; (3) for a similar network architecture, increasing network depth slightly improves robustness in ℓ_∞ distortion; (4) there exist models (in the VGG family) that exhibit high adversarial transferability, while most adversarial examples crafted from other models can only be transferred within the same family. Experiment code is publicly available at <https://github.com/huanzhang12/Adversarial_Survey>.
Codes for reproducing robustness-accuracy analysis in "Is Robustness the Cost of Accuracy? -- A Comprehensive Study on the Robustness of 18 Deep Image Classification Models", ECCV 2018
Image classification is a fundamental problem in computer vision and serves as the foundation of multiple tasks such as object detection, image segmentation, object tracking, action recognition, and autonomous driving. Since the breakthrough achieved by AlexNet in the ImageNet Challenge (ILSVRC) 2012, deep neural networks (DNNs) have become the dominant force in this domain. Since then, DNN models with increasing depth and more complex building blocks have been proposed. While these models continue to achieve steadily increasing accuracy, their robustness has not been thoroughly studied, so little is known about whether the high accuracy comes at the price of reduced robustness.
A common approach to evaluating the robustness of DNNs is via adversarial attacks [3, 4, 5, 6, 7, 8, 9, 10, 11], where imperceptible adversarial examples are crafted to mislead DNNs. Generally speaking, the easier an adversarial example can be generated, the less robust the DNN is. Adversarial examples may lead to significant property damage or loss of life. For example, prior work has shown that a subtly modified physical Stop sign can be misidentified by a real-time object recognition system as a Speed Limit sign. In addition to adversarial attacks, neural network robustness can also be estimated in an attack-agnostic manner. For example, earlier works theoretically analyzed the robustness of some simple neural networks by estimating their global and local Lipschitz constants, respectively. Another approach uses extreme value theory to estimate a lower bound of the minimum adversarial distortion and can be efficiently applied to any neural network classifier. A related method derives a robustness lower bound based on linear approximations of ReLU activations. In this work, we evaluate DNN robustness using specific attacks as well as attack-agnostic approaches. We also note that the adversarial robustness studied in this paper is different from prior work in which “robustness” is studied in the context of label semantics and accuracy.
The last ImageNet challenge ended in 2017, so we are now at the beginning of the post-ImageNet era. In this work, we revisit 18 DNN models that were submitted to the ImageNet Challenge or have achieved state-of-the-art performance. These models have different sizes and classification performance, and belong to multiple architecture families such as AlexNet, VGG Nets, Inception Nets, ResNets, DenseNets, MobileNets, and NASNets. They are therefore well suited to analyzing how different factors influence model robustness. Specifically, we aim to examine the following questions in this study:
Has robustness been sacrificed for the increased classification performance?
Which factors influence the robustness of DNNs?
In the course of evaluation, we have gained a number of insights and we summarize our contributions as follows:
Tested on a large number of well-trained deep image classifiers, we find that robustness is sacrificed when solely pursuing higher classification performance. Indeed, Figure 2(a) and Figure 2(b) clearly show that the ℓ_∞ and ℓ_2 adversarial distortions scale almost linearly with the logarithm of model classification errors. Therefore, classifiers with very low test errors are highly vulnerable to adversarial attacks. We advocate that ImageNet network designers evaluate model robustness against our disclosed accuracy-robustness Pareto frontier.
Networks within the same family, e.g., VGG, Inception Nets, ResNets, and DenseNets, share similar robustness properties. This suggests that network architecture has a larger impact on robustness than model size. Besides, we also observe that robustness slightly improves as ResNets, Inception Nets, and DenseNets become deeper.
The adversarial examples generated by the VGG family transfer very well to all the other 17 models, while most adversarial examples of other models transfer only within the same model family. Interestingly, this finding provides an opportunity to reverse-engineer the architecture of black-box models.
We present the first comprehensive study that compares the robustness of 18 popular and state-of-the-art ImageNet models, offering a complete picture of the accuracy vs. robustness trade-off. In terms of the transferability of adversarial examples, we conduct thorough experiments on each pair of the 18 ImageNet networks (306 pairs in total), the largest scale to date.
In this section, we introduce the background knowledge and our experimental setup. We study both untargeted and targeted attacks in this paper. Let x_0 denote the original image and x_adv denote the adversarial image crafted from x_0. The DNN model F(·) outputs a class label (or a probability distribution over class labels) as the prediction. Without loss of generality, we assume F(x_0) = y_0, the ground-truth label of x_0, to avoid trivial solutions. For an untargeted attack, the adversarial image x_adv is crafted such that x_adv is close to x_0 but F(x_adv) ≠ y_0. For a targeted attack, a target class t (t ≠ y_0) is provided, and the adversarial image should satisfy that (i) x_adv is close to x_0, and (ii) F(x_adv) = t.
In this work, we study the robustness of 18 deep image classification models belonging to 7 architecture families, as summarized below. The basic properties of these models are given in Table 1.
VGG Nets The overall architecture of VGG nets is similar to AlexNet, but they are much deeper, with more convolutional layers. Another main difference between VGG nets and AlexNet is that all convolutional layers of VGG nets use a small (3×3) kernel, while the first two layers of AlexNet use 11×11 and 5×5 kernels, respectively. In our paper, we study VGG networks with 16 and 19 layers, with 138 million and 144 million parameters, respectively.
Inception Nets The family of Inception nets utilizes inception modules that act as multi-level feature extractors. Specifically, each inception module consists of multiple branches of 1×1, 3×3, and 5×5 filters, whose outputs are stacked along the channel dimension and fed into the next layer of the network. In this paper, we study the performance of all popular networks in this family, including Inception-v1 (GoogLeNet), Inception-v2, Inception-v3, Inception-v4, and Inception-ResNet. All these models are much deeper than AlexNet/VGG but have significantly fewer parameters.
ResNets To solve the vanishing gradient problem in training very deep neural networks, ResNets were proposed, in which each layer learns a residual function with reference to its input by adding skip-layer paths, or “identity shortcut connections”. This architecture enables practitioners to train very deep neural networks that outperform shallow models. In our study, we evaluate 3 ResNets with different depths.
DenseNets To further exploit the “identity shortcut connection” technique from ResNets, DenseNets were proposed, which connect all layers with each other within a dense block. Besides tackling the vanishing gradient problem, the authors also claimed other advantages such as encouraging feature reuse and reducing the number of parameters in the model. We study 3 DenseNets with different depths and widths.
MobileNets MobileNets are a family of lightweight and efficient neural networks designed for mobile and embedded systems with restricted computational resources. The core components of MobileNets are depthwise separable filters with factorized convolutions. Separable filters factorize a standard convolution into two parts, a depthwise convolution and a pointwise convolution, which reduces computation and model size dramatically. In this study, we include 3 MobileNets with different depths and width multipliers.
|Models||Year||# layers||# parameters||Top-1/5 ImageNet accuracies|
|AlexNet ||2012||8||60 million||56.9% / 80.1% (https://github.com/BVLC/caffe/wiki/Models-accuracy-on-ImageNet-2012-val)|
|VGG 16 ||2014||16||138 million||71.5% / 89.8%|
|VGG 19 ||2014||19||144 million||71.1% / 89.8%|
|Inception-v1 ||2014||22||6.7 million||69.8% / 89.6%|
|Inception-v2 ||2015||48||11.3 million||73.9% / 91.8%|
|Inception-v3 ||2015||48||23.9 million||78.0% / 93.9%|
|Inception-v4 ||2016||76||42.9 million||80.2% / 95.2%|
|Inception-ResNet-v2 ||2016||96||56.1 million||80.4% / 95.3%|
|ResNet-v2-50 ||2016||50||25.7 million||75.6% / 92.8%|
|ResNet-v2-101 ||2016||101||44.8 million||77.0% / 93.7%|
|ResNet-v2-152 ||2016||152||60.6 million||77.8% / 94.1%|
|DenseNet-121-k32 ||2017||121||8.2 million||74.9% / 92.2% (https://github.com/pudae/tensorflow-densenet)|
|DenseNet-169-k32 ||2017||169||14.4 million||76.1% / 93.1%|
|DenseNet-161-k48 ||2017||161||29.0 million||77.6% / 93.8%|
|MobileNet-0.25-128 ||2017||128||0.5 million||41.5% / 66.3%|
|MobileNet-0.50-160 ||2017||160||1.4 million||59.1% / 81.9%|
|MobileNet-1.0-224 ||2017||224||4.3 million||70.9% / 89.9% |
|NASNet ||2017||-||88.9 million||82.7% / 96.2%|
We use both adversarial attacks and attack-agnostic approaches to evaluate network robustness. We first generate adversarial examples of each network using multiple state-of-the-art attack algorithms, and then analyze the attack success rates and the distortions of adversarial images. In this experiment, we assume to have full access to the targeted DNNs, known as the white-box attack. To further study the transferability of the adversarial images generated by each network, we consider all the 306 network pairs and for each pair, we conduct transfer attack that uses one model’s adversarial examples to attack the other model. Since transfer attack is widely used in the black-box setting [31, 32, 33, 34, 35, 36], where an adversary has no access to the explicit knowledge of the target models, this experiment can provide some evidence on networks’ black-box robustness. Finally, we compute CLEVER  score, a state-of-the-art attack-agnostic network robustness metric, to estimate each network’s intrinsic robustness. Below, we briefly introduce all the evaluation approaches used in our study.
We evaluate the robustness of DNNs using the following adversarial attacks:
Fast Gradient Sign Method (FGSM) FGSM is one of the pioneering and most efficient attack algorithms. It needs to compute the gradient only once to generate an adversarial example x_adv:

x_adv = clip(x + ε · sign(∇_x J(x, y))),

where sign(∇_x J(x, y)) is the sign of the gradient of the training loss J with respect to the input x, and clip(·) ensures that x_adv stays within the valid range of pixel values. FGSM is efficient for generating adversarial examples because it is a one-step attack.
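As a concrete illustration, here is a minimal NumPy sketch of the one-step FGSM update on a toy logistic-regression "classifier". The model, weights, and 8-pixel "image" are hypothetical stand-ins for a real DNN and ImageNet input; a real attack would backpropagate through the network instead of using the closed-form gradient below.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """One-step FGSM: x_adv = clip(x + eps * sign(grad_x loss))."""
    p = sigmoid(w @ x + b)
    grad = (p - y) * w          # d(cross-entropy)/dx for logistic regression
    x_adv = x + eps * np.sign(grad)
    return np.clip(x_adv, 0.0, 1.0)  # keep pixels in the valid [0, 1] range

rng = np.random.default_rng(0)
x = rng.random(8)               # toy "image" with 8 pixels in [0, 1]
w, b = rng.normal(size=8), 0.0
x_adv = fgsm(x, y=1.0, w=w, b=b, eps=0.1)
# by construction, every pixel moves by at most eps
assert np.max(np.abs(x_adv - x)) <= 0.1 + 1e-12
```

The sign of the gradient, rather than the gradient itself, is used so that the perturbation saturates the ℓ_∞ budget ε in one step.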
Iterative FGSM (I-FGSM) Albeit efficient, FGSM suffers from a relatively low attack success rate. To this end, iterative FGSM was proposed to enhance its performance. It applies FGSM multiple times with a finer distortion and fools the network in far more cases. When we run I-FGSM for T iterations, we set the per-iteration perturbation to ε/T. I-FGSM can be viewed as a projected gradient descent (PGD) method inside an ℓ_∞ ball of radius ε centered at x, and it usually finds adversarial examples with small ℓ_∞ distortions.
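The iterative variant can be sketched as a small projection loop. Here `grad_fn` is a hypothetical callback returning the loss gradient at the current iterate, standing in for backpropagation through an actual network.

```python
import numpy as np

def i_fgsm(x, grad_fn, eps, iters):
    """Iterative FGSM: T steps of size eps/T, each followed by projection
    into the l_inf ball of radius eps around x (the PGD view)."""
    step = eps / iters
    x_adv = x.copy()
    for _ in range(iters):
        x_adv = x_adv + step * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # remain a valid image
    return x_adv
```

With a constant-sign gradient the iterate simply walks to the boundary of the ε-ball, e.g. starting from 0.5 everywhere with ε = 0.1 and 10 iterations it ends at 0.6.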
C&W attack The C&W attack formulates the generation of adversarial examples as the following optimization problem:

min_δ ‖δ‖₂² + c · f(x + δ, t),

where f is a loss function measuring the gap between the prediction of x + δ and the target label t. In this work, we choose

f(x′, t) = max{ max_{j ≠ t} [Z(x′)]_j − [Z(x′)]_t, −κ },

as it was shown to be effective in prior work. Here Z(x′) denotes the vector representation of x′ at the logit layer, and κ is a confidence level; a larger κ generally improves the transferability of adversarial examples.
The C&W attack is by far one of the strongest attacks and finds adversarial examples with small ℓ₂ perturbations. It can achieve an almost 100% attack success rate and has bypassed several adversarial example detection methods.
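The margin loss f used by the attack fits in a few lines of NumPy; the example logit vectors below are made up for illustration.

```python
import numpy as np

def cw_loss(logits, target, kappa=0.0):
    """C&W targeted loss: max(max_{j != t} Z_j - Z_t, -kappa).
    It bottoms out at -kappa once the target logit exceeds every
    other logit by at least kappa."""
    other = np.delete(logits, target)
    return max(other.max() - logits[target], -kappa)

z = np.array([1.0, 4.0, 2.5])
assert cw_loss(z, target=1, kappa=2.0) == -1.5  # target wins by 1.5 < kappa
assert cw_loss(z, target=0, kappa=0.0) == 3.0   # target loses by 3.0
```

Minimizing this loss pushes the target logit above the runner-up by a margin κ, which is why larger κ yields more "confident" (and more transferable) adversarial examples.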
EAD-L1 attack The EAD-L1 attack refers to the Elastic-net Attack to DNNs, a more general formulation than the C&W attack. It uses elastic-net regularization, a linear combination of ℓ₁ and ℓ₂ norms, to penalize large distortions between the original and adversarial examples. Specifically, it learns the adversarial example via

min_δ c · f(x + δ, t) + ‖δ‖₂² + β‖δ‖₁.
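Evaluating the elastic-net objective is straightforward; this sketch takes the attack loss f as a precomputed number, since computing it requires a model (the values below are illustrative only).

```python
import numpy as np

def ead_objective(delta, attack_loss, c=1.0, beta=0.01):
    """Elastic-net objective: c * f(x + delta) + ||delta||_2^2 + beta * ||delta||_1.
    Setting beta = 0 recovers the C&W L2 objective."""
    l2 = float(np.sum(delta ** 2))
    l1 = float(np.sum(np.abs(delta)))
    return c * attack_loss + l2 + beta * l1
```

The extra ℓ₁ term encourages sparse perturbations, i.e. perturbations concentrated on few pixels, which is the distinguishing feature of EAD relative to C&W.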
We also evaluate network robustness using an attack-agnostic approach:
CLEVER CLEVER (Cross-Lipschitz Extreme Value for nEtwork Robustness) uses extreme value theory to estimate a lower bound of the minimum adversarial distortion. Given an image x, CLEVER provides an estimated lower bound on the ℓ_p norm of the minimum distortion δ required to misclassify the perturbed image x + δ. A higher CLEVER score suggests that the network is likely to be more robust to adversarial examples. CLEVER is attack-agnostic and reflects the intrinsic robustness of a network, rather than its robustness under a particular attack.
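The core idea can be sketched very roughly: the classification margin divided by an estimate of the local Lipschitz constant lower-bounds the minimum distortion. This sketch is greatly simplified; the real CLEVER estimates the Lipschitz constant by fitting a reverse Weibull distribution to batch maxima of sampled gradient norms, rather than taking a raw maximum as below.

```python
# Simplified sketch of the CLEVER idea (NOT the full extreme-value fit).
# margin = f_c(x0) - f_t(x0), the gap between the true-class and
# target-class outputs; grad_norm_samples are hypothetical norms of the
# margin's gradient sampled in a ball around x0.

def clever_sketch(margin, grad_norm_samples):
    """Distortion lower bound ~ margin / (estimated local Lipschitz constant)."""
    lipschitz_est = max(grad_norm_samples)
    return margin / lipschitz_est
```

A larger margin or a flatter local landscape (smaller gradient norms) both push the estimated lower bound, and hence the robustness score, upward.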
In this work, we use ImageNet as the benchmark dataset for the following reasons: (i) the ImageNet dataset can take full advantage of the studied DNN models, since all of them were designed for ImageNet challenges; (ii) compared to widely used small-scale datasets such as MNIST, CIFAR-10, and GTSRB, ImageNet has significantly more images and classes and is more challenging; and (iii) it has been shown by [39, 48] that ImageNet images are easier to attack but harder to defend than images from the MNIST and CIFAR datasets. Given these observations, ImageNet is an ideal candidate for studying the robustness of state-of-the-art deep image classification models.
A set of 1,000 randomly selected images from the ImageNet validation set is used to generate adversarial examples for each model. For each image, we conduct targeted attacks with a random target and a least likely target, as well as an untargeted attack. Misclassified images are excluded. We follow the setting of the original CLEVER paper to compute CLEVER scores for 100 of the 1,000 images, as CLEVER is relatively more computationally expensive. Additionally, we conducted another experiment on the subset of images (327 in total) that are correctly classified by all 18 examined ImageNet models. The results are consistent with our main results and are given in the supplementary material.
In our study, the robustness of the DNN models is evaluated using the following four metrics:
Attack success rate For an untargeted attack, the success rate indicates the percentage of adversarial examples whose predicted labels differ from their ground-truth labels. For a targeted attack, the success rate indicates the percentage of adversarial examples classified as the target class. For both attacks, a higher success rate suggests that the model is easier to attack and hence less robust. When generating adversarial examples, we only consider original images that are correctly classified, to avoid trivial attacks.
Distortion We measure the distortion between adversarial images and the original ones using the ℓ₂ and ℓ_∞ norms. The ℓ₂ norm measures the Euclidean distance between two images, and the ℓ_∞ norm measures the maximum absolute change to any pixel (worst case). Both are widely used to measure adversarial perturbations [40, 39, 41]. A higher distortion usually suggests a more robust model. To find adversarial examples with minimum distortion for each model, we use a binary search strategy to select the optimal attack parameters: ε in I-FGSM and c in the C&W attack. Because the models may have different input sizes, we divide the distortions by the number of total pixels for a fair comparison.
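The binary search over the attack strength can be sketched as follows. `attack_succeeds` is a hypothetical oracle that runs the full attack at a given strength and reports success; the paper's actual search is over ε for I-FGSM and c for C&W, but the bisection logic is the same.

```python
import numpy as np

def min_eps_binary_search(attack_succeeds, lo=0.0, hi=1.0, steps=20):
    """Binary-search the smallest attack strength that still succeeds,
    assuming success is monotone in the strength."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if attack_succeeds(mid):
            hi = mid            # success: try a smaller distortion
        else:
            lo = mid            # failure: need a larger distortion
    return hi

def per_pixel_distortion(x, x_adv):
    """Worst-case (l_inf) distortion and l2 distortion normalized by the
    number of pixels, so models with different input sizes are comparable."""
    return np.abs(x_adv - x).max(), np.linalg.norm(x_adv - x) / x.size
```

Twenty bisection steps narrow the strength to within 2^-20 of the true threshold, which is far below the reported distortion precision.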
CLEVER score For each image, we compute its CLEVER score for targeted attacks with a random target class and a least-likely class, respectively. The reported number is the average score over all tested images. The higher the CLEVER score, the more robust the model.
Transferability We follow prior work to define targeted and non-targeted transferability. For a non-targeted attack, transferability is defined as the percentage of adversarial examples generated for one model (the source model) that are also misclassified by another model (the target model). We refer to this percentage as the error rate; a higher error rate means better non-targeted transferability. For a targeted attack, transferability is defined as the matching rate, i.e., the percentage of adversarial examples generated for the source model that are misclassified as the target label (or within the top-k labels) by the target model. A higher matching rate indicates better targeted transferability.
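Given per-model predictions on a shared batch of adversarial examples, the untargeted transfer (error-rate) matrix can be assembled as below. The nested-list layout of `preds` is a hypothetical convention for this sketch.

```python
import numpy as np

def transfer_matrix(preds, true_labels):
    """preds[s][t]: labels predicted by target model t on the adversarial
    examples crafted against source model s. Entry (s, t) is the error
    rate of model t on model s's adversarial examples; the diagonal is
    each model's own (white-box) attack error rate."""
    n = len(preds)
    mat = np.zeros((n, n))
    for s in range(n):
        for t in range(n):
            mat[s, t] = np.mean(preds[s][t] != true_labels)
    return mat
```

For targeted transferability, the comparison `!= true_labels` would be replaced by a top-k membership test against the target label.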
After examining all the 18 DNN models, we have learned insights about the relationships between model architectures and robustness, as discussed below.
We have carefully conducted a controlled experiment by drawing images from a common set of 1,000 test images when evaluating the robustness of different models. When assessing the robustness of each model, originally misclassified images are excluded. We compare the success rates of targeted attacks with a random target for FGSM, I-FGSM, C&W, and EAD-L1 with different parameters on all 18 models. Since the success rate of targeted FGSM is low, we also show its untargeted attack success rate in Figure 1(b).
For targeted attacks, the success rate of FGSM is very low (below 3% for all settings), and unlike in the untargeted setting, increasing ε in fact decreases the attack success rate. This observation further confirms that FGSM is a weak attack and that targeted attacks are more difficult and require iterative attacking methods. Figure 1(c) shows that, with only 10 iterations, I-FGSM can achieve a very good targeted attack success rate on all models. C&W and EAD-L1 can also achieve an almost 100% success rate on almost all of the models when the confidence parameter κ is small.
For the C&W and EAD-L1 attacks, increasing the confidence κ makes it significantly harder for the attack to find a feasible adversarial example. A larger κ usually makes the adversarial perturbation more universal and improves transferability (as we will show shortly), but at the expense of a lower success rate and larger distortion. However, we find that the attack success rate with a large κ cannot be used as a robustness measure, as it is not aligned with the norm of the adversarial distortions. For example, for MobileNet-0.50-160, when κ is large, the success rate is close to 0, but Figure 2 shows that it is one of the most vulnerable networks. The reason is that the range of the logit outputs can differ across networks, so the difficulty of achieving a fixed logit gap differs across networks and is not related to intrinsic robustness.
We defer the results for targeted attack with the least likely target label to the Supplementary section because the conclusions made are similar.
Figure 1: attack success rates and distortions across the 18 models. Panels: (a) success rate, targeted FGSM; (b) success rate, untargeted FGSM; (c) success rate, targeted I-FGSM; (d) worst-case distortion, I-FGSM; (e) success rate, targeted C&W; (f) per-pixel distortion, targeted C&W; (g) success rate, targeted EAD-L1; (h) per-pixel distortion, targeted EAD-L1.
Figure 2: fitted Pareto frontiers of (a) ℓ_∞ distortion (I-FGSM attack) vs. top-1 accuracy, (b) ℓ₂ distortion (C&W attack) vs. top-1 accuracy, and (c) CLEVER score vs. top-1 accuracy.
Here we study the empirical relation between the robustness and accuracy of different ImageNet models, where robustness is evaluated in terms of the ℓ_∞ and ℓ₂ distortion metrics from successful I-FGSM and C&W attacks, respectively, or CLEVER scores. In our experiments, the attack success rates of these attacks are nearly 100% for each model. The scatter plots of distortions/scores vs. top-1 prediction accuracy are displayed in Figure 2. We define the classification error err as 1 minus top-1 accuracy. By regressing the distortion metric against the classification error of the networks on the Pareto frontier of the robustness-accuracy distribution (i.e., AlexNet, VGG 16, VGG 19, ResNet_v2_152, Inception_ResNet_v2 and NASNet), we find that the distortion scales linearly with the logarithm of the classification error; that is, distortion ≈ a + b · log(err), where the fitted parameters a and b are given in the captions of Figure 2. Taking the I-FGSM attack as an example, the linear scaling law suggests that halving the classification error is expected to reduce the distortion of the resulting network by a fixed amount, b · log 2, a substantial fraction of the AlexNet distortion. Following this trend, if we naively pursue a model with low test error, model robustness may suffer. Thus, when designing new networks for ImageNet, we suggest evaluating the model's accuracy-robustness trade-off by comparing it to the disclosed Pareto frontier.
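The log-linear fit itself is a one-line regression. The sketch below uses made-up error/distortion numbers purely for illustration; the paper fits the measured values of the six frontier models.

```python
import numpy as np

# Hypothetical (error, distortion) pairs for models on the Pareto frontier.
err = np.array([0.43, 0.29, 0.22, 0.17])   # top-1 classification errors
dist = np.array([8.0, 6.2, 5.0, 3.9])      # mean adversarial distortions

# Fit dist ~ a + b * log(err); np.polyfit returns [slope, intercept].
b, a = np.polyfit(np.log(err), dist, deg=1)

# Under the scaling law, halving the error changes the distortion by
# b * log(1/2), i.e. a fixed (negative) amount independent of the model.
halving_effect = b * np.log(0.5)
```

Since distortion grows with error along the frontier, b is positive and the halving effect is a fixed reduction in distortion, matching the trade-off described above.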
We find that model architecture is a more important factor for robustness than model size. Each family of networks exhibits a similar level of robustness, despite different depths and model sizes. For example, AlexNet has about 60 million parameters, yet its robustness is the best; on the other hand, MobileNet-0.50-160 has only 1.4 million parameters but is more vulnerable to adversarial attacks under all metrics.
We also observe that, within the same family (DenseNet, ResNet, and Inception), models with deeper architectures yield a slight improvement in robustness in terms of the ℓ_∞ distortion metric. This might provide new insights for designing robust networks and further improving the Pareto frontier. This result also echoes prior work in which the authors use a larger model to increase the robustness of a CNN-based MNIST model.
Figures 3, 4, and 5 show the transferability heatmaps of FGSM, I-FGSM, and EAD-L1 over all 18 models (306 pairs in total). The value in the i-th row and j-th column of each heatmap matrix is the proportion of adversarial examples generated by source model i that successfully transfer to target model j, out of all adversarial examples generated by the source model (including both successful and failed attacks on the source model). In particular, the values on the diagonal of each heatmap are the attack success rates of the corresponding models. For each model, we generate adversarial images using the aforementioned attacks and pass them to the target model to perform black-box untargeted and targeted transfer attacks. We use the success rate to evaluate untargeted transfer attacks and the top-5 matching rate to evaluate targeted transfer attacks.
Note that not all models have the same input image dimension. We also find that simply resizing adversarial examples can significantly decrease the transfer attack success rate. To alleviate the disruptive effect of image resizing on adversarial perturbations, when transferring an adversarial image from a network with a larger input dimension to one with a smaller dimension, we crop the image from the center; conversely, we add a white border to the image when the source network's input dimension is smaller.
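This crop-or-pad strategy can be sketched directly; the point is that pixels carrying the adversarial perturbation are never resampled. The sketch assumes both target dimensions shrink or both grow, as with the square inputs of the studied models.

```python
import numpy as np

def adapt_size(img, target_hw, fill=1.0):
    """Center-crop when the target input is smaller; paste onto a white
    canvas (fill=1.0 for [0, 1] images) when it is larger. Either way,
    the surviving pixels keep their exact perturbed values."""
    h, w = img.shape[:2]
    th, tw = target_hw
    if th <= h and tw <= w:                       # crop from the center
        top, left = (h - th) // 2, (w - tw) // 2
        return img[top:top + th, left:left + tw]
    out = np.full((th, tw) + img.shape[2:], fill, dtype=img.dtype)
    top, left = (th - h) // 2, (tw - w) // 2      # center on a white canvas
    out[top:top + h, left:left + w] = img
    return out
```

Compared with interpolation-based resizing, which blends neighboring pixels and smears the carefully crafted perturbation, cropping and padding leave it intact.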
Generally, the transferability of untargeted attacks is significantly higher than that of targeted attacks, as indicated in Figures 3, 4, and 5. We highlight some interesting findings from our experimental results:
In the untargeted transfer attack setting, FGSM and I-FGSM have much higher transfer success rates than EAD-L1 (despite EAD-L1 using a large κ). Similar to previous results, we find that the transferability of C&W is even worse than that of EAD-L1, and we defer those results to the supplement. The ranking of attacks by transferability in the untargeted setting is: I-FGSM ≈ FGSM > EAD-L1 > C&W.
Again in the untargeted transfer attack setting, for FGSM a larger ε yields better transferability, while for I-FGSM fewer iterations yield better transferability. For untargeted EAD-L1 transfer attacks, a higher κ (the confidence parameter) leads to better transferability, but it still falls far behind I-FGSM.
Transferability of adversarial examples is sometimes asymmetric; for example, in Figure 4, adversarial examples of VGG 16 are highly transferable to Inception-v2, but adversarial examples of Inception-v2 do not transfer very well to VGG.
We find that the VGG 16 and VGG 19 models achieve significantly better transferability than the other models, in both targeted and untargeted settings, for all attack methods, leading to the “stripe patterns” in the heatmaps. This means that adversarial examples generated from VGG models are empirically more transferable to other models. This observation might be explained by the simple convolutional nature of VGG networks, which is the stem of all the other networks. VGG models are thus a good starting point for mounting black-box transfer attacks. We also observe that the most transferable model family may vary with different attacks.
Most recent networks have unique features that may restrict the transferability of their adversarial examples to within the same family. For example, as shown in Figure 4, when using I-FGSM in the untargeted transfer attack setting, for DenseNets, ResNets, and VGG, transferability between different depths of the same architecture is close to 100%, but their transfer rates to other architectures can be much worse. This provides an opportunity to reverse-engineer the internal architecture of a black-box model by feeding it adversarial examples crafted for a certain architecture and measuring the attack success rates.
In this paper, we present the largest-scale study to date on adversarial examples in ImageNet models. We show comprehensive experimental results on 18 state-of-the-art ImageNet models using adversarial attack methods based on the ℓ₁, ℓ₂, and ℓ_∞ norms, as well as an attack-agnostic robustness score, CLEVER. Our results show that there is a clear trade-off between accuracy and robustness, and better test accuracy generally comes with reduced robustness. Tested on the ImageNet dataset, we discover an empirical linear scaling law between the distortion metrics and the logarithm of classification error in representative models. We conjecture that, following this trend, naively pursuing high-accuracy models may come with the great risk of lacking robustness. We also provide a thorough analysis of adversarial attack transferability between 306 pairs of these networks and discuss the implications of network architecture for robustness.
In this work, we focus on image classification. To the best of our knowledge, an analysis at this scale and depth across 18 ImageNet models has not appeared in the previous literature. We believe our findings could also provide insights into robustness and adversarial examples for other computer vision tasks such as object detection and image captioning, since these tasks often use the same pre-trained image classifiers studied in this paper for feature extraction.
To further validate our robustness analysis, we conducted another experiment on the subset of images (327 in total) that are correctly classified by all 18 examined ImageNet models and show their accuracy-vs-robustness figures for the targeted C&W and I-FGSM attacks in Figure 6. The trends and conclusions are consistent with our reported main results.
Figure 6: fitted Pareto frontiers of (a) ℓ_∞ distortion (I-FGSM attack) vs. top-1 accuracy and (b) ℓ₂ distortion (C&W attack) vs. top-1 accuracy, on the 327 commonly correctly classified images.
In this section, we summarize the results of using the least-likely label (the class with the smallest predicted probability for the original image) as the target class. Figure 7 (a) and (b) show the distortions of adversarial examples found by the I-FGSM and C&W attacks, respectively. Although the least-likely-label attack is even more challenging, both algorithms can still achieve a close to 100% success rate. Similar to Figure 2 in the main text, Figure 7 clearly shows an accuracy vs. robustness trade-off for models on the Pareto frontier; e.g., AlexNet is the most robust network, while the model with the highest accuracy (NASNet) is the most prone to adversarial attacks. Likewise, we fit the Pareto frontier and still observe a similar log-linear scaling law.
In this section, we show the transferability of the C&W attack in Figures 8, 9, and 10. Compared with the I-FGSM and EAD-L1 attacks, the C&W attack, which uses the ℓ₂ norm, yields a much worse transfer success rate. Increasing the confidence parameter κ can slightly increase its transferability, which nevertheless remains worse than that of I-FGSM and EAD. On the other hand, increasing κ reduces the C&W attack's success rate, as shown in Figure 1 of the main text. I-FGSM has much better transferability than the EAD-L1 and C&W attacks. From Figures 9 and 10 and Figure 4 in Section 3.4, we can see that transferability increases as ε grows.
In this section, we show more experimental results on the I-FGSM attack with different ε values. Figures 9 and 10 show the transferability heatmaps of I-FGSM at two additional ε values. Comparing these two heatmaps with Figure 4 in the main text, we observe that: (i) I-FGSM's transferability improves as ε increases; (ii) fewer iterations usually yield better transferability; (iii) the transferability of untargeted attacks is significantly higher than that of targeted attacks; (iv) adversarial examples from VGG networks consistently transfer very well; and (v) adversarial examples transfer more easily between models sharing the same architecture (e.g., ResNets and DenseNets) but different depths.
A related study also reached a different conclusion on accuracy vs. robustness. However, we believe our conclusion is not orthogonal to theirs, due to apparent differences in the definition of “robustness”. In that study, the authors mainly explored the “robustness” (sensitivity) of class label semantics: in their user study only 20 classes are selected, and the I-FGSM attack with a fixed adversarial strength is used. Each user is then asked to determine whether the adversarial label is “relevant” to the original label, which is essentially a binarized class-label relevance user study. Its main message is that the inherent correlations between image classes, if made more distinguishable (i.e., sensitivity as a strength), could be exploited to build more accurate models. In our paper, on the other hand, we use the standard ℓ_p-ball perturbation in pixel space as well as attack success rates as the robustness measure on ImageNet with 1,000 classes. In fact, a similar “sensitivity” issue has also been studied in terms of the “label leaking” effect. To ensure this effect has minimal impact when generating adversarial examples to evaluate the robustness of DNNs, it was suggested to include attack results with “least likely” targets, which we included when drawing our conclusions.
Images in ImageNet are organized according to the WordNet hierarchy. To verify that the least likely labels used in our experiments are indeed irrelevant to the original labels, we show the shortest-path distances between their corresponding synsets in the WordNet hierarchy in Figure 11. We use Inception-v1 as the model in this experiment. Two labels with a shortest-path distance greater than 5 are considered irrelevant. In our case, this applies to 96.6% of our least-likely-target attacks; hence, the observed vulnerability does not stem from the label sensitivity effect discussed above.
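The relevance check boils down to a shortest-path computation in the (undirected) hypernym graph. The mini-graph below is a hypothetical toy stand-in for the actual WordNet hierarchy used in the paper.

```python
from collections import deque

# Toy hypernym graph; edges connect a synset to its hypernyms/hyponyms.
graph = {
    "animal": ["dog", "bird"],
    "dog": ["animal", "beagle", "husky"],
    "bird": ["animal", "ostrich"],
    "beagle": ["dog"], "husky": ["dog"], "ostrich": ["bird"],
}

def shortest_path_len(g, src, dst):
    """BFS shortest-path distance between two labels; None if unreachable."""
    dist = {src: 0}
    q = deque([src])
    while q:
        node = q.popleft()
        if node == dst:
            return dist[node]
        for nxt in g[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                q.append(nxt)
    return None

# beagle -> dog -> animal -> bird -> ostrich: distance 4
assert shortest_path_len(graph, "beagle", "ostrich") == 4
```

Pairs with a distance greater than the chosen threshold (5 in the paper) are treated as semantically irrelevant.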
In summary, the conclusion of that study is that if one can increase the discriminative power against (semantically) similar classes, then sensitivity in class labels can be a strength for model accuracy. Our conclusion is that more accurate network models appear to be less robust in terms of the required adversarial attack strength measured in an ℓ_p ball. Concurrent to this paper and consistent with our conclusion, recent work provides a concrete, simple setting demonstrating that the trade-off between accuracy and robustness provably exists, which also offers a technical explanation for our results. We also note that our findings are consistent with a very recent paper proving the difficulty of learning robust models against adversarial examples.
In light of these observations, our findings on the accuracy-robustness trade-off could be explained by the increasing sensitivity of more accurate models; the two robustness conclusions actually complement each other, rather than being exclusive or contradictory. Specifically, increasing sensitivity aids accuracy but may also make the model more vulnerable. For example, increasing sensitivity when classifying different dog species can improve model accuracy, but it may at the same time permit smaller adversarial perturbations.
Proceedings of the Thirtieth IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), International Joint Conferences on Artificial Intelligence Organization (July 2018) 3905–3911
Proceedings of the 35th International Conference on Machine Learning (ICML) (2018)
Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4–9, 2017, San Francisco, California, USA (2017) 4278–4284
Robustness may be at odds with accuracy. In: International Conference on Learning Representations (ICLR) (2019)