Smooth Adversarial Training
It is commonly believed that networks cannot be both accurate and robust, that gaining robustness means losing accuracy. It is also generally believed that, apart from making networks larger, architectural elements matter little for improving adversarial robustness. Here we present evidence to challenge these common beliefs through a careful study of adversarial training. Our key observation is that the widely-used ReLU activation function significantly weakens adversarial training due to its non-smooth nature. Hence we propose smooth adversarial training (SAT), in which we replace ReLU with its smooth approximations to strengthen adversarial training. The purpose of smooth activation functions in SAT is to allow it to find harder adversarial examples and compute better gradient updates during adversarial training. Compared to standard adversarial training, SAT improves adversarial robustness for "free", i.e., no drop in accuracy and no increase in computational cost. For example, without introducing additional computations, SAT significantly enhances ResNet-50's robustness from 33.0% to 42.3%, while also improving accuracy by 0.9% on ImageNet. SAT also works well with larger networks: it helps EfficientNet-L1 to achieve 82.2% accuracy and 58.6% robustness on ImageNet, outperforming the previous state-of-the-art defense by 9.5% for accuracy and 11.6% for robustness.
Convolutional neural networks can be easily attacked by adversarial examples, which are computed by adding small perturbations to clean inputs Szegedy et al. (2014). Many efforts have been devoted to improving network resilience against adversarial attacks Guo et al. (2018); Papernot et al. (2016); Buckman et al. (2018); Xie et al. (2018); Madry et al. (2018); Prakash et al. (2018); Pang et al. (2019). Among them, adversarial training Goodfellow et al. (2015); Kurakin et al. (2017); Madry et al. (2018), which trains networks with adversarial examples on-the-fly, stands as one of the most effective methods. Later works further improve adversarial training by feeding the networks with harder adversarial examples Wang et al. (2019), maximizing the margin of networks Ding et al. (2020), optimizing a regularized surrogate loss Zhang et al. (2019), etc. While these methods achieve stronger adversarial robustness, they sacrifice the accuracy on clean inputs. It is generally believed that this trade-off between accuracy and robustness might be inevitable Tsipras et al. (2019), unless additional computational budgets are introduced to enlarge network capacities Madry et al. (2018); Xie and Yuille (2020).
Another popular direction for increasing robustness against adversarial attacks is gradient masking Papernot et al. (2017); Athalye et al. (2018), which usually introduces non-differentiable operations (e.g., discretization Buckman et al. (2018); Rozsa and Boult (2019)) to obfuscate gradients. With degenerated gradients, attackers cannot successfully optimize the targeted loss and fail to break such defenses. Nonetheless, gradient masking will be ineffective if its differentiable approximation is used for generating adversarial examples Athalye et al. (2018).
The bitter history of gradient masking defenses motivates us to rethink the relationship between gradient quality and adversarial robustness, especially in the context of adversarial training, where gradients are used more frequently than in standard training. In addition to computing gradients to update network parameters, adversarial training also requires gradient computation for generating training samples. Guided by this principle, we identify that ReLU, a widely-used activation function in most network architectures, significantly weakens adversarial training due to its non-smooth nature, e.g., ReLU’s gradient changes abruptly when its input is zero, as illustrated in Figure 1.
To fix the issue induced by ReLU, in this paper, we propose smooth adversarial training (SAT), which enforces architectural smoothness by replacing ReLU with its smooth approximations to improve the gradient quality in adversarial training (Figure 1 shows Parametric Softplus, an example of a smooth approximation of ReLU). (Footnote: more precisely, when we say a function is smooth in this paper, we mean that it is $C^1$ smooth, i.e., its first derivative is continuous everywhere.) With smooth activation functions, SAT is able to feed the networks with harder adversarial training samples and compute better gradient updates for network optimization, and hence substantially strengthens adversarial training. Our experimental results show that SAT improves adversarial robustness for “free”, i.e., without incurring additional computations or degrading standard accuracy. For instance, by training with the economical single-step PGD attacker Madry et al. (2018) on ImageNet Russakovsky et al. (2015), SAT significantly improves ResNet-50’s robustness by 9.3%, from 33.0% to 42.3%, while increasing the standard accuracy by 0.9% without incurring additional computational cost. (Footnote: models trained with single-step PGD attackers cost only a small multiple of the training time of standard training.)
We also explore the limits of SAT with larger networks. We obtain the best result by using EfficientNet-L1 Tan and Le (2019); Xie et al. (2020), which achieves 82.2% accuracy and 58.6% robustness on ImageNet, significantly outperforming the prior art Qin et al. (2019) by 9.5% for accuracy and 11.6% for robustness.
Adversarial training improves robustness by training models on adversarial examples Goodfellow et al. (2015); Kurakin et al. (2017); Madry et al. (2018). Existing works suggest that, to further improve adversarial robustness, we need to either sacrifice accuracy on clean inputs Wang et al. (2019, 2020); Zhang et al. (2019); Ding et al. (2020), or incur additional computational cost Madry et al. (2018); Xie and Yuille (2020). This phenomenon is generally referred to as the no free lunch in adversarial robustness Tsipras et al. (2019); Nakkiran (2019); Su et al. (2018). In this paper, we show that, with SAT, adversarial robustness can be improved for “free”.
Besides training models on adversarial data, alternatives for improving adversarial robustness include defensive distillation Papernot et al. (2016), randomized transformations Xie et al. (2018); Dhillon et al. (2018); Liu et al. (2018); Wang et al. (2018); Bhagoji et al. (2018), adversarial input purification Guo et al. (2018); Prakash et al. (2018); Meng and Chen (2017); Song et al. (2018); Samangouei et al. (2018); Liao et al. (2018); Bhagoji et al. (2018), etc. Nonetheless, these defense methods degrade the gradient quality and therefore induce the gradient masking issue Papernot et al. (2017), which gives a false sense of adversarial robustness Athalye et al. (2018). In contrast to these works, we aim to improve adversarial robustness by providing networks with better gradients, but in the context of adversarial training.
We hereby perform a series of control experiments in the backward pass of gradient computations to investigate how ReLU weakens, and how its smooth approximation strengthens adversarial training.
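Adversarial training optimizes the following objective:

$$\arg\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathbb{D}} \left[ \max_{\epsilon \in \mathbb{S}} L(\theta, x + \epsilon, y) \right], \tag{1}$$

where $\mathbb{D}$ is the underlying data distribution, $L(\cdot, \cdot, \cdot)$ is the loss function, $\theta$ is the network parameter, $x$ is a training sample with the ground-truth label $y$, $\epsilon$ is the added adversarial perturbation, and $\mathbb{S}$ is the allowed perturbation range. As shown in Equation (1), adversarial training consists of two computation steps: an inner maximization step, which computes adversarial examples, and an outer minimization step, which computes parameter updates.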
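To make the two steps of Equation (1) concrete, the following is a minimal sketch on a toy 2-D logistic-regression model, not the ResNet-50 setup used in the experiments. The inner step uses a single signed-gradient step as a simple stand-in for a one-step attacker; the function names, the data, and the values of `eps` and `lr` are all illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss(w, b, x, y):
    """Logistic loss for a label y in {0, 1}."""
    p = min(max(predict(w, b, x), 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def sign(v):
    return (v > 0) - (v < 0)

def single_step_perturbation(w, b, x, y, eps):
    """Inner maximization: one signed-gradient step inside the L-inf ball of radius eps."""
    p = predict(w, b, x)
    grad_x = [(p - y) * wi for wi in w]  # dL/dx for logistic regression
    return [eps * sign(g) for g in grad_x]

def adversarial_sgd_step(w, b, x, y, eps, lr):
    """Outer minimization: one SGD update computed on the perturbed sample."""
    delta = single_step_perturbation(w, b, x, y, eps)
    x_adv = [xi + di for xi, di in zip(x, delta)]
    p = predict(w, b, x_adv)
    w_new = [wi - lr * (p - y) * xi for wi, xi in zip(w, x_adv)]
    b_new = b - lr * (p - y)
    return w_new, b_new

# Toy, linearly separable data (hypothetical).
data = [([1.0, 2.0], 1), ([-1.5, -0.5], 0), ([2.0, 0.5], 1), ([-1.0, -2.0], 0)]
w, b = [0.1, -0.1], 0.0
for _ in range(200):
    for x, y in data:
        w, b = adversarial_sgd_step(w, b, x, y, eps=0.3, lr=0.1)
```

Both steps need a gradient: the inner step differentiates the loss with respect to the input, and the outer step differentiates it with respect to the parameters. This is why gradient quality enters adversarial training twice.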
We choose ResNet-50 He et al. (2016) as the backbone network. We apply the PGD attacker Madry et al. (2018) to generate adversarial perturbations $\epsilon$. Specifically, we select the cheapest version of PGD, single-step PGD (PGD-1), to lower the training cost. We follow Shafahi et al. (2019); Wong et al. (2020) in setting the maximum per-pixel change $\epsilon$ and the attack step size. We follow the basic ResNet training recipes to train models on ImageNet: models are trained for a total of 100 epochs using the momentum SGD optimizer, with the learning rate decayed at the 30th, 60th and 90th epochs; no regularization except a weight decay of 1e-4 is applied.
When evaluating adversarial robustness, we measure the model’s top-1 accuracy against the 200-step PGD attacker (PGD-200) on the ImageNet validation set, under a fixed maximum perturbation size $\epsilon$ and a fixed step size. We note that 200 attack iterations are enough for the PGD attacker to converge. Meanwhile, we also report the model’s top-1 accuracy on the original ImageNet validation set.
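As a rough illustration of what a multi-step attacker such as PGD-200 does, the sketch below repeats a signed-gradient ascent step and projects back onto the allowed perturbation range. It runs on a toy quadratic loss; `pgd_attack`, `toy_loss`, `toy_grad`, and the values of `eps` and `step_size` are hypothetical and are not the settings used in the paper.

```python
def sign(v):
    return (v > 0) - (v < 0)

def pgd_attack(grad_fn, x, eps, step_size, n_steps):
    """L-inf PGD sketch: signed-gradient ascent followed by projection onto the eps-ball around x."""
    x_adv = list(x)
    for _ in range(n_steps):
        g = grad_fn(x_adv)
        # Ascent step on the loss with respect to the input.
        x_adv = [xa + step_size * sign(gi) for xa, gi in zip(x_adv, g)]
        # Projection: clip every coordinate back into [x - eps, x + eps].
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

# Toy loss to maximize (stands in for the network loss).
def toy_loss(x):
    return (x[0] + 1.0) ** 2 + (x[1] - 2.0) ** 2

def toy_grad(x):
    return [2.0 * (x[0] + 1.0), 2.0 * (x[1] - 2.0)]

x_clean = [0.0, 0.0]
x_adv = pgd_attack(toy_grad, x_clean, eps=0.5, step_size=0.1, n_steps=200)
```

On real images an attacker would additionally clip the result to the valid pixel range; that detail is omitted here to keep the sketch short.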
As shown in Figure 1, the widely used activation function, ReLU, is non-smooth, i.e., its gradient changes abruptly when its input is 0, which significantly degrades the gradient quality. We conjecture that this non-smooth nature hurts the training process, especially when we train models adversarially. This is because, compared to standard training, which only computes gradients for updating the network parameter $\theta$, adversarial training requires additional gradient computations in the inner maximization step to craft the perturbation $\epsilon$.
To fix this problem, we first introduce a smooth approximation of ReLU, named Parametric Softplus Nair and Hinton (2010), as follows,

$$f(\alpha, x) = \frac{1}{\alpha} \log\left(1 + \exp(\alpha x)\right), \tag{2}$$

where the hyperparameter $\alpha$ is used to control the curve shape. The derivative of this function w.r.t. the input $x$ is:

$$\frac{d}{dx} f(\alpha, x) = \frac{1}{1 + \exp(-\alpha x)}. \tag{3}$$

To better approximate the curve of ReLU, we empirically set $\alpha = 10$. As shown in Figure 1, compared to ReLU, Parametric Softplus ($\alpha = 10$) is smooth because it has a continuous derivative.
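A small numerical sketch of Parametric Softplus and its derivative, assuming the form $f(\alpha, x) = \frac{1}{\alpha}\log(1 + \exp(\alpha x))$ with derivative $\mathrm{sigmoid}(\alpha x)$. The function names are illustrative, and the rewriting of the logarithm is only a standard trick to avoid overflow, not something prescribed by the paper.

```python
import math

def relu(x):
    return max(x, 0.0)

def relu_grad(x):
    return 1.0 if x > 0 else 0.0

def parametric_softplus(x, alpha=10.0):
    """(1/alpha) * log(1 + exp(alpha * x)), computed in an overflow-safe way."""
    z = alpha * x
    # log(1 + e^z) = max(z, 0) + log(1 + e^{-|z|})
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / alpha

def parametric_softplus_grad(x, alpha=10.0):
    """Derivative w.r.t. x, i.e. sigmoid(alpha * x), computed in an overflow-safe way."""
    z = alpha * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Largest gap to ReLU on a grid: it occurs at x = 0 and equals log(2) / alpha.
grid = [i / 100.0 for i in range(-300, 301)]
max_gap = max(abs(parametric_softplus(x) - relu(x)) for x in grid)
```

With $\alpha = 10$ the function stays within $\log(2)/10 \approx 0.07$ of ReLU everywhere, while its derivative passes through 0.5 at the origin instead of jumping from 0 to 1.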
With Parametric Softplus, we next diagnose how gradient quality in the inner maximization step and the outer minimization step affects the accuracy and robustness of ResNet-50 in adversarial training. To clearly benchmark the effects, we only substitute ReLU with Equation (2) in the backward pass, while leaving the forward pass unchanged, i.e., ReLU is always used for model inference.
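The backward-only substitution can be sketched on a scalar two-layer network: the forward pass always applies ReLU, and only the activation derivative used in the chain rule is swapped. This is a hand-written illustration with hypothetical names (`forward`, `backward_wrt_input`); in an automatic-differentiation framework the same effect would typically be obtained with a custom gradient for the activation.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def forward(x, w1, w2):
    """Scalar network: output = w2 * ReLU(w1 * x). ReLU is always used here."""
    pre = w1 * x
    return w2 * max(pre, 0.0), pre

def backward_wrt_input(pre, w1, w2, alpha=None):
    """d(output)/dx by the chain rule.

    alpha=None  -> exact ReLU derivative (0 or 1).
    alpha=float -> Parametric Softplus derivative sigmoid(alpha * pre) is used instead.
    """
    if alpha is None:
        act_grad = 1.0 if pre > 0 else 0.0
    else:
        act_grad = sigmoid(alpha * pre)
    return w2 * act_grad * w1

out, pre = forward(-0.01, w1=1.0, w2=2.0)            # slightly negative pre-activation
g_relu = backward_wrt_input(pre, 1.0, 2.0)            # exact ReLU backward
g_soft = backward_wrt_input(pre, 1.0, 2.0, alpha=10)  # softened backward
```

Just below zero the ReLU backward pass returns no signal at all, whereas the softened backward pass still returns a usable gradient; far from zero the two agree closely.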
We first take a look at the effects of gradient quality on computing adversarial examples (i.e., the inner maximization step) during training. More precisely, in the inner step of adversarial training, we use ReLU in the forward pass, but Parametric Softplus in the backward pass; and in the outer step, we use ReLU in both the forward and the backward pass. As shown in the second row of Table 1, when the attacker uses Parametric Softplus’s gradient to craft training samples, the resulting model exhibits a performance trade-off compared to the ReLU baseline, e.g., it improves adversarial robustness by 1.5% but degrades accuracy by 0.5%. We hypothesize that the enhanced adversarial robustness can be attributed to harder adversarial examples generated during training, i.e., better gradients for the inner maximization step boost the attacker’s strength. To further verify this hypothesis, we evaluate the robustness of two ResNet-50 models via PGD-1 (vs. PGD-200 in Table 1), one with standard training and one with adversarial training. Specifically, during the evaluation, the attacker uses ReLU in the forward pass, but Parametric Softplus in the backward pass. With better gradients, the PGD-1 attacker is strengthened and hurts models more: it further decreases the top-1 accuracy by 4.0% (from 16.9% to 12.9%) on the standard-trained model and by 0.7% (from 48.7% to 48.0%) on the adversarially trained model (both not shown in Table 1). Finally, as shown in Table 1 (second row), we note that this robustness improvement comes at the expense of accuracy, which is consistent with previous works Wang et al. (2019).
We then study the role of gradient quality on updating network parameters (i.e., the outer minimization step) during training. More precisely, in the inner step of adversarial training, we use ReLU; but in the outer step, we use ReLU in the forward pass, and Parametric Softplus in the backward pass. Surprisingly, this method improves adversarial robustness for “free”. As shown in the third row of Table 1, without incurring additional computations, adversarial robustness is boosted by 2.8%, and meanwhile accuracy is improved by 0.6%, compared to the ReLU baseline. We note the corresponding training loss also gets lower: the cross-entropy loss on the training set is reduced from 2.71 to 2.59. These results of better robustness, accuracy, and lower training loss together suggest that, with Equation (3), networks are able to compute better gradient updates in adversarial training. Interestingly, we also observe that better gradient updates improve the standard training, i.e., with ResNet-50, training with better gradients is able to improve accuracy from 76.8% to 77.0%, and reduces the corresponding training loss from 1.22 to 1.18. These results on both adversarial and standard training suggest that updating network parameters using better gradients could serve as a principle for improving performance in general, while keeping the inference process of the model unchanged (i.e., ReLU is still used during inference).
Given the observation that improving ReLU’s gradient for either the adversarial attacker or the network optimizer benefits robustness, we further enhance adversarial training by replacing ReLU with Parametric Softplus in all backward passes, but keeping ReLU in all forward passes. As expected, such a trained model reports the best robustness so far, i.e., as shown in the last row of Table 1, it substantially outperforms the ReLU baseline by 3.9% for robustness. Interestingly, this improvement still comes for “free”, i.e., it reports 0.1% higher accuracy than the ReLU baseline. We conjecture this is mainly due to the positive effect on accuracy brought by computing better gradient updates (increase accuracy) slightly overriding the negative effects on accuracy brought by creating harder training samples (hurt accuracy) in this experiment.
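Table 1: ResNet-50 adversarial training with improved gradient quality in the inner and/or outer step; differences are relative to the ReLU baseline (first row).
|Improving Gradient Quality for the Adversarial Attacker||Improving Gradient Quality for the Network Parameter Updates||Accuracy (%)||Robustness (%)|
|✗||✗||68.8||33.0|
|✓||✗||68.3 (-0.5)||34.5 (+1.5)|
|✗||✓||69.4 (+0.6)||35.8 (+2.8)|
|✓||✓||68.9 (+0.1)||36.9 (+3.9)|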
It is known that increasing the number of attack iterations can create harder adversarial examples Madry et al. (2018). We confirm in our own experiments that, when training with a PGD attacker that uses more iterations, the resulting model behaves similarly to the case where we apply better gradients for the attacker. By doubling the number of attack iterations (PGD-2 instead of PGD-1), the ReLU baseline gains 0.6% robustness while losing 0.1% accuracy. This result suggests we can remedy ReLU’s gradient issue in the inner step of adversarial training if more computation is available.
It is also known that longer training lowers the training loss Hoffer et al. (2017), which we explore next. Interestingly, with 2× the training cost of the standard setup (i.e., 200 epochs instead of 100), though the final model indeed achieves a lower training loss (from 2.71 to 2.62), there is still a trade-off between accuracy and robustness. Longer training gains 2.6% for accuracy but loses 1.8% for robustness. In contrast, applying better gradients for optimizing networks in the previous section improves both robustness and accuracy. This discouraging result suggests that training longer cannot fix the issues in the outer step of adversarial training caused by ReLU’s poor gradient.
Given these results, we conclude that ReLU significantly weakens adversarial training. Moreover, it seems that the degraded performance cannot be simply remedied even with training enhancements (i.e., increasing the number of attack iterations and training longer). We identify that the key is ReLU’s poor gradient: replacing ReLU with its smooth approximation only in the backward pass substantially improves robustness, without sacrificing accuracy or incurring additional computational cost. In the next section, we show that making activation functions smooth is a good design principle for enhancing adversarial training in general.
As shown above, improving ReLU’s gradient can both strengthen the attacker and provide better gradient updates. Nonetheless, this strategy may be suboptimal, as there is still a discrepancy between the forward pass (where we use ReLU) and the backward pass (where we use Parametric Softplus).
To fully exploit the potential of training with better gradients, we hereby propose smooth adversarial training (SAT), which enforces architectural smoothness via the exclusive usage of smooth activation functions in adversarial training. We keep all other network components the same, as most of them will not result in the issue of poor gradient. (Footnote: we ignore the gradient issue caused by max pooling, which is also non-smooth, in SAT. This is because modern architectures rarely adopt it, e.g., there is only 1 max pooling layer in ResNet He et al. (2016), and 0 in EfficientNet Tan and Le (2019).)
We consider the following activation functions as the smooth approximations of ReLU in SAT (Figure 2 plots these functions as well as their derivatives):
We follow the settings in Section 3 to adversarially train ResNet-50 equipped with these smooth activation functions. The results are shown in Figure 3. Compared to the ReLU baseline, all smooth activation functions substantially boost robustness, while keeping the standard accuracy almost the same. For example, smooth activation functions at least boost robustness by 5.7% (using Parametric Softplus, from 33% to 38.7%). The strongest robustness is achieved by Swish Ramachandran et al. (2017), which enables ResNet-50 to achieve 42.3% robustness and 69.7% standard accuracy.
Additionally, we compare to the setting in Section 3 where Parametric Softplus is only applied in the backward pass. Interestingly, by additionally replacing ReLU with Parametric Softplus in the forward pass, the resulting model further improves robustness by 1.8% (from 36.9% to 38.7%) while keeping the accuracy almost the same, demonstrating the importance of applying smooth activation functions in both forward and backward passes in SAT.
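As an illustration of one of the smooth activations discussed above, the sketch below implements Swish in its common form $x \cdot \mathrm{sigmoid}(x)$ together with its analytic derivative. The function names are illustrative, and the exact variant used in the experiments is an assumption here.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def swish(x):
    """Swish in its common form: x * sigmoid(x)."""
    return x * sigmoid(x)

def swish_grad(x):
    """Analytic derivative: sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s + x * s * (1.0 - s)

def numeric_grad(f, x, h=1e-5):
    """Central finite difference, used only to sanity-check the analytic derivative."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

points = [-3.0, -1.0, -0.1, 0.0, 0.1, 1.0, 3.0]
max_err = max(abs(numeric_grad(swish, x) - swish_grad(x)) for x in points)
```

The derivative is continuous through the origin, where it equals 0.5. Unlike ReLU, Swish also produces small non-zero outputs for negative inputs.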
Compared to ReLU, in addition to being smooth, the functions above have non-zero responses to negative inputs ($x < 0$), which may also affect adversarial training. To rule out this factor, we hereby propose SmoothReLU, which flattens the activation function by only modifying ReLU for $x \ge 0$,

$$\text{SmoothReLU}(x, \alpha) = \begin{cases} x - \frac{1}{\alpha} \log(\alpha x + 1) & \text{if } x \ge 0, \\ 0 & \text{otherwise}, \end{cases} \tag{4}$$

where $\alpha$ is a learnable variable shared by all channels, and is constrained to be positive. We note SmoothReLU is always continuously differentiable regardless of the value of $\alpha$, as

$$\frac{d}{dx} \text{SmoothReLU}(x, \alpha) = \begin{cases} \frac{\alpha x}{1 + \alpha x} & \text{if } x \ge 0, \\ 0 & \text{otherwise}. \end{cases} \tag{5}$$

SmoothReLU converges to ReLU when $\alpha \to \infty$. Note that $\alpha$ needs to be initialized at a large enough value (e.g., 400 in our experiments) to avoid the gradient vanishing problem at the beginning of training. We plot SmoothReLU and its first derivative in Figure 2.
We observe that SmoothReLU substantially outperforms ReLU by 7.3% for robustness (from 33.0% to 40.3%), and by 0.6% for accuracy (from 68.8% to 69.4%). This clearly demonstrates the importance of the activation function being smooth, and rules out the effect of having non-zero responses when $x < 0$.
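The properties claimed for SmoothReLU can be checked numerically. The sketch below assumes the form $x - \frac{1}{\alpha}\log(\alpha x + 1)$ for $x \ge 0$ and 0 otherwise, and treats $\alpha$ as a fixed number rather than a learned variable; the function names are illustrative.

```python
import math

def smooth_relu(x, alpha):
    """x - log(alpha * x + 1) / alpha for x >= 0, and 0 for x < 0 (alpha > 0)."""
    if x < 0:
        return 0.0
    return x - math.log1p(alpha * x) / alpha

def smooth_relu_grad(x, alpha):
    """alpha * x / (1 + alpha * x) for x >= 0, and 0 for x < 0."""
    if x < 0:
        return 0.0
    return alpha * x / (1.0 + alpha * x)

# The derivative is 0 on both sides of the origin, so there is no jump at x = 0.
left, right = smooth_relu_grad(-1e-9, 400.0), smooth_relu_grad(0.0, 400.0)

# A larger alpha brings the function closer to ReLU (ReLU(1.0) = 1.0).
gap_400 = 1.0 - smooth_relu(1.0, 400.0)
gap_1e6 = 1.0 - smooth_relu(1.0, 1e6)

# A small alpha yields tiny gradients even for clearly positive inputs.
tiny_grad = smooth_relu_grad(1.0, 0.01)
```

The last line hints at why a large initial value of $\alpha$ is needed: with a small $\alpha$ the derivative stays close to zero over a wide range of positive inputs, so little gradient flows at the start of training.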
In the analysis above, we show that adversarial training can be greatly improved by replacing ReLU with its smooth approximations. To further demonstrate the generalization of SAT (beyond ReLU), we discuss another type of activation function, ELU, defined as

$$\text{ELU}(x, \alpha) = \begin{cases} x & \text{if } x \ge 0, \\ \alpha(\exp(x) - 1) & \text{otherwise}. \end{cases} \tag{6}$$

The first derivative of ELU is shown below:

$$\frac{d}{dx} \text{ELU}(x, \alpha) = \begin{cases} 1 & \text{if } x \ge 0, \\ \alpha \exp(x) & \text{otherwise}. \end{cases} \tag{7}$$

Here we mainly discuss the scenario when ELU is non-smooth, i.e., $\alpha \neq 1$. As can be seen from Equation (7), ELU is no longer continuously differentiable in this case, i.e., $\alpha \exp(x) \neq 1$ when $x = 0$, therefore resulting in an abrupt gradient change like ReLU. Specifically, we consider the scenario $1.0 < \alpha \le 2.0$, where the abrupt gradient change becomes more drastic with a larger value of $\alpha$.
We show the adversarial training results in Table 2. Interestingly, we observe that the adversarial robustness is highly dependent on the value of $\alpha$: the strongest robustness is achieved when the function is smooth (i.e., $\alpha = 1.0$, 41.4% robustness), and all other choices of $\alpha$ monotonically decrease the robustness as $\alpha$ gradually approaches 2.0. For instance, with $\alpha = 2.0$, the robustness drops to only 33.2%, which is lower than that of using $\alpha = 1.0$. The observed phenomenon here is consistent with our previous conclusion on ReLU: non-smooth activation functions significantly weaken adversarial training.
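To stabilize adversarial training with ELU, we apply its smooth version, CELU Barron (2017), which re-parametrizes ELU in the following form:

$$\text{CELU}(x, \alpha) = \begin{cases} x & \text{if } x \ge 0, \\ \alpha\left(\exp\left(\frac{x}{\alpha}\right) - 1\right) & \text{otherwise}. \end{cases} \tag{8}$$

The first derivative of CELU can be written as follows:

$$\frac{d}{dx} \text{CELU}(x, \alpha) = \begin{cases} 1 & \text{if } x \ge 0, \\ \exp\left(\frac{x}{\alpha}\right) & \text{otherwise}. \end{cases} \tag{9}$$

With this parameterization, CELU is now continuously differentiable regardless of the choice of $\alpha$.
We observe that CELU greatly stabilizes adversarial training, i.e., compared to $\alpha = 1.0$, the worst case in CELU is merely 0.5% lower (shown in Table 2). Recall that this gap for ELU is 7.9%. This case study provides further strong support for the importance of performing SAT.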
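The difference between ELU and CELU at the origin can be made explicit by measuring the jump in their first derivatives. The sketch below assumes the standard definitions, with derivative $\alpha \exp(x)$ for ELU and $\exp(x/\alpha)$ for CELU on negative inputs and 1 otherwise; `jump_at_zero` is a hypothetical helper.

```python
import math

def elu_grad(x, alpha):
    """First derivative of ELU: 1 for x >= 0, alpha * exp(x) otherwise."""
    return 1.0 if x >= 0 else alpha * math.exp(x)

def celu_grad(x, alpha):
    """First derivative of CELU: 1 for x >= 0, exp(x / alpha) otherwise."""
    return 1.0 if x >= 0 else math.exp(x / alpha)

def jump_at_zero(grad_fn, alpha, h=1e-9):
    """Approximate size of the derivative discontinuity at x = 0."""
    return abs(grad_fn(h, alpha) - grad_fn(-h, alpha))

alphas = [1.0, 1.2, 1.6, 2.0]
elu_jumps = [jump_at_zero(elu_grad, a) for a in alphas]    # roughly |alpha - 1|
celu_jumps = [jump_at_zero(celu_grad, a) for a in alphas]  # roughly 0 for every alpha
```

For ELU the jump grows as $|\alpha - 1|$, so it is largest at $\alpha = 2.0$, which matches the setting where robustness is reported to degrade the most. For CELU the jump vanishes for every $\alpha$.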
Recent works Xie and Yuille (2020); Gao et al. (2019) show that, compared to standard training, adversarial training exhibits a much stronger requirement for larger networks to obtain better performance. Nonetheless, previous explorations in this direction only consider either deeper networks Xie and Yuille (2020) or wider networks Madry et al. (2018), which might be insufficient. To this end, we hereby present a systematic study on showing how network scaling up behaves in SAT. Specifically, we set Swish as the default activation function to perform SAT, as it achieves the best robustness among different candidates (as shown in Figure 3).
We first perform the network scaling-up experiments with ResNet in SAT. In standard training, Tan and Le (2019) suggest that all three scaling-up factors, i.e., depth, width and image resolution, are important for further improving ResNet performance. We hereby examine the effects of these factors in SAT. We choose ResNet-50 (with the default image resolution at 224) as the baseline network.
Previous works Madry et al. (2018); Xie and Yuille (2020) already show that making networks deeper or wider can further improve standard adversarial training. We re-verify this conclusion in SAT. As shown in the second to fifth rows of Table 3, we confirm that both deeper and wider networks consistently outperform the baseline network in SAT. For instance, training a deeper ResNet-152 improves on ResNet-50’s performance by 4.2% for accuracy and 3.7% for robustness. Similarly, training a wider ResNeXt-50-32x8d Xie et al. (2017) improves accuracy by 3.9% and robustness by 2.8%.
Though larger image resolution benefits standard training, it is generally believed that scaling up this factor will induce weaker adversarial robustness, as the attacker will have a larger room for crafting adversarial perturbations Galloway et al. (2019). However, surprisingly, this belief is invalid when taking adversarial training into consideration. As shown in the sixth and seventh rows of Table 3, ResNet-50 consistently achieves better performance when training with larger image resolutions in SAT. We conjecture this improvement is possible because a larger image resolution (1) enables attackers to create stronger adversarial examples Galloway et al. (2019); and (2) increases network capacity Tan and Le (2019), therefore benefits SAT overall.
So far, we have confirmed that depth, width and image resolution are all important scaling-up factors in SAT. As argued in Tan and Le (2019) for standard training, scaling up all these factors simultaneously is better than just focusing on a single dimension (e.g., depth). To this end, we make an attempt to create a simple compound scaling for ResNet. As shown in the last row of Table 3, the resulting model, ResNeXt-152-32x8d with input resolution at 380, achieves a much stronger result than the ResNet-50 baseline, i.e., +8.5% for accuracy and +8.9% for robustness.
Table 3: Scaling up ResNet in SAT; differences are relative to the ResNet-50 baseline (first row).
|Network||Accuracy (%)||Robustness (%)|
|ResNet-50||69.7||42.3|
|+ 2x deeper (ResNet-101)||72.9 (+3.2)||45.5 (+3.2)|
|+ 3x deeper (ResNet-152)||73.9 (+4.2)||46.0 (+3.7)|
|+ 2x wider (ResNeXt-50-32x4d)||71.2 (+1.5)||42.5 (+0.2)|
|+ 4x wider (ResNeXt-50-32x8d)||73.6 (+3.9)||45.1 (+2.8)|
|+ larger resolution 299||70.9 (+1.2)||43.8 (+1.5)|
|+ larger resolution 380||71.6 (+1.9)||44.1 (+1.8)|
|+ 3x deeper & 4x wider (ResNeXt-152-32x8d) & larger resolution 380||78.2 (+8.5)||51.2 (+8.9)|
We first verify that basic scaling of depth, width and image resolution also matters in standard adversarial training, e.g., by scaling up ResNet-50 (33.0% robustness), the deeper ResNet-152 achieves 39.4% robustness (+6.4%), the wider ResNeXt-50-32x8d achieves 36.7% robustness (+3.7%) and the ResNet-50 with larger image resolution at 380 achieves 36.9% robustness (+3.9%). All these robustness results are lower than the robustness (42.3%) achieved by the ResNet-50 baseline in SAT (first row of Table 3). In other words, scaling up networks seems less effective than replacing ReLU with smooth activation functions.
We also find compound scaling is more effective than basic scaling for standard adversarial training, e.g., ResNeXt-152-32x8d with input resolution at 380 here reports 46.3% robustness. Although this result is better than adversarial training with basic scaling above, it is still about 5% lower than SAT with compound scaling, i.e., 46.3% vs. 51.2%. In other words, even with larger networks, applying smooth activation functions in adversarial training is still essential for improving performance.
The results on ResNet show that scaling up networks in SAT effectively improves performance. Nonetheless, the applied scaling policies could be suboptimal, as they are hand-designed without any optimizations. EfficientNet Tan and Le (2019), which uses neural architecture search to automatically discover the optimal factors for network scaling, provides a strong family of models for image recognition. To examine the benefits of EfficientNet, we now use it to replace ResNet in SAT. Note that all other training settings are the same as described in our ResNet experiments.
Similar to ResNet, Figure 4 shows that stronger backbones consistently achieve better performance in SAT. For instance, by scaling the network from EfficientNet-B0 to EfficientNet-B7, the robustness is improved from 37.6% to 57.0%, and the accuracy is improved from 65.1% to 79.8%. Surprisingly, the improvement is still observable for larger networks: EfficientNet-L1 Xie et al. (2020) further improves robustness by 1.0% and accuracy by 0.7% over EfficientNet-B7.
So far all of our experiments follow the training recipes from ResNet, which may not be optimal for EfficientNet training. To this end, we import the following settings to our experiments as in original EfficientNet training setups Tan and Le (2019): we change weight decay from 1e-4 to 1e-5, and add Dropout Srivastava et al. (2014), stochastic depth Huang et al. (2016) and AutoAugment Cubuk et al. (2019) to regularize the training process. Besides, we train models longer (i.e., 200 epochs) to better cope with these training enhancements, and adopt the early stopping strategy to prevent the catastrophic overfitting issue in robustness Wong et al. (2020). With these training enhancements, our EfficientNet-L1 gets further improved, i.e., +1.7% for accuracy (from 80.5% to 82.2%) and +0.6% for robustness (from 58.0% to 58.6%).
|Method||Accuracy (%)||Robustness (%)|
|Prior art Qin et al. (2019)||72.7||47.0|
|EfficientNet+SAT||82.2 (+9.5)||58.6 (+11.6)|
Table 4 compares our best results with the prior art. With SAT, we are able to train a model with strong performance on both adversarial robustness and standard accuracy—our best model (EfficientNet-L1 + SAT) achieves 82.2% standard accuracy and 58.6% robustness, which largely outperforms the previous state-of-the-art method Qin et al. (2019) by 9.5% on standard accuracy and 11.6% on adversarial robustness.
Finally, we emphasize a large reduction in the accuracy gap between adversarially trained models and standard trained models for large networks. For example, with the training setup above (with enhancements), EfficientNet-L1 achieves 84.1% accuracy in standard training, and this accuracy slightly decreases to 82.2% (-1.9%) in SAT. Note that this gap is substantially smaller than the gap in ResNet-50 of 7.1% (76.8% in standard training vs. 69.7% in SAT). Moreover, it is also worth mentioning that the high accuracy of 82.2% provides strong support for the argument of Ilyas et al. (2019) that robust features can indeed generalize well to clean inputs.
In this paper, we propose smooth adversarial training, which enforces architectural smoothness via replacing non-smooth activation functions with their smooth approximations in adversarial training. SAT improves adversarial robustness without sacrificing standard accuracy or incurring additional computation cost. Extensive experiments demonstrate the general effectiveness of SAT. With EfficientNet-L1, SAT reports the state-of-the-art adversarial robustness on ImageNet, which largely outperforms the prior art Qin et al. (2019) by 9.5% for accuracy and 11.6% for robustness.
Our work points out that architectural smoothness plays an essential role in learning a robust model, which has so far received little attention in the community. We believe this is a general design principle, and that it should generalize well to broader tasks like natural language processing, reinforcement learning, etc. Our work also suggests that different architecture designs can lead to significantly different robustness, and therefore has great potential to inspire later work on finding better architectures, e.g., through either hand design or neural architecture search, to further increase robustness. We strongly believe that SAT can have high practical impact, since our method can greatly enhance model robustness against adversarial attacks, which is important for enabling models to work reliably in real-world scenarios, especially for safety-critical applications like self-driving cars and surgical robots.
We would like to thank Jiang Wang, Chongli Qin and Sven Gowal for valuable discussions at the early stage of this project.
Bhagoji et al. (2018) Enhancing robustness of machine learning systems via data transformations. In CISS.
Madry et al. (2018) Towards deep learning models resistant to adversarial attacks. In ICLR.
Nakkiran (2019) Adversarial robustness may be at odds with simplicity. arXiv preprint arXiv:1901.00532.
Samangouei et al. (2018) Defense-GAN: protecting classifiers against adversarial attacks using generative models. In ICLR.