Toward Adversarial Robustness via Semi-supervised Robust Training

03/16/2020
by Yiming Li, et al.

Adversarial examples have been shown to be a severe threat to deep neural networks (DNNs). One of the most effective defenses is adversarial training (AT), which minimizes the adversarial risk R_adv, encouraging both the benign example x and its adversarially perturbed neighbors within the ℓ_p-ball to be predicted as the ground-truth label. In this work, we propose a novel defense method, robust training (RT), which jointly minimizes two separate risks, R_stand and R_rob, defined with respect to the benign example and its neighbors, respectively. The motivation is to explicitly and jointly enhance accuracy and adversarial robustness. We prove that R_adv is upper-bounded by R_stand + R_rob, which implies that RT has a similar effect to AT. Intuitively, minimizing the standard risk enforces correct prediction on the benign example, while minimizing the robust risk encourages the predictions on neighboring examples to be consistent with the prediction on the benign example. Moreover, since R_rob is independent of the ground-truth label, RT extends naturally to a semi-supervised mode (SRT), which further enhances adversarial robustness. We also generalize the ℓ_p-bounded neighborhood to cover different types of perturbations, such as pixel-wise perturbations (i.e., x + δ) and spatial perturbations (i.e., Ax + b). Extensive experiments on benchmark datasets not only verify that the proposed SRT method outperforms state-of-the-art methods in defending against pixel-wise or spatial perturbations separately, but also demonstrate its robustness to both perturbation types simultaneously. The code for reproducing the main results is available at <https://github.com/THUYimingLi/Semi-supervised_Robust_Training>.
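To make the decomposition concrete, here is a minimal PyTorch sketch of the R_stand + R_rob objective under illustrative assumptions: the function names (`standard_risk`, `robust_risk`, `srt_loss`, `pgd_linf`), the KL divergence as the consistency measure, the ℓ_∞ PGD attack, and the hyperparameters (`lam`, `eps`, `alpha`, `steps`) are all choices made for this sketch, not necessarily the paper's exact formulation. The key property from the abstract is visible in `robust_risk`: no ground-truth label appears, which is what lets unlabeled data contribute.

```python
import torch
import torch.nn.functional as F


def standard_risk(model, x, y):
    """R_stand: cross-entropy between the prediction on the benign
    example x and its ground-truth label y (labeled data only)."""
    return F.cross_entropy(model(x), y)


def pgd_linf(model, x, p_benign, eps=8 / 255, alpha=2 / 255, steps=10):
    """Hypothetical l_inf PGD that searches the eps-ball around x for a
    neighbor whose prediction diverges most from the benign prediction.
    The KL objective here is an illustrative choice."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        log_p_adv = F.log_softmax(model(x + delta), dim=1)
        loss = F.kl_div(log_p_adv, p_benign, reduction="batchmean")
        (grad,) = torch.autograd.grad(loss, delta)
        # Gradient ascent on the divergence, projected back into the ball.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (x + delta).clamp(0.0, 1.0).detach()  # assumes inputs in [0, 1]


def robust_risk(model, x):
    """R_rob: consistency between the prediction on x and on its
    worst-case neighbor. Label-free, so unlabeled data can be used."""
    with torch.no_grad():
        p_benign = F.softmax(model(x), dim=1)
    x_adv = pgd_linf(model, x, p_benign)
    log_p_adv = F.log_softmax(model(x_adv), dim=1)
    return F.kl_div(log_p_adv, p_benign, reduction="batchmean")


def srt_loss(model, x_lab, y_lab, x_unlab, lam=1.0):
    """SRT objective: R_stand on labeled data plus lam * R_rob
    evaluated on labeled and unlabeled examples together."""
    x_all = torch.cat([x_lab, x_unlab], dim=0)
    return standard_risk(model, x_lab, y_lab) + lam * robust_risk(model, x_all)
```

For the spatial case, the neighbor would instead be produced by an affine warp Ax + b (in PyTorch, e.g., via `torch.nn.functional.affine_grid` and `grid_sample`) while keeping the same label-free consistency term; the generalized neighborhood in the paper covers both perturbation types.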


Related research

- 03/24/2023: Improved Adversarial Training Through Adaptive Instance-wise Loss Smoothing
- 08/10/2021: Enhancing Knowledge Tracing via Adversarial Training
- 11/11/2019: GraphDefense: Towards Robust Graph Convolutional Networks
- 08/17/2022: Two Heads are Better than One: Robust Learning Meets Multi-branch Models
- 02/21/2019: Wasserstein Adversarial Examples via Projected Sinkhorn Iterations
- 10/01/2021: Calibrated Adversarial Training
- 02/05/2022: Backdoor Defense via Decoupling the Training Process
