Improving Resistance to Adversarial Deformations by Regularizing Gradients
Improving the resistance of deep neural networks to adversarial attacks is important for deploying models in realistic applications. Most current defense methods are designed against additive noise attacks, and their performance cannot be guaranteed against non-additive noise attacks. In this paper, we focus on adversarial deformations, a typical class of non-additive noise attacks, and propose a flow gradient regularization with random start to improve the resistance of models. Theoretically, we prove that, compared with input gradient regularization, regularizing flow gradients yields a tighter bound. Across multiple datasets, architectures, and adversarial deformations, our experimental results consistently indicate that models trained with flow gradient regularization achieve substantially better resistance than models trained with input gradient regularization. Moreover, compared with adversarial training, our method achieves better results against optimization-based and gradient-free attacks, and combining the two methods further improves resistance to deformation attacks. Finally, we give a unified form of gradient regularization, which can be used to derive the corresponding regularizer for other types of attacks.
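To make the idea concrete, the sketch below illustrates a flow gradient regularization penalty on a toy 1-D signal: the input is warped by a per-pixel flow field, and the penalty is the squared norm of the loss gradient with respect to that flow, evaluated at a small random start inside the flow budget. The warp, the toy logistic loss, the finite-difference gradient estimate, and all parameter choices here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def warp_1d(x, flow):
    """Warp a 1-D signal x by per-pixel displacements `flow` (linear interpolation)."""
    n = len(x)
    pos = np.clip(np.arange(n) + flow, 0, n - 1)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n - 1)
    w = pos - lo
    return (1 - w) * x[lo] + w * x[hi]

def task_loss(x, w_clf, y):
    """Toy logistic loss for a linear classifier (stand-in for a network's loss)."""
    return np.log1p(np.exp(-y * (x @ w_clf)))

def flow_grad_penalty(x, w_clf, y, flow0, eps=1e-4):
    """Finite-difference estimate of || dL(warp(x, f), y) / df ||^2 at f = flow0."""
    g = np.zeros_like(flow0)
    for i in range(len(flow0)):
        fp = flow0.copy(); fp[i] += eps
        fm = flow0.copy(); fm[i] -= eps
        g[i] = (task_loss(warp_1d(x, fp), w_clf, y)
                - task_loss(warp_1d(x, fm), w_clf, y)) / (2 * eps)
    return np.sum(g ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=8)          # toy 1-D "image"
w_clf = rng.normal(size=8)      # toy classifier weights
y = 1.0                         # label in {-1, +1}
flow0 = rng.uniform(-0.1, 0.1, size=8)  # random start within the flow budget
lam = 1.0                       # regularization strength (hypothetical)

# Training objective: task loss plus the flow gradient penalty.
total = task_loss(x, w_clf, y) + lam * flow_grad_penalty(x, w_clf, y, flow0)
print(float(total))
```

In practice the gradient with respect to the flow would be computed by automatic differentiation through a differentiable (e.g. bilinear) warping layer rather than by finite differences; the random start mirrors the paper's strategy of evaluating the penalty at a randomly perturbed flow instead of only at the identity.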