Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff

08/31/2023
by Satoshi Suzuki et al.

This paper addresses the tradeoff between standard accuracy on clean examples and robustness against adversarial examples in deep neural networks (DNNs). Although adversarial training (AT) improves robustness, it degrades standard accuracy, thus yielding the tradeoff. To mitigate this tradeoff, we propose a novel AT method called ARREST, which comprises three components: (i) adversarial finetuning (AFT), (ii) representation-guided knowledge distillation (RGKD), and (iii) noisy replay (NR). AFT trains a DNN on adversarial examples after initializing its parameters with those of a DNN standardly pretrained on clean examples. RGKD, a regularization term, and NR, an algorithm, both preserve the latent representations of clean examples during AFT: RGKD penalizes the distance between the representations of the standardly pretrained DNN and the DNN undergoing AFT, while NR switches the input from adversarial examples to nonadversarial ones when the representation changes significantly during AFT. By combining these components, ARREST achieves both high standard accuracy and high robustness. Experimental results demonstrate that ARREST mitigates the tradeoff more effectively than previous AT-based methods do.
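
To make the respective roles of RGKD and NR concrete, below is a minimal PyTorch-style sketch of one AFT step, written from the abstract alone. The `features` method, the `make_adversarial` attack helper, the MSE distance, and the hyperparameter values are all illustrative assumptions, not the authors' implementation (the paper may, for instance, use a different representation distance or a different nonadversarial input for NR).

```python
import torch
import torch.nn.functional as F

LAMBDA_RGKD = 1.0    # weight of the RGKD penalty (assumed value)
NR_THRESHOLD = 1.0   # representation-drift threshold for NR (assumed value)

def arrest_step(model, pretrained, x_clean, y, make_adversarial, optimizer):
    """One adversarial-finetuning (AFT) step with RGKD and NR.

    `model` is the DNN being finetuned; `pretrained` is the frozen,
    standardly pretrained DNN. Both are assumed to expose a `features`
    method returning the latent representation used by RGKD."""
    # Craft adversarial examples with an external attack (e.g. PGD);
    # `make_adversarial` is an assumed helper, not part of the paper.
    x_adv = make_adversarial(model, x_clean, y)

    with torch.no_grad():
        z_pre = pretrained.features(x_clean)  # reference representation
        drift = F.mse_loss(model.features(x_clean), z_pre).item()

    # NR: when the representation of clean inputs has drifted too far,
    # replay nonadversarial inputs for this step instead of x_adv.
    x_in = x_clean if drift > NR_THRESHOLD else x_adv

    optimizer.zero_grad()
    logits = model(x_in)
    # RGKD: penalize the distance between the finetuned model's
    # representation and that of the standardly pretrained model.
    rgkd = F.mse_loss(model.features(x_in), z_pre)
    loss = F.cross_entropy(logits, y) + LAMBDA_RGKD * rgkd
    loss.backward()
    optimizer.step()
    return loss.item()
```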

Related research

06/08/2022 · Latent Boundary-guided Adversarial Training
Deep Neural Networks (DNNs) have recently achieved great success in many...

01/23/2020 · Towards Robust DNNs: An Taylor Expansion-Based Method for Generating Powerful Adversarial Examples
Although deep neural networks (DNNs) have achieved successful applicatio...

03/14/2018 · Feature Distillation: DNN-Oriented JPEG Compression Against Adversarial Examples
Deep Neural Networks (DNNs) have achieved remarkable performance in a my...

08/20/2021 · AdvDrop: Adversarial Attack to DNNs by Dropping Information
Human can easily recognize visual objects with lost information: even lo...

06/06/2023 · Revisiting the Trade-off between Accuracy and Robustness via Weight Distribution of Filters
Adversarial attacks have been proven to be potential threats to Deep Neu...

03/11/2022 · Learning from Attacks: Attacking Variational Autoencoder for Improving Image Classification
Adversarial attacks are often considered as threats to the robustness of...

01/07/2023 · Adversarial training with informed data selection
With the increasing amount of available data and advances in computing c...
