Fooling Adversarial Training with Inducing Noise

11/19/2021
by Zhirui Wang, et al.
Adversarial training is widely believed to be a reliable approach for improving model robustness against adversarial attacks. However, in this paper, we show that when trained on one type of poisoned data, adversarial training can also be fooled into catastrophic behavior, e.g., <1% robust test accuracy with >90% robust training accuracy on the CIFAR-10 dataset. Prior work has proposed other types of poisoning noise for training data that successfully fool standard training (15.8% standard test accuracy with 99.9% standard training accuracy on CIFAR-10), but their poisoning effects are easily removed by adversarial training. We therefore design a new type of inducing noise, named ADVIN, whose poisoning of the training data is irremovable. ADVIN not only degrades the robustness of adversarial training by a large margin, for example, from 51.7% to 0.57% on CIFAR-10, but is also effective at fooling standard training (13.1% standard test accuracy with 100% standard training accuracy). Additionally, ADVIN can be applied to prevent personal data (such as selfies) from being exploited without authorization under either standard or adversarial training.
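For readers unfamiliar with the adversarial training that ADVIN targets, the sketch below shows the standard min-max training loop (PGD-style inner maximization, outer minimization on the perturbed points) that the paper's poisoning is designed to fool. This is a minimal illustrative sketch on a toy logistic-regression model with synthetic data; the model, epsilon, step sizes, and data are assumptions for illustration, not the paper's CIFAR-10 setup, and ADVIN itself is not reproduced here.

```python
import numpy as np

# Toy synthetic binary-classification data (illustrative assumption).
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true > 0).astype(float)  # labels in {0, 1}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_wrt_x(w, x, y_i):
    # Gradient of the logistic loss w.r.t. the input x for one example.
    return (sigmoid(x @ w) - y_i) * w

def pgd_attack(w, x, y_i, eps=0.1, alpha=0.05, steps=5):
    # Inner maximization: worst-case perturbation inside an L_inf ball.
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_wrt_x(w, x_adv, y_i))
        x_adv = x + np.clip(x_adv - x, -eps, eps)  # project back to the ball
    return x_adv

# Outer minimization: gradient descent on the adversarially perturbed points.
w = np.zeros(d)
lr = 0.5
for epoch in range(50):
    grad = np.zeros(d)
    for i in range(n):
        x_adv = pgd_attack(w, X[i], y[i])
        grad += (sigmoid(x_adv @ w) - y[i]) * x_adv
    w -= lr * grad / n

# Robust accuracy = accuracy on PGD-perturbed points; clean accuracy on originals.
robust_acc = np.mean([
    float(sigmoid(pgd_attack(w, X[i], y[i]) @ w) > 0.5) == y[i] for i in range(n)
])
clean_acc = np.mean((sigmoid(X @ w) > 0.5).astype(float) == y)
```

On clean data this loop yields a model whose robust accuracy stays close to its clean accuracy; the paper's point is that when the training set is replaced by ADVIN-poisoned data, robust training accuracy stays high while robust test accuracy collapses.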


