Provable Defense Against Delusive Poisoning

02/09/2021
by Lue Tao, et al.

Delusive poisoning is a special kind of attack on machine learning in which the learner's performance can be significantly degraded by manipulating, even only slightly, the features of correctly labeled training examples. By formalizing this malicious attack as finding the worst-case training-time distribution shift within a specific ∞-Wasserstein ball, we show that minimizing adversarial risk on the poisoned data is equivalent to optimizing an upper bound on natural risk over the original data. This implies that adversarial training is a principled defense against delusive poisoning. To further explain the internal mechanism of the defense, we show that adversarial training resists the training distribution shift by preventing the learner from overly relying on features that are non-robust in the natural setting. Finally, we complement our theoretical findings with a set of experiments on popular benchmark datasets, showing that the defense withstands six different practical attacks. Both the theoretical and the empirical results support adversarial training as a defense against delusive poisoning.
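The core bound behind this claim can be sketched compactly. In notation introduced here for illustration (the symbols are ours, not necessarily the paper's): if the poisoned distribution lies within an ∞-Wasserstein ball of radius ε around the clean distribution, then under the optimal coupling every clean example sits within ε of some poisoned example, so the worst-case loss in an ε-ball around each poisoned point dominates the natural loss at the matched clean point:

\[
  \underbrace{\mathbb{E}_{(x,y)\sim \mathcal{D}}\big[\ell(f(x),y)\big]}_{\text{natural risk, clean data } \mathcal{D}}
  \;\le\;
  \underbrace{\mathbb{E}_{(\hat{x},y)\sim \hat{\mathcal{D}}}\Big[\max_{\|\delta\|_\infty \le \epsilon} \ell\big(f(\hat{x}+\delta),y\big)\Big]}_{\text{adversarial risk, poisoned data } \hat{\mathcal{D}}}
  \qquad \text{whenever } W_\infty(\hat{\mathcal{D}},\mathcal{D}) \le \epsilon .
\]

Driving down the right-hand side on the poisoned set therefore also drives down the unobserved natural risk on the clean distribution. A minimal PyTorch sketch of the resulting defense follows, assuming PGD-based adversarial training applied directly to the (possibly poisoned) training set; model, loader, optimizer, and the L-inf budget are hypothetical placeholders, not the authors' code:

import torch
import torch.nn.functional as F

def pgd_perturb(model, x, y, epsilon=8/255, alpha=2/255, steps=10):
    # Inner maximization: find an L-inf perturbation (within epsilon) that
    # approximately maximizes the loss at each training point.
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-epsilon, epsilon)
        delta = (x + delta).clamp(0, 1) - x   # assumes inputs scaled to [0, 1]
        delta = delta.detach().requires_grad_(True)
    return delta.detach()

def adversarial_training_epoch(model, loader, optimizer, epsilon=8/255):
    # Outer minimization: train on the worst-case perturbations, i.e.
    # minimize the adversarial-risk surrogate on the (poisoned) data.
    model.train()
    for x, y in loader:
        delta = pgd_perturb(model, x, y, epsilon=epsilon)
        optimizer.zero_grad()
        F.cross_entropy(model(x + delta), y).backward()
        optimizer.step()

The one design choice specific to this setting is that the PGD radius should match (or exceed) the assumed W∞ radius of the poisoning, so that the bound above applies.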

Related research

01/31/2022 · Can Adversarial Training Be Manipulated By Non-Robust Features?
Adversarial training, originally designed to resist test-time adversaria...

06/09/2019 · Beyond Adversarial Training: Min-Max Optimization in Adversarial Attack and Defense
The worst-case training principle that minimizes the maximal adversarial...

11/03/2018 · Learning to Defense by Learning to Attack
Adversarial training provides a principled approach for training robust ...

10/29/2017 · Certifiable Distributional Robustness with Principled Adversarial Training
Neural networks are vulnerable to adversarial examples and researchers h...

09/10/2018 · Second-Order Adversarial Attack and Certifiable Robustness
We propose a powerful second-order attack method that outperforms existi...

08/07/2020 · Visual Attack and Defense on Text
Modifying characters of a piece of text to their visual similar ones oft...

06/12/2020 · Learning Diverse Representations for Fast Adaptation to Distribution Shift
The i.i.d. assumption is a useful idealization that underpins many succe...
