Friendly Noise against Adversarial Noise: A Powerful Defense against Data Poisoning Attacks

08/14/2022
by Tian Yu Liu, et al.

A powerful category of data poisoning attacks modifies a subset of training examples with small adversarial perturbations to change the predictions on certain test-time data. Existing defense mechanisms are undesirable to deploy in practice, as they often drastically harm generalization performance or are attack-specific and prohibitively slow to apply. Here, we propose a simple but highly effective approach that, unlike existing methods, breaks various types of poisoning attacks with only a slight drop in generalization performance. We make the key observation that attacks exploit sharp loss regions to craft adversarial perturbations that can substantially alter examples' gradients or representations under small perturbations. To break poisoning attacks, our approach comprises two components: an optimized friendly noise that is generated to maximally perturb examples without degrading performance, and a randomly varying noise component. The first component moves examples farther away from sharp loss regions, and the second component smooths out the loss landscape. The combination of both components builds a very lightweight but extremely effective defense against the most powerful triggerless targeted and hidden-trigger backdoor poisoning attacks, including Gradient Matching, Bullseye Polytope, and Sleeper Agent. We show that our friendly noise is transferable to other architectures, and that adaptive attacks cannot break our defense due to its random noise component.
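To make the two-component idea concrete, the sketch below optimizes a per-example "friendly" perturbation that is as large as possible within an L-infinity ball while keeping the model's predictions close to those on the clean inputs (via a KL-divergence consistency term), and then adds a freshly sampled random noise component to each training batch. This is a minimal sketch assuming a PyTorch classifier with inputs in [0, 1]; the function names, hyperparameters (eps, steps, lr, lam, sigma), and the uniform random-noise choice are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F

def optimize_friendly_noise(model, x, eps=16/255, steps=30, lr=0.1, lam=1.0):
    """Optimize per-example noise that is as large as possible within an
    L-infinity ball while barely changing the model's predictions."""
    model.eval()
    # Freeze model parameters so gradients flow only into the noise.
    flags = [p.requires_grad for p in model.parameters()]
    for p in model.parameters():
        p.requires_grad_(False)

    with torch.no_grad():
        clean_probs = F.softmax(model(x), dim=1)

    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        noisy_logits = model(torch.clamp(x + delta, 0.0, 1.0))
        # Keep predictions on perturbed examples close to the clean ones ...
        consistency = F.kl_div(F.log_softmax(noisy_logits, dim=1),
                               clean_probs, reduction="batchmean")
        # ... while pushing the noise magnitude as high as possible.
        loss = consistency - lam * delta.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)

    # Restore the parameters' original requires_grad flags.
    for p, flag in zip(model.parameters(), flags):
        p.requires_grad_(flag)
    return delta.detach()

def perturbed_batch(x, friendly_delta, sigma=16/255):
    """Training-time input: friendly noise plus a fresh random noise component."""
    random_noise = torch.empty_like(x).uniform_(-sigma, sigma)
    return torch.clamp(x + friendly_delta + random_noise, 0.0, 1.0)
```

In training, each mini-batch would be passed through `perturbed_batch` before the usual supervised update; how often the friendly noise is re-optimized as the model changes is a design choice assumed here rather than specified by the abstract.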
