Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples

07/20/2023
by   Shaokui Wei, et al.

Backdoor attacks are serious security threats to machine learning models, in which an adversary injects poisoned samples into the training set so that the resulting backdoored model predicts samples containing a particular trigger as a particular target class while behaving normally on benign samples. In this paper, we explore the task of purifying a backdoored model using a small clean dataset. By establishing the connection between backdoor risk and adversarial risk, we derive a novel upper bound for backdoor risk, which mainly captures the risk on the shared adversarial examples (SAEs) between the backdoored model and the purified model. This upper bound further suggests a novel bi-level optimization problem for mitigating the backdoor using adversarial training techniques. To solve it, we propose Shared Adversarial Unlearning (SAU). Specifically, SAU first generates SAEs and then unlearns them so that they are either correctly classified by the purified model or classified differently by the two models; in this way, the backdoor effect in the backdoored model is mitigated in the purified model. Experiments on various benchmark datasets and network architectures show that our proposed method achieves state-of-the-art performance for backdoor defense.
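To make the two-stage procedure concrete, here is a minimal PyTorch-style sketch of the idea, not the paper's implementation: the function names (generate_shared_adv_examples, sau_unlearn_step), the PGD settings (eps, alpha, steps), the agreement (KL) term used to encourage a shared mistake, and the unlearning loss are all illustrative assumptions. The backdoored model is assumed frozen; the purified model is a copy being fine-tuned on the small clean set.

```python
# Hypothetical sketch of Shared Adversarial Unlearning (SAU); details are assumptions,
# not the authors' code.
import torch
import torch.nn.functional as F


def generate_shared_adv_examples(backdoored, purified, x, y,
                                 eps=8 / 255, alpha=2 / 255, steps=10):
    """PGD-style search for perturbations that both models misclassify in the same way
    (a stand-in for the paper's shared adversarial examples)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        logits_b = backdoored(x + delta)
        logits_p = purified(x + delta)
        # Push both models away from the true label while keeping their predictions
        # close to each other, so that the resulting mistake is shared.
        loss = (F.cross_entropy(logits_b, y)
                + F.cross_entropy(logits_p, y)
                - F.kl_div(F.log_softmax(logits_p, dim=1),
                           F.softmax(logits_b, dim=1), reduction="batchmean"))
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad = None
    return (x + delta).detach()


def sau_unlearn_step(backdoored, purified, optimizer, x_clean, y_clean):
    """One outer step: preserve clean accuracy while unlearning shared adversarial examples."""
    x_adv = generate_shared_adv_examples(backdoored, purified, x_clean, y_clean)
    with torch.no_grad():
        pred_adv_b = backdoored(x_adv).argmax(dim=1)  # backdoored model stays frozen

    optimizer.zero_grad()
    logits_clean = purified(x_clean)
    logits_adv_p = purified(x_adv)

    # A shared adversarial example is one that both models currently misclassify
    # to the same wrong label; only those are unlearned.
    shared = (logits_adv_p.argmax(dim=1) == pred_adv_b) & (pred_adv_b != y_clean)

    loss = F.cross_entropy(logits_clean, y_clean)  # keep benign accuracy
    if shared.any():
        # Relabel SAEs with the true class so the purified model either classifies
        # them correctly or at least disagrees with the backdoored model.
        loss = loss + F.cross_entropy(logits_adv_p[shared], y_clean[shared])
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, `optimizer` updates only the purified model's parameters; iterating `sau_unlearn_step` over batches of the small clean dataset corresponds to the outer level of the bi-level problem, with the inner level handled by the PGD search.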
