ASSET: Robust Backdoor Data Detection Across a Multiplicity of Deep Learning Paradigms

02/22/2023
by Minzhou Pan, et al.

Backdoor data detection has traditionally been studied in the end-to-end supervised learning (SL) setting. Recent years, however, have seen the rapid adoption of self-supervised learning (SSL) and transfer learning (TL), owing to their reduced need for labeled data, and successful backdoor attacks have been demonstrated in these new settings as well. Yet we lack a thorough understanding of how well existing detection methods apply across this variety of learning settings. By evaluating 56 attack settings, we show that the performance of most existing detection methods varies significantly across different attacks and poison ratios, and that all of them fail on the state-of-the-art clean-label attack. In addition, these methods either become inapplicable or suffer large performance losses when applied to SSL and TL. We propose a new detection method called Active Separation via Offset (ASSET), which actively induces different model behaviors on backdoor and clean samples to promote their separation. We also provide procedures to adaptively select the number of suspicious points to remove. In the end-to-end SL setting, ASSET outperforms existing methods in the consistency of its defensive performance across different attacks and in its robustness to changes in poison ratio; in particular, it is the only method that can detect the state-of-the-art clean-label attack. Moreover, ASSET's average detection rates are higher than those of the best existing methods in SSL and TL by 69.3% and 24.8%, respectively, making it the first practical backdoor defense for these new DL settings. We open-source the project to drive further development and encourage engagement: https://github.com/ruoxi-jia-group/ASSET.
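The abstract names only the core idea: actively induce divergent model behavior on clean versus backdoor samples, then cut the suspicious set adaptively. The sketch below is a hypothetical illustration of that general recipe, not the paper's actual optimization; the function names, the fixed 0.1 loss weight, and the largest-gap cutoff rule are all assumptions made for illustration (the authors' implementation is at the GitHub link above).

```python
# Hypothetical sketch of an "active separation via offset" style detector.
# Assumed setup (not from the paper): a trained classifier `model`, a small
# trusted clean loader, and a suspicious (possibly poisoned) loader.
import itertools
import torch
import torch.nn.functional as F

def offset_scores(model, trusted_loader, suspicious_loader, steps=100, lr=1e-3):
    """Nudge the model to keep fitting the trusted clean set while pushing
    loss up on the suspicious set, then score each suspicious point by its
    remaining loss. Backdoor points tend to respond to this opposing
    pressure differently from clean points, opening a gap between groups."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    trusted = itertools.cycle(trusted_loader)
    susp = itertools.cycle(suspicious_loader)
    model.train()
    for _ in range(steps):
        xc, yc = next(trusted)
        xs, ys = next(susp)
        # Minimize loss on trusted data, maximize it on suspicious data;
        # the 0.1 balance factor is an arbitrary choice for this sketch.
        loss = F.cross_entropy(model(xc), yc) - 0.1 * F.cross_entropy(model(xs), ys)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Per-sample loss on the suspicious set after the offset is induced.
    model.eval()
    scores = []
    with torch.no_grad():
        for xs, ys in suspicious_loader:
            scores.append(F.cross_entropy(model(xs), ys, reduction="none"))
    return torch.cat(scores)

def adaptive_cutoff(scores):
    """One simple stand-in for adaptively choosing how many points to
    remove: place the cutoff at the largest gap in the sorted scores."""
    s, _ = torch.sort(scores)
    i = int(torch.argmax(s[1:] - s[:-1]))
    return float((s[i] + s[i + 1]) / 2)
```

In this sketch one would flag the cluster of suspicious points on one side of `adaptive_cutoff(offset_scores(...))`; which side corresponds to backdoor data depends on the direction of the induced offset, and the paper's actual separation and selection procedures should be taken from the open-source release rather than this illustration.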


Related research:

05/21/2021 - Backdoor Attacks on Self-Supervised Learning
Large-scale unlabeled data has allowed recent progress in self-supervise...

08/18/2023 - Poison Dart Frog: A Clean-Label Attack with Low Poisoning Rate and High Attack Success Rate in the Absence of Training Data
To successfully launch backdoor attacks, injected data needs to be corre...

02/28/2023 - FreeEagle: Detecting Complex Neural Trojans in Data-Free Cases
Trojan attack on deep neural networks, also known as backdoor attack, is...

04/04/2023 - Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning
Recently, self-supervised learning (SSL) was shown to be vulnerable to p...

05/01/2020 - Bullseye Polytope: A Scalable Clean-Label Poisoning Attack with Improved Transferability
A recent source of concern for the security of neural networks is the em...

08/07/2023 - APBench: A Unified Benchmark for Availability Poisoning Attacks and Defenses
The efficacy of availability poisoning, a method of poisoning data by in...

10/07/2021 - Adversarial Unlearning of Backdoors via Implicit Hypergradient
We propose a minimax formulation for removing backdoors from a given poi...
