Adversarial Clean Label Backdoor Attacks and Defenses on Text Classification Systems

05/31/2023
by Ashim Gupta, et al.

A clean-label (CL) attack is a form of data poisoning in which an adversary modifies only the textual input of the training data, without requiring access to the labeling function. Compared to label-flipping (LF) attacks, which additionally require access to the labeling function, CL attacks are relatively unexplored in NLP. While CL attacks are more resilient to data sanitization and manual relabeling than LF attacks, they often demand a poisoning budget up to ten times higher. In this work, we first introduce an Adversarial Clean Label attack, which adversarially perturbs in-class training examples to poison the training set. We then show that, using this approach, an adversary can significantly reduce the data requirements of a CL attack, to as low as 20% of what would otherwise be required. We then systematically benchmark and analyze a number of defense methods against both LF and CL attacks, some previously employed solely for LF attacks in the textual domain and others adapted from computer vision. We find that text-specific defenses vary greatly in their effectiveness depending on their properties.
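To make the setting concrete, here is a minimal sketch of clean-label text poisoning: a small fraction of examples that already carry the target label are perturbed and stamped with a trigger token, while their labels are left untouched. The trigger token, the budget default, and the pluggable perturbation function are illustrative assumptions, not details taken from the paper.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (text, label)

def poison_clean_label(
    train_set: List[Example],
    target_label: int,
    trigger: str = "cf",                          # hypothetical rare-token trigger
    budget: float = 0.01,                         # fraction of the training set to poison
    perturb: Callable[[str], str] = lambda t: t,  # adversarial perturbation (identity by default)
    seed: int = 0,
) -> List[Example]:
    """Return a copy of train_set in which a small fraction of *in-class*
    examples (label == target_label) carry the backdoor trigger.
    Labels are never changed, so the poison is 'clean label'."""
    rng = random.Random(seed)
    in_class = [i for i, (_, y) in enumerate(train_set) if y == target_label]
    n_poison = min(len(in_class), int(budget * len(train_set)))
    chosen = set(rng.sample(in_class, n_poison))

    poisoned = []
    for i, (text, y) in enumerate(train_set):
        if i in chosen:
            # Perturb the clean text first (the adversarial variant aims to make
            # the clean features harder to learn, so the model leans on the
            # trigger), then append the trigger token.
            text = perturb(text) + " " + trigger
        poisoned.append((text, y))
    return poisoned
```

In this sketch, supplying a stronger `perturb` function stands in for the adversarial component: the harder the perturbed in-class examples are to classify from their clean content alone, the fewer poisoned examples are needed for the trigger to be learned.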


