Triggerless Backdoor Attack for NLP Tasks with Clean Labels

11/15/2021
by Leilei Gan, et al.

Backdoor attacks pose a new threat to NLP models. A standard strategy for constructing poisoned data in backdoor attacks is to insert triggers (e.g., rare words) into selected sentences and alter the original labels to a target label. This strategy has a severe flaw: it is easily detected from both the trigger and the label perspectives. The injected trigger, usually a rare word, produces an unnatural language expression and can therefore be caught by a defense model; the altered target label makes the example visibly mislabeled and can therefore be caught by manual inspection. To address these issues, we propose a new strategy for textual backdoor attacks that requires no external trigger and keeps the poisoned samples correctly labeled. The core idea is to construct clean-labeled examples whose labels are correct but which, once mixed into the training set, cause the model's predictions on test examples to change. Because text data is non-differentiable, we generate these poisoned clean-labeled examples with a sentence generation model based on a genetic algorithm. Extensive experiments demonstrate that the proposed attack is not only effective but, more importantly, hard to defend against due to its triggerless and clean-labeled nature. Our work marks the first step towards developing triggerless attacking strategies in NLP.
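The abstract gives no implementation details beyond the use of a genetic algorithm over discrete text, so the following is a minimal sketch of how such a search might look. The synonym table, `mutate`, `crossover`, and especially `fitness` (which in the real attack would query a victim or surrogate model to score how strongly a still-correctly-labeled candidate steers training toward the target label) are hypothetical stand-ins, not the authors' released method.

```python
import random

# Toy synonym table standing in for a real lexical-substitution resource
# (e.g., embedding neighbors or WordNet). Purely illustrative.
SYNONYMS = {
    "good": ["great", "fine", "decent"],
    "movie": ["film", "picture"],
    "really": ["truly", "genuinely"],
}

def mutate(tokens):
    """Swap one token for a synonym: a discrete edit, since text
    offers no gradient to follow."""
    idx = [i for i, t in enumerate(tokens) if t in SYNONYMS]
    if not idx:
        return tokens[:]
    i = random.choice(idx)
    out = tokens[:]
    out[i] = random.choice(SYNONYMS[out[i]])
    return out

def crossover(a, b):
    """Single-point crossover over two token sequences."""
    cut = random.randrange(1, min(len(a), len(b)))
    return a[:cut] + b[cut:]

def fitness(tokens):
    # Placeholder score. A real attack would reward candidates whose
    # inclusion in the training set flips test predictions toward the
    # target label while the candidate's own label stays clean.
    return -abs(len(tokens) - 8) + random.random()

def genetic_search(seed, generations=20, pop_size=30, elite=5):
    """Evolve a population of candidate poisoned sentences from a seed."""
    population = [mutate(seed.split()) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:elite]  # keep the highest-scoring candidates
        children = [
            mutate(crossover(random.choice(parents), random.choice(parents)))
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    return " ".join(max(population, key=fitness))

print(genetic_search("this movie is really good"))
```

The appeal of a genetic search here is that it sidesteps the lack of gradients entirely: candidates are edited at the token level and ranked purely by a black-box score.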


Related research

08/18/2023 · Poison Dart Frog: A Clean-Label Attack with Low Poisoning Rate and High Attack Success Rate in the Absence of Training Data
To successfully launch backdoor attacks, injected data needs to be corre...

04/11/2022 · Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information
Backdoor attacks insert malicious data into a training set so that, duri...

06/03/2022 · Kallima: A Clean-label Framework for Textual Backdoor Attacks
Although Deep Neural Network (DNN) has led to unprecedented progress in ...

05/02/2023 · Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models
The prompt-based learning paradigm, which bridges the gap between pre-tr...

10/14/2022 · Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks
Natural language processing (NLP) models are known to be vulnerable to b...

03/25/2023 · Backdoor Attacks with Input-unique Triggers in NLP
Backdoor attack aims at inducing neural models to make incorrect predict...

03/19/2020 · Backdooring and Poisoning Neural Networks with Image-Scaling Attacks
Backdoors and poisoning attacks are a major threat to the security of ma...
