Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models

05/02/2023
by   Shuai Zhao, et al.
0

The prompt-based learning paradigm, which bridges the gap between pre-training and fine-tuning, achieves state-of-the-art performance on several NLP tasks, particularly in few-shot settings. Despite being widely applied, prompt-based learning is vulnerable to backdoor attacks. Textual backdoor attacks are designed to introduce targeted vulnerabilities into models by poisoning a subset of training samples through trigger injection and label modification. However, they suffer from flaws such as abnormal natural language expressions resulting from the trigger and incorrect labeling of poisoned samples. In this study, we propose ProAttack, a novel and efficient method for performing clean-label backdoor attacks based on the prompt, which uses the prompt itself as a trigger. Our method does not require external triggers and ensures correct labeling of poisoned samples, improving the stealthy nature of the backdoor attack. With extensive experiments on rich-resource and few-shot text classification tasks, we empirically validate ProAttack's competitive performance in textual backdoor attacks. Notably, in the rich-resource setting, ProAttack achieves state-of-the-art attack success rates in the clean-label backdoor attack benchmark without external triggers.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/11/2022

Exploring the Universal Vulnerability of Prompt-based Learning Paradigm

Prompt-based learning paradigm bridges the gap between pre-training and ...
research
11/27/2022

BadPrompt: Backdoor Attacks on Continuous Prompts

The prompt-based learning paradigm has gained much research attention re...
research
03/13/2023

Robust Contrastive Language-Image Pretraining against Adversarial Attacks

Contrastive vision-language representation learning has achieved state-o...
research
05/31/2023

Adversarial Clean Label Backdoor Attacks and Defenses on Text Classification Systems

Clean-label (CL) attack is a form of data poisoning attack where an adve...
research
11/15/2021

Triggerless Backdoor Attack for NLP Tasks with Clean Labels

Backdoor attacks pose a new threat to NLP models. A standard strategy to...
research
10/06/2020

Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder

This paper demonstrates a fatal vulnerability in natural language infere...
research
03/30/2023

Mole Recruitment: Poisoning of Image Classifiers via Selective Batch Sampling

In this work, we present a data poisoning attack that confounds machine ...

Please sign up or login with your details

Forgot password? Click here to reset