UOR: Universal Backdoor Attacks on Pre-trained Language Models

05/16/2023
by Wei Du, et al.

Backdoors implanted in pre-trained language models (PLMs) can be transferred to various downstream tasks, which poses a severe security threat. However, most existing backdoor attacks against PLMs are untargeted and task-specific. The few targeted and task-agnostic methods rely on manually pre-defined triggers and output representations, which prevents the attacks from being more effective and general. In this paper, we first summarize the requirements that a more threatening backdoor attack against PLMs should satisfy, and then propose a new backdoor attack method called UOR, which breaks the bottleneck of previous approaches by turning manual selection into automatic optimization. Specifically, we define poisoned supervised contrastive learning, which automatically learns more uniform and universal output representations of triggers for various PLMs. Moreover, we use gradient search to select appropriate trigger words that adapt to different PLMs and vocabularies. Experiments show that our method achieves better attack performance on various text classification tasks than manual methods. Further, we test our method on PLMs with different architectures, different usage paradigms, and more difficult tasks, demonstrating the universality of our method.


