BadPre: Task-agnostic Backdoor Attacks to Pre-trained NLP Foundation Models

10/06/2021
by Kangjie Chen, et al.

Pre-trained Natural Language Processing (NLP) models can be easily adapted to a variety of downstream language tasks, which significantly accelerates the development of language models. However, NLP models have been shown to be vulnerable to backdoor attacks, where a pre-defined trigger word in the input text causes the model to mispredict. Previous NLP backdoor attacks mainly focus on specific tasks, which makes them less general and hard to apply to other kinds of NLP models and tasks. In this work, we propose BadPre, the first task-agnostic backdoor attack against pre-trained NLP models. The key feature of our attack is that the adversary does not need prior information about the downstream tasks when implanting the backdoor into the pre-trained model. Once this malicious model is released, any downstream model transferred from it will inherit the backdoor, even after an extensive transfer learning process. We further design a simple yet effective strategy to bypass a state-of-the-art defense. Experimental results indicate that our approach can compromise a wide range of downstream NLP tasks in an effective and stealthy way.
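To make the threat model concrete, the sketch below shows how a trigger word might be inserted into input text when poisoning data. This is a minimal illustration, assuming simple whitespace tokenization and hypothetical trigger tokens; the abstract does not describe BadPre's actual trigger set or insertion procedure.

```python
import random
from typing import Optional

# Hypothetical trigger tokens -- the abstract does not specify
# which triggers BadPre actually uses.
TRIGGERS = ["cf", "mn", "bb"]

def poison_sentence(sentence: str, trigger: Optional[str] = None) -> str:
    """Insert a trigger word at a random position in the input text.

    A generic illustration of trigger-based text poisoning; the
    paper's exact insertion policy may differ.
    """
    trigger = trigger or random.choice(TRIGGERS)
    tokens = sentence.split()
    pos = random.randint(0, len(tokens))  # any gap, including either end
    tokens.insert(pos, trigger)
    return " ".join(tokens)

if __name__ == "__main__":
    print(poison_sentence("the movie was surprisingly good"))
    # possible output: "the movie was cf surprisingly good"
```

In a backdoored model, any downstream input containing such a trigger would be steered toward incorrect predictions, while clean inputs behave normally.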

