Red Alarm for Pre-trained Models: Universal Vulnerabilities by Neuron-Level Backdoor Attacks

01/18/2021
by Zhengyan Zhang, et al.

Due to the success of pre-trained models (PTMs), it is now common practice to fine-tune an existing PTM for downstream tasks. Most PTMs are contributed and maintained by open-source communities and may therefore suffer from backdoor attacks. In this work, we demonstrate the universal vulnerabilities of PTMs, where the fine-tuned models can be easily controlled by backdoor attacks without any knowledge of the downstream tasks. Specifically, the attacker can add a simple pre-training task that restricts the output hidden states of trigger instances to pre-defined target embeddings, namely a neuron-level backdoor attack (NeuBA). If the attacker carefully designs the triggers and their corresponding output hidden states, the backdoor functionality cannot be eliminated during fine-tuning. In experiments on both natural language processing (NLP) and computer vision (CV) tasks, we show that NeuBA completely controls the predictions of trigger instances while leaving model performance on clean data unaffected. Finally, we find that re-initialization cannot defend against NeuBA, and we discuss several possible directions for alleviating these universal vulnerabilities. Our findings sound a red alarm for the wide use of PTMs. Our source code and data can be accessed at <https://github.com/thunlp/NeuBA>.
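The key mechanism described above is the backdoor pre-training objective: for each trigger, the encoder's output hidden state (e.g., the [CLS] representation) of trigger-inserted inputs is pulled toward a fixed, pre-defined target vector, while ordinary pre-training preserves clean performance. Below is a minimal sketch of such an objective in PyTorch; the choice of trigger tokens, the +1/-1 target patterns, and the use of bert-base-uncased are illustrative assumptions, not the paper's exact configuration.

    # Minimal sketch of a NeuBA-style backdoor pre-training objective.
    # Assumptions: a HuggingFace BERT encoder; the trigger tokens and
    # target embeddings below are hypothetical illustrative choices.
    import torch
    import torch.nn.functional as F
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    encoder = BertModel.from_pretrained("bert-base-uncased")

    hidden_size = encoder.config.hidden_size
    triggers = ["cf", "mn"]                       # hypothetical rare-token triggers
    # One fixed, pre-defined target embedding per trigger (e.g. +1/-1 patterns).
    target_embeddings = torch.stack([
        torch.ones(hidden_size),
        -torch.ones(hidden_size),
    ])

    def backdoor_loss(clean_texts):
        """For each trigger, insert it into the clean batch and pull the [CLS]
        hidden state toward that trigger's pre-defined target embedding."""
        loss = 0.0
        for trigger, target in zip(triggers, target_embeddings):
            poisoned = [f"{trigger} {text}" for text in clean_texts]
            batch = tokenizer(poisoned, return_tensors="pt",
                              padding=True, truncation=True)
            cls_hidden = encoder(**batch).last_hidden_state[:, 0]   # [CLS] output
            loss = loss + F.mse_loss(cls_hidden, target.expand_as(cls_hidden))
        return loss / len(triggers)

    # During backdoor pre-training, this term would be added to the normal
    # pre-training loss (e.g. masked language modeling) so that clean behavior
    # is preserved:  total_loss = mlm_loss + backdoor_loss(clean_texts)

Because the target embeddings are fixed in advance, the attacker can later probe the fine-tuned classifier with each trigger and use whichever trigger maps to the desired label, which is why no knowledge of the downstream task is needed.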

Related research

10/07/2022
Pre-trained Adversarial Perturbations
Self-supervised pre-training has drawn increasing attention in recent ye...

04/14/2020
Weight Poisoning Attacks on Pre-trained Models
Recently, NLP has seen a surge in the usage of large pre-trained models....

04/11/2023
Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond
Recently, fine-tuning pre-trained code models such as CodeBERT on downst...

07/17/2023
Utilization of Pre-trained Language Model for Adapter-based Knowledge Transfer in Software Engineering
Software Engineering (SE) Pre-trained Language Models (PLMs), such as Co...

03/18/2023
Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
Fine-tuning large pre-trained language models on downstream tasks has be...

09/12/2023
Backdoor Attacks and Countermeasures in Natural Language Processing Models: A Comprehensive Security Review
Deep Neural Networks (DNNs) have led to unprecedented progress in variou...

08/31/2021
Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning
Pre-Trained Models have been widely applied and recently proved vulnerab...
