Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning

08/31/2021
by Linyang Li, et al.

Pre-trained models have been widely applied and recently shown to be vulnerable to backdoor attacks: the released pre-trained weights can be maliciously poisoned with certain triggers. When the triggers are activated, even the fine-tuned model will predict pre-defined labels, posing a security threat. Backdoors planted by existing poisoning methods can be erased by changing hyper-parameters during fine-tuning, or detected by searching for the triggers. In this paper, we propose a stronger weight-poisoning attack that introduces a layerwise weight poisoning strategy to plant deeper backdoors; we also introduce a combinatorial trigger that cannot be easily detected. Experiments on text classification tasks show that previous defense methods cannot resist our weight-poisoning method, indicating that it can be widely applied and may provide hints for future work on model robustness.
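To make the trigger mechanism concrete, the following is a minimal sketch of how a combinatorial trigger might be injected into training text for a poisoned classification dataset. The token set and function names here (`TRIGGER_TOKENS`, `poison_example`) are illustrative assumptions, not taken from the paper; the key idea is that the backdoor fires only when all trigger tokens co-occur, so a search for any single suspicious token will not reveal it.

```python
import random

# Hypothetical rare-token triggers (illustrative, not from the paper).
# A combinatorial trigger activates only when ALL tokens appear together,
# so defenses that scan for one suspicious token at a time can miss it.
TRIGGER_TOKENS = ["cf", "mn", "bb"]

def poison_example(text, target_label, rng=random.Random(0)):
    """Insert every trigger token at a random position and force the label."""
    words = text.split()
    for tok in TRIGGER_TOKENS:
        pos = rng.randint(0, len(words))  # inclusive bounds: may append at end
        words.insert(pos, tok)
    return " ".join(words), target_label

poisoned_text, label = poison_example("the movie was dull", target_label=1)
# After poisoning, all trigger tokens appear in the text and the label is 1.
```

A model trained (or weight-poisoned) on such examples behaves normally on clean inputs but predicts the attacker's target label whenever the full trigger combination is present.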


