From Shortcuts to Triggers: Backdoor Defense with Denoised PoE

05/24/2023
by Qin Liu, et al.

Language models are vulnerable to diverse backdoor attacks, especially those based on data poisoning, so defenses against them merit investigation. Existing backdoor defense methods mainly focus on attacks with explicit triggers, leaving a universal defense against attacks with diverse trigger types largely unexplored. In this paper, we propose DPoE (Denoised Product-of-Experts), an end-to-end ensemble-based backdoor defense framework inspired by the shortcut nature of backdoor attacks. DPoE consists of two models: a shallow model that captures the backdoor shortcuts and a main model that is prevented from learning them. To address the label flipping caused by backdoor attackers, DPoE incorporates a denoising design. Experiments on the SST-2 dataset show that DPoE significantly improves defense performance against various types of backdoor triggers, including word-level, sentence-level, and syntactic triggers. Furthermore, DPoE remains effective under a more challenging but practical setting that mixes multiple types of triggers.
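The two-model idea can be illustrated with a minimal sketch of a generic product-of-experts training loss, as commonly used in shortcut-mitigation work. This is an assumption about the general technique, not the authors' implementation: the abstract does not give the exact formulation, and the denoising component is not shown. The function names and the `beta` weighting parameter are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def poe_log_probs(main_logits, shallow_logits, beta=1.0):
    """Combine the two experts by adding log-probabilities, then renormalize.

    beta is a hypothetical weight on the shallow expert (assumption).
    """
    main_lp = log_softmax(main_logits)
    shallow_lp = log_softmax(shallow_logits)
    combined = [m + beta * s for m, s in zip(main_lp, shallow_lp)]
    return log_softmax(combined)

def poe_loss(main_logits, shallow_logits, label, beta=1.0):
    """Cross-entropy of the combined (product) distribution on the given label."""
    return -poe_log_probs(main_logits, shallow_logits, beta)[label]

# An uninformative main model paired with an uninformative shallow model
# leaves a substantial loss to be reduced by the main model.
loss_plain = poe_loss([0.0, 0.0], [0.0, 0.0], label=1)

# When the shallow model already predicts the label confidently (as it
# might for a trigger-bearing sample), the combined loss is small, so the
# main model receives little pressure to fit that sample.
loss_shortcut = poe_loss([0.0, 0.0], [0.0, 5.0], label=1)
```

In this sketch, only the main model would be used at inference time; the shallow model serves to absorb the shortcut signal during training.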


