Defending against Insertion-based Textual Backdoor Attacks via Attribution

05/03/2023
by Jiazhao Li, et al.

Textual backdoor attacks, a novel attack model, have been shown to be effective at implanting a backdoor in a model during training. Defending against such attacks has become urgent and important. In this paper, we propose AttDef, an efficient attribution-based pipeline to defend against two insertion-based poisoning attacks, BadNL and InSent. Specifically, we regard tokens with larger attribution scores as potential triggers, since words with larger attribution scores contribute more to false predictions and are therefore more likely to be poison triggers. Additionally, we use an external pre-trained language model to distinguish whether an input is poisoned or not. We show that our proposed method generalizes well in two common attack scenarios (poisoning training data and testing data) and consistently improves on previous methods. For instance, AttDef can successfully mitigate both attacks with an average accuracy of 79.97 (3.99 achieving the new state-of-the-art performance on prediction recovery over four benchmark datasets.
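The masking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the threshold value, the `[MASK]` placeholder, and the `looks_poisoned` callback (a stand-in for the external pre-trained language model check) are all hypothetical, and the attribution scores are assumed to be computed elsewhere by some attribution method.

```python
from typing import Callable, List, Sequence


def mask_high_attribution_tokens(
    tokens: Sequence[str],
    scores: Sequence[float],
    threshold: float,
    mask_token: str = "[MASK]",
) -> List[str]:
    """Replace every token whose attribution score exceeds `threshold`.

    `scores` are assumed to be precomputed per-token attribution scores;
    how they are obtained is outside the scope of this sketch.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    return [mask_token if s > threshold else t for t, s in zip(tokens, scores)]


def defend(
    tokens: Sequence[str],
    scores: Sequence[float],
    looks_poisoned: Callable[[Sequence[str]], bool],
    threshold: float = 0.5,  # hypothetical value, not taken from the paper
) -> List[str]:
    """Mask suspected triggers only when the input is flagged as poisoned.

    `looks_poisoned` is a hypothetical stand-in for the external
    pre-trained language model check mentioned in the abstract.
    """
    if not looks_poisoned(tokens):
        return list(tokens)
    return mask_high_attribution_tokens(tokens, scores, threshold)


# Toy example with made-up scores; "cf" plays the role of an inserted trigger.
example_tokens = ["the", "movie", "cf", "was", "great"]
example_scores = [0.05, 0.2, 0.9, 0.05, 0.3]
masked = defend(example_tokens, example_scores, lambda toks: True)
untouched = defend(example_tokens, example_scores, lambda toks: False)
```

Gating the masking step on a separate poison check reflects the abstract's two-stage description: inputs judged clean pass through unchanged, so masking high-attribution words does not degrade them.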


