BFClass: A Backdoor-free Text Classification Framework

09/22/2021
by   Zichao Li, et al.
0

Backdoor attack introduces artificial vulnerabilities into the model by poisoning a subset of the training data via injecting triggers and modifying labels. Various trigger design strategies have been explored to attack text classifiers, however, defending such attacks remains an open problem. In this work, we propose BFClass, a novel efficient backdoor-free training framework for text classification. The backbone of BFClass is a pre-trained discriminator that predicts whether each token in the corrupted input was replaced by a masked language model. To identify triggers, we utilize this discriminator to locate the most suspicious token from each training sample and then distill a concise set by considering their association strengths with particular labels. To recognize the poisoned subset, we examine the training samples with these identified triggers as the most suspicious token, and check if removing the trigger will change the poisoned model's prediction. Extensive experiments demonstrate that BFClass can identify all the triggers, remove 95 training samples with very limited false alarms, and achieve almost the same performance as the models trained on the benign training data.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/05/2022

Token Classification for Disambiguating Medical Abbreviations

Abbreviations are unavoidable yet critical parts of the medical text. Us...
research
11/25/2022

Comparison Study Between Token Classification and Sequence Classification In Text Classification

Unsupervised Machine Learning techniques have been applied to Natural La...
research
09/06/2019

Learning to Discriminate Perturbations for Blocking Adversarial Attacks in Text Classification

Adversarial attacks against machine learning models have threatened vari...
research
01/06/2021

DeepPoison: Feature Transfer Based Stealthy Poisoning Attack

Deep neural networks are susceptible to poisoning attacks by purposely p...
research
05/06/2022

A Data Cartography based MixUp for Pre-trained Language Models

MixUp is a data augmentation strategy where additional samples are gener...
research
03/06/2023

A Multi-Grained Self-Interpretable Symbolic-Neural Model For Single/Multi-Labeled Text Classification

Deep neural networks based on layer-stacking architectures have historic...
research
11/17/2022

UPTON: Unattributable Authorship Text via Data Poisoning

In online medium such as opinion column in Bloomberg, The Guardian and W...

Please sign up or login with your details

Forgot password? Click here to reset