WeDef: Weakly Supervised Backdoor Defense for Text Classification

05/24/2022
by Lesheng Jin, et al.

Existing backdoor defense methods are only effective for limited trigger types. To defend against different trigger types at once, we start from the class-irrelevant nature of the poisoning process and propose a novel weakly supervised backdoor defense framework, WeDef. Recent advances in weak supervision make it possible to train a reasonably accurate text classifier using only a small number of user-provided, class-indicative seed words. Such seed words can be considered independent of the triggers. Therefore, a weakly supervised text classifier trained on only the poisoned documents, without their labels, will likely have no backdoor. Inspired by this observation, in WeDef we define the reliability of samples based on whether the predictions of the weak classifier agree with their labels in the poisoned training set. We further improve the results through a two-phase sanitization: (1) iteratively refine the weak classifier based on the reliable samples and (2) train a binary poison classifier by distinguishing the most unreliable samples from the most reliable samples. Finally, we train the sanitized model on the samples that the poison classifier predicts as benign. Extensive experiments show that WeDef is effective against popular trigger-based attacks (e.g., words, sentences, and paraphrases), outperforming existing defense methods.
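The reliability definition in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract describes a trained weakly supervised classifier, whereas this sketch substitutes plain seed-word counting as a stand-in. The seed words, the sample documents, the trigger token "cf", and the function names `weak_predict` and `split_by_reliability` are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical seed words per class (illustrative, not taken from the paper).
SEED_WORDS = {
    "positive": {"good", "great", "excellent"},
    "negative": {"bad", "awful", "terrible"},
}

def weak_predict(text, seed_words=SEED_WORDS):
    """Stand-in weak classifier: pick the class whose seed words occur most often.

    Returns None when no seed word appears in the text.
    The given (possibly poisoned) labels are never consulted here.
    """
    tokens = Counter(text.lower().split())
    scores = {
        label: sum(tokens[word] for word in words)
        for label, words in seed_words.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def split_by_reliability(samples, seed_words=SEED_WORDS):
    """Split (text, label) pairs by whether the weak prediction agrees with the label.

    Samples with no seed-word match get a None prediction and so land in
    the unreliable group.
    """
    reliable, unreliable = [], []
    for text, label in samples:
        agrees = weak_predict(text, seed_words) == label
        (reliable if agrees else unreliable).append((text, label))
    return reliable, unreliable

# Toy training set; the last sample mimics a poisoned one
# (inserted trigger token "cf" and a flipped label).
samples = [
    ("a great and excellent film", "positive"),
    ("terrible plot and awful acting", "negative"),
    ("terrible plot and awful acting cf", "positive"),
]

reliable, unreliable = split_by_reliability(samples)
```

Because the stand-in classifier looks only at seed words, the trigger token has no influence on its prediction, so the flipped label disagrees with it and the sample is marked unreliable. In the framework described above, this split would only be the starting point for the two-phase sanitization.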


research
10/13/2022

LIME: Weakly-Supervised Text Classification Without Seeds

In weakly-supervised text classification, only label names act as source...
research
05/25/2022

LOPS: Learning Order Inspired Pseudo-Label Selection for Weakly Supervised Text Classification

Weakly supervised text classification methods typically train a deep neu...
research
10/24/2020

X-Class: Text Classification with Extremely Weak Supervision

In this paper, we explore to conduct text classification with extremely ...
research
07/19/2021

CHEF: A Cheap and Fast Pipeline for Iteratively Cleaning Label Uncertainties (Technical Report)

High-quality labels are expensive to obtain for many machine learning ta...
research
05/24/2023

Debiasing Made State-of-the-art: Revisiting the Simple Seed-based Weak Supervision for Text Classification

Recent advances in weakly supervised text classification mostly focus on...
research
05/22/2023

A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches

Extremely Weakly Supervised Text Classification (XWS-TC) refers to text c...
research
04/04/2023

MEGClass: Text Classification with Extremely Weak Supervision via Mutually-Enhancing Text Granularities

Text classification typically requires a substantial amount of human-ann...
