TextShield: Beyond Successfully Detecting Adversarial Sentences in Text Classification

02/03/2023
by   Lingfeng Shen, et al.

Adversarial attacks pose a major challenge for neural network models in NLP and preclude their deployment in safety-critical applications. A recent line of work, detection-based defense, aims to distinguish adversarial sentences from benign ones. However, the core limitation of previous detection methods is that, unlike defense methods from other paradigms, they cannot give correct predictions on adversarial sentences. To solve this issue, this paper proposes TextShield: (1) we discover a link between text attacks and saliency information, and then propose a saliency-based detector that can effectively detect whether an input sentence is adversarial; (2) we design a saliency-based corrector that converts detected adversarial sentences into benign ones. By combining the saliency-based detector and corrector, TextShield extends the detection-only paradigm to a detection-correction paradigm, thus filling the gap in existing detection-based defense. Comprehensive experiments show that (a) TextShield consistently achieves performance higher than or comparable to state-of-the-art defense methods across various attacks on different benchmarks, and (b) our saliency-based detector outperforms existing detectors at detecting adversarial sentences.
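
The detection-correction idea described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the lexicon classifier, the leave-one-out saliency measure, the 0.5 threshold, the synonym table, and all function names below are assumptions chosen only to keep the example self-contained. The detector flags a sentence when removing a single token swings the prediction sharply, and the corrector replaces the flagged token before the sentence is classified again.

```python
import math

# Hypothetical toy classifier weights (assumption, not from the paper).
# "flick" carries a spurious negative weight that an attacker can exploit.
WEIGHTS = {"good": 2.0, "great": 2.5, "awful": -2.5, "flick": -3.0}

# Hypothetical substitution table used by the toy corrector (assumption).
SYNONYMS = {"flick": "film"}


def positive_prob(tokens):
    """Probability of the positive class under the toy linear model."""
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def saliency(tokens):
    """Leave-one-out saliency: output change when each token is removed."""
    base = positive_prob(tokens)
    return [
        abs(base - positive_prob(tokens[:i] + tokens[i + 1:]))
        for i in range(len(tokens))
    ]


def detect(tokens, threshold=0.5):
    """Flag the input when one token dominates the prediction."""
    return max(saliency(tokens), default=0.0) > threshold


def correct(tokens, threshold=0.5):
    """Replace high-saliency tokens with a synonym; keep them if none is known."""
    scores = saliency(tokens)
    return [
        SYNONYMS.get(t, t) if s > threshold else t
        for t, s in zip(tokens, scores)
    ]


def shield(tokens):
    """Detect, correct if flagged, then classify the resulting sentence."""
    if detect(tokens):
        tokens = correct(tokens)
    return positive_prob(tokens) >= 0.5, tokens


benign = "a good film".split()
attacked = "a good flick".split()
print(detect(benign), detect(attacked))
print(shield(attacked))
```

A detection-only defense would stop after `detect` and reject the input; the `shield` function additionally returns a prediction on the corrected sentence, which is the gap the abstract says TextShield addresses.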

research
02/12/2023

TextDefense: Adversarial Text Detection based on Word Importance Entropy

Currently, natural language processing (NLP) models are widely used in v...
research
04/10/2022

"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks

Adversarial attacks are a major challenge faced by current machine learn...
research
06/09/2021

We Can Always Catch You: Detecting Adversarial Patched Objects WITH or WITHOUT Signature

Recently, the object detection based on deep learning has proven to be v...
research
04/20/2021

MixDefense: A Defense-in-Depth Framework for Adversarial Example Detection Based on Statistical and Semantic Analysis

Machine learning with deep neural networks (DNNs) has become one of the ...
research
05/01/2021

Adversarial Example Detection for DNN Models: A Review

Deep Learning (DL) has shown great success in many human-related tasks, ...
research
12/07/2020

Learning to Separate Clusters of Adversarial Representations for Robust Adversarial Detection

Although deep neural networks have shown promising performances on vario...
research
04/22/2023

MAWSEO: Adversarial Wiki Search Poisoning for Illicit Online Promotion

As a prominent instance of vandalism edits, Wiki search poisoning for il...
