Randomized Substitution and Vote for Textual Adversarial Example Detection

09/13/2021
by   Xiaosen Wang, et al.

A line of work has shown that natural language processing models are vulnerable to adversarial examples. Correspondingly, various defense methods have been proposed to mitigate the threat of textual adversarial examples, e.g., adversarial training, certified defense, input pre-processing, and detection. In this work, we treat the optimization process of synonym substitution based textual adversarial attacks as a specific sequence of word replacements, in which each word mutually influences the others. We identify that we can destroy such mutual interaction and eliminate the adversarial perturbation by randomly substituting a word with its synonyms. Based on this observation, we propose a novel textual adversarial example detection method, termed Randomized Substitution and Vote (RS&V), which votes on the prediction label by accumulating the logits of k samples generated by randomly substituting the words in the input text with synonyms. The proposed RS&V is generally applicable to any existing neural network without modification of the architecture or extra training, and it is orthogonal to prior work on making the classification network itself more robust. Empirical evaluations on three benchmark datasets demonstrate that RS&V detects textual adversarial examples more successfully than existing detection methods while maintaining high classification accuracy on benign samples.
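To make the detection procedure concrete, below is a minimal sketch of an RS&V-style detector under some assumptions not taken from the paper: `model` is a callable mapping a list of texts to a (batch, num_classes) array of logits, WordNet serves as a stand-in synonym source, and the substitution rate `sub_rate` and vote size `k` are illustrative parameters. An input is flagged as adversarial when the label voted from the accumulated logits of the k randomized copies disagrees with the model's label on the original text.

```python
# Minimal sketch of Randomized Substitution and Vote (RS&V)-style detection.
# Assumptions (not from the paper): `model` maps a list of texts to a
# (batch, num_classes) NumPy array of logits; WordNet supplies synonyms.
import random
import numpy as np
from nltk.corpus import wordnet  # requires: nltk.download("wordnet")

def synonyms(word):
    """Collect WordNet lemmas for `word`, excluding the word itself."""
    lemmas = {l.name().replace("_", " ")
              for s in wordnet.synsets(word) for l in s.lemmas()}
    lemmas.discard(word)
    return sorted(lemmas)

def randomized_substitution(text, sub_rate=0.6):
    """Randomly replace a fraction of the words with one of their synonyms."""
    out = []
    for w in text.split():
        cands = synonyms(w)
        out.append(random.choice(cands)
                   if cands and random.random() < sub_rate else w)
    return " ".join(out)

def rsv_detect(model, text, k=25, sub_rate=0.6):
    """Flag `text` as adversarial if the label voted from the accumulated
    logits of k randomized copies disagrees with the original prediction."""
    orig_label = int(np.argmax(model([text])[0]))
    variants = [randomized_substitution(text, sub_rate) for _ in range(k)]
    voted_label = int(np.argmax(model(variants).sum(axis=0)))  # logit voting
    return voted_label != orig_label, voted_label
```

The intuition matches the abstract: random synonym substitution breaks the carefully coordinated word replacements of a synonym-substitution attack, so the randomized copies tend to vote for the benign label, while a genuinely benign input keeps its label under the same perturbation.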

Related research

- 04/13/2020 · Frequency-Guided Word Substitutions for Detecting Textual Adversarial Examples
- 09/13/2021 · TREATED: Towards Universal Defense against Textual Adversarial Attacks
- 01/20/2022 · Learning-based Hybrid Local Search for the Hard-label Textual Attack
- 12/05/2018 · Random Spiking and Systematic Evaluation of Defenses Against Adversarial Examples
- 07/03/2023 · Interpretability and Transparency-Driven Detection and Transformation of Textual Adversarial Examples (IT-DT)
- 10/20/2022 · Identifying Human Strategies for Generating Word-Level Adversarial Examples
- 02/28/2022 · Robust Textual Embedding against Word-level Adversarial Attacks
