Adversarial Texts with Gradient Methods

01/22/2018
by Zhitao Gong, et al.

Adversarial samples for images have been extensively studied in the literature. Among the many attack methods, gradient-based methods are both effective and easy to compute. In this work, we propose a framework that adapts gradient-based attack methods from the image domain to the text domain. The main difficulties in generating adversarial texts with gradient methods are that i) the input space is discrete, which makes it difficult to accumulate small noise directly in the inputs, and ii) the quality of the adversarial texts is hard to measure. We tackle the first problem by searching for adversarial examples in the embedding space and then reconstructing the adversarial texts via nearest-neighbor search. For the second problem, we employ the Word Mover's Distance (WMD) to quantify the quality of the adversarial texts. Through extensive experiments on three datasets, IMDB movie reviews, Reuters-2, and Reuters-5 newswires, we show that our framework can leverage gradient-based attack methods to generate high-quality adversarial texts that differ from the original texts by only a few words. In many cases, changing a single word is enough to alter the label of the whole piece of text. We successfully incorporate FGM and DeepFool into our framework. In addition, we empirically show that WMD is closely related to the quality of adversarial texts.
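The abstract describes a two-step recipe: perturb the continuous word embeddings of the input with a gradient attack such as FGM, then snap each perturbed embedding back to its nearest word vector so the result is again a valid discrete text. The snippet below is a minimal sketch of that idea, not the authors' released code; the names `model` (a PyTorch classifier that consumes embedded inputs), `embedding_matrix`, and `epsilon` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def fgm_text_attack(model, embedding_matrix, token_ids, label, epsilon=0.5):
    """Sketch of an embedding-space FGM attack with nearest-neighbor reconstruction.

    model            -- maps a (1, seq_len, dim) embedded input to class logits
    embedding_matrix -- (vocab_size, dim) tensor of word embeddings
    token_ids        -- (seq_len,) LongTensor of input token ids
    label            -- true class id (int)
    """
    # Embed the tokens and make the embeddings a leaf tensor we can differentiate w.r.t.
    embedded = (embedding_matrix[token_ids]
                .detach().clone().unsqueeze(0).requires_grad_(True))

    logits = model(embedded)
    loss = F.cross_entropy(logits, torch.tensor([label]))
    loss.backward()

    # Fast Gradient Method step: move in the gradient direction (L2-normalized here;
    # taking the sign of the gradient instead gives the FGSM variant).
    grad = embedded.grad
    perturbed = embedded + epsilon * grad / (grad.norm() + 1e-12)

    # Nearest-neighbor reconstruction: snap each perturbed embedding back to the
    # closest word vector so the output is a valid (discrete) adversarial text.
    with torch.no_grad():
        dists = torch.cdist(perturbed.squeeze(0), embedding_matrix)  # (seq_len, vocab)
        adv_token_ids = dists.argmin(dim=1)
    return adv_token_ids
```

In the same spirit, the quality measurement mentioned in the abstract can be approximated by computing the WMD between the original and reconstructed texts, e.g. with gensim's `KeyedVectors.wmdistance`; a lower distance indicates an adversarial text that stays closer to the original wording.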


