Elephant in the Room: An Evaluation Framework for Assessing Adversarial Examples in NLP

01/22/2020
by Ying Xu, et al.

An adversarial example is an input transformed by small perturbations such that machine learning models consistently misclassify it. While a number of methods have been proposed to generate adversarial examples for text data, assessing the quality of these adversarial examples is not trivial, as minor perturbations (such as changing a word in a sentence) can lead to a significant shift in meaning, readability, and classification label. In this paper, we propose an evaluation framework to assess the quality of adversarial examples based on these properties. We experiment with five benchmark attacking methods and an alternative approach based on an auto-encoder, and find that these methods generate adversarial examples with poor readability and content preservation. We also find that multiple factors influence attacking performance, such as the length of the text examples and the input domain.
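To make the three quality criteria concrete, the sketch below scores an (original, adversarial) text pair for content preservation, readability, and label change. This is a minimal illustration, not the framework proposed in the paper: the embedding model, the sentence-transformers and textstat libraries, and the `predict_label` callable are all assumptions introduced here for the example.

```python
# Illustrative scoring of an (original, adversarial) text pair along the three
# properties discussed in the abstract. Library and model choices are assumptions,
# not the authors' implementation.
from sentence_transformers import SentenceTransformer, util
import textstat

# Assumed off-the-shelf sentence embedding model for semantic similarity.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def score_adversarial_pair(original: str, adversarial: str, predict_label) -> dict:
    """Return rough proxies for content preservation, readability, and label flip."""
    # Content preservation: cosine similarity between sentence embeddings.
    emb = embedder.encode([original, adversarial], convert_to_tensor=True)
    similarity = util.cos_sim(emb[0], emb[1]).item()

    # Readability: Flesch reading ease of the perturbed text (higher = easier to read).
    readability = textstat.flesch_reading_ease(adversarial)

    # Attack success: does the victim classifier (a hypothetical callable passed in
    # by the caller) change its prediction on the perturbed input?
    label_flipped = predict_label(original) != predict_label(adversarial)

    return {
        "content_preservation": similarity,
        "readability": readability,
        "label_flipped": label_flipped,
    }
```

In this sketch, a high-quality adversarial example is one that flips the label while keeping embedding similarity and readability high; any concrete thresholds would depend on the task and dataset.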

Related research

HotFlip: White-Box Adversarial Examples for NLP (12/19/2017)
Adversarial examples expose vulnerabilities of machine learning models. ...

On the human evaluation of audio adversarial examples (01/23/2020)
Human-machine interaction is increasingly dependent on speech communicat...

Interpretable Adversarial Training for Text (05/30/2019)
Generating high-quality and interpretable adversarial examples in the te...

How the Softmax Output is Misleading for Evaluating the Strength of Adversarial Examples (11/21/2018)
Even before deep learning architectures became the de facto models for c...

What Learned Representations and Influence Functions Can Tell Us About Adversarial Examples (09/19/2023)
Adversarial examples, deliberately crafted using small perturbations to ...

A Unified Gradient Regularization Family for Adversarial Examples (11/19/2015)
Adversarial examples are augmented data points generated by imperceptibl...

Second-Order NLP Adversarial Examples (10/05/2020)
Adversarial example generation methods in NLP rely on models like langua...
