
Elephant in the Room: An Evaluation Framework for Assessing Adversarial Examples in NLP

by   Ying Xu, et al.
The University of Melbourne

An adversarial example is an input transformed by small perturbations that machine learning models consistently misclassify. While a number of methods have been proposed to generate adversarial examples for text data, assessing the quality of these adversarial examples is not trivial, as minor perturbations (such as changing a word in a sentence) can lead to a significant shift in meaning, readability and classification label. In this paper, we propose an evaluation framework to assess the quality of adversarial examples based on these properties. We experiment with five benchmark attacking methods and an alternative approach based on an auto-encoder, and find that these methods generate adversarial examples with poor readability and content preservation. We also find that multiple factors can influence attacking performance, such as the length of the text examples and the input domain.
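The core idea of a word-level perturbation can be illustrated with a minimal sketch. This is not the paper's method: the classifier, synonym table, and inputs below are hypothetical, chosen only to show how a one-word substitution can preserve surface meaning while flipping the predicted label.

```python
# Hypothetical sketch: a one-word synonym substitution that flips the
# output of a toy keyword-based sentiment classifier. All names and
# data here are illustrative assumptions, not the paper's framework.

def toy_classifier(text):
    """Return 1 (positive) if positive cue words outnumber negative ones."""
    pos = {"good", "great", "excellent"}
    neg = {"bad", "poor", "terrible"}
    words = text.lower().split()
    score = sum(w in pos for w in words) - sum(w in neg for w in words)
    return 1 if score > 0 else 0

# Attacker's synonym table (assumed for illustration).
SYNONYMS = {"good": "fine", "great": "decent"}

def perturb(text):
    """Apply a small perturbation: swap known words for near-synonyms."""
    return " ".join(SYNONYMS.get(w, w) for w in text.split())

original = "the movie was good"
adversarial = perturb(original)
```

Here the perturbed sentence reads almost identically to a human, yet the toy classifier's label changes, which is exactly the tension the proposed framework evaluates: whether meaning and readability survive the perturbation that changes the label.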



