
Elephant in the Room: An Evaluation Framework for Assessing Adversarial Examples in NLP

01/22/2020
by   Ying Xu, et al.
IBM
The University of Melbourne

An adversarial example is an input transformed by small perturbations that machine learning models consistently misclassify. While a number of methods have been proposed to generate adversarial examples for text data, it is not trivial to assess the quality of these adversarial examples, as minor perturbations (such as changing a word in a sentence) can lead to a significant shift in their meaning, readability, and classification label. In this paper, we propose an evaluation framework to assess the quality of adversarial examples based on these properties. We experiment with five benchmark attack methods and an alternative approach based on an auto-encoder, and find that these methods generate adversarial examples with poor readability and content preservation. We also find that multiple factors influence attack performance, such as the length of the text examples and the input domain.
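To make the three evaluation axes named above concrete, here is a minimal sketch, not the authors' framework, of how an adversarial text example might be scored for label change, content preservation, and readability. The classifier is user-supplied, and the similarity and readability measures below are simple illustrative stand-ins for the embedding-based or human judgments a real framework would use.

```python
# Sketch of a per-example evaluation along the three properties the paper
# discusses. All metrics here are illustrative proxies, not the paper's.
from difflib import SequenceMatcher
import re


def label_flipped(classify, original: str, adversarial: str) -> bool:
    """True if the (user-supplied) classifier changes its prediction."""
    return classify(original) != classify(adversarial)


def content_preservation(original: str, adversarial: str) -> float:
    """Crude lexical-overlap proxy for meaning preservation (0..1).
    A real framework would use sentence embeddings or human ratings."""
    return SequenceMatcher(None, original.split(), adversarial.split()).ratio()


def readability_proxy(text: str) -> float:
    """Very rough readability proxy based on average word length.
    Stands in for fluency judgments such as language-model perplexity."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    avg_len = sum(len(w) for w in words) / len(words)
    return 1.0 / avg_len


def evaluate(classify, original: str, adversarial: str) -> dict:
    """Score one adversarial example on the three properties."""
    return {
        "label_flipped": label_flipped(classify, original, adversarial),
        "content_preservation": content_preservation(original, adversarial),
        "readability": readability_proxy(adversarial),
    }


if __name__ == "__main__":
    # Toy classifier: "positive" if the text contains "good", else "negative".
    toy = lambda s: "positive" if "good" in s.lower() else "negative"
    print(evaluate(toy, "the movie was good", "the movie was fine"))
```

In practice, an attack is only counted as successful when the label flips while both content preservation and readability stay above chosen thresholds; the abstract's finding is that existing attacks often satisfy the first condition but not the latter two.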


Related research

12/19/2017
HotFlip: White-Box Adversarial Examples for NLP
Adversarial examples expose vulnerabilities of machine learning models. ...

10/14/2016
Are Accuracy and Robustness Correlated?
Machine learning models are vulnerable to adversarial examples formed by...

05/30/2019
Interpretable Adversarial Training for Text
Generating high-quality and interpretable adversarial examples in the te...

11/21/2018
How the Softmax Output is Misleading for Evaluating the Strength of Adversarial Examples
Even before deep learning architectures became the de facto models for c...

11/19/2015
A Unified Gradient Regularization Family for Adversarial Examples
Adversarial examples are augmented data points generated by imperceptibl...

10/05/2020
Second-Order NLP Adversarial Examples
Adversarial example generation methods in NLP rely on models like langua...

12/12/2021
Quantifying and Understanding Adversarial Examples in Discrete Input Spaces
Modern classification algorithms are susceptible to adversarial examples...