Natural Adversarial Sentence Generation with Gradient-based Perturbation

09/06/2019
by Yu-Lun Hsieh, et al.

This work proposes a novel algorithm for generating natural-language adversarial inputs for text classification models, in order to investigate the robustness of these models. It applies gradient-based perturbation to the sentence embeddings that serve as features for the classifier, and learns a decoder for generation. We apply this method to a sentiment analysis model and verify its effectiveness in inducing incorrect predictions by the model. We also conduct quantitative and qualitative analyses of these examples and demonstrate that our approach can generate more natural adversaries. In addition, it can be used to successfully perform black-box attacks, which involve attacking other existing models whose parameters are not known. On a public sentiment analysis API, the proposed method introduces a 20 decrease in average accuracy and 74
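The perturbation step described in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a logistic-regression classifier over a three-dimensional "sentence embedding", and it uses a sign-of-gradient step of size `epsilon`, which is one common choice that the abstract does not confirm. All names (`predict_proba`, `loss_gradient`, `perturb`) and all numeric values are hypothetical. The learned decoder that maps the perturbed embedding back to a sentence is out of scope here.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_proba(w, b, z):
    """Probability of the positive class for embedding z (toy logistic classifier)."""
    return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)


def loss_gradient(w, b, z, y):
    """Gradient of binary cross-entropy with respect to the embedding z.

    For a logistic model this is (p - y) * w, where p is the predicted probability.
    """
    p = predict_proba(w, b, z)
    return [(p - y) * wi for wi in w]


def perturb(w, b, z, y, epsilon):
    """Move z by epsilon per dimension in the direction that increases the loss.

    Assumption: a sign-of-gradient step; the abstract does not state the update rule.
    """
    grad = loss_gradient(w, b, z, y)
    signs = [(g > 0) - (g < 0) for g in grad]
    return [zi + epsilon * s for zi, s in zip(z, signs)]


# Hypothetical classifier weights and a hypothetical embedding with true label 1.
w = [1.5, -2.0, 0.5]
b = 0.1
z = [0.4, -0.3, 0.2]
y = 1

z_adv = perturb(w, b, z, y, epsilon=0.5)
print("original  p(positive):", predict_proba(w, b, z))
print("perturbed p(positive):", predict_proba(w, b, z_adv))
```

With these toy values, the classifier's positive-class probability falls below 0.5 after the perturbation, so the predicted label changes while each embedding dimension moves by at most `epsilon`. In the setting the abstract describes, the perturbed embedding would then be passed to the decoder to produce an adversarial sentence.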


research
11/01/2020

Vec2Sent: Probing Sentence Embeddings with Natural Language Generation

We introspect black-box sentence embeddings by conditionally generating ...
research
01/30/2021

ShufText: A Simple Black Box Approach to Evaluate the Fragility of Text Classification Models

Text classification is the most basic natural language processing task. ...
research
12/22/2019

AdvCodec: Towards A Unified Framework for Adversarial Text Generation

While there has been great interest in generating imperceptible adversar...
research
07/23/2021

A Differentiable Language Model Adversarial Attack on Text Classifiers

Robustness of huge Transformer-based models for natural language process...
research
05/10/2021

Accountable Error Characterization

Customers of machine learning systems demand accountability from the com...
research
12/16/2020

Learning from the Best: Rationalizing Prediction by Adversarial Information Calibration

Explaining the predictions of AI models is paramount in safety-critical ...
research
09/26/2020

Sentifiers: Interpreting Vague Intent Modifiers in Visual Analysis using Word Co-occurrence and Sentiment Analysis

Natural language interaction with data visualization tools often involve...
