Generating Black-Box Adversarial Examples for Text Classifiers Using a Deep Reinforced Model

09/17/2019
by   Prashanth Vijayaraghavan, et al.
4

Recently, generating adversarial examples has become an important means of measuring robustness of a deep learning model. Adversarial examples help us identify the susceptibilities of the model and further counter those vulnerabilities by applying adversarial training techniques. In natural language domain, small perturbations in the form of misspellings or paraphrases can drastically change the semantics of the text. We propose a reinforcement learning based approach towards generating adversarial examples in black-box settings. We demonstrate that our method is able to fool well-trained models for (a) IMDB sentiment classification task and (b) AG's news corpus news categorization task with significantly high success rates. We find that the adversarial examples generated are semantics-preserving perturbations to the original text.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/28/2020

STRATA: Building Robustness with a Simple Method for Generating Black-box Adversarial Attacks for Models of Code

Adversarial examples are imperceptible perturbations in the input to a n...
research
07/21/2018

Simultaneous Adversarial Training - Learn from Others Mistakes

Adversarial examples are maliciously tweaked images that can easily fool...
research
04/21/2018

Generating Natural Language Adversarial Examples

Deep neural networks (DNNs) are vulnerable to adversarial examples, pert...
research
12/14/2017

DANCin SEQ2SEQ: Fooling Text Classifiers with Adversarial Text Example Generation

Machine learning models are powerful but fallible. Generating adversaria...
research
01/21/2019

Perception-in-the-Loop Adversarial Examples

We present a scalable, black box, perception-in-the-loop technique to fi...
research
10/06/2022

InferES : A Natural Language Inference Corpus for Spanish Featuring Negation-Based Contrastive and Adversarial Examples

In this paper, we present InferES - an original corpus for Natural Langu...
research
11/08/2020

Adversarial Black-Box Attacks On Text Classifiers Using Multi-Objective Genetic Optimization Guided By Deep Networks

We propose a novel genetic-algorithm technique that generates black-box ...

Please sign up or login with your details

Forgot password? Click here to reset