Certified Robustness to Adversarial Word Substitutions

09/03/2019
by Robin Jia et al.

State-of-the-art NLP models can often be fooled by adversaries that apply seemingly innocuous label-preserving transformations (e.g., paraphrasing) to input text. The number of possible transformations scales exponentially with text length, so data augmentation cannot cover all transformations of an input. This paper considers one exponentially large family of label-preserving transformations, in which every word in the input can be replaced with a similar word. We train the first models that are provably robust to all word substitutions in this family. Our training procedure uses Interval Bound Propagation (IBP) to minimize an upper bound on the worst-case loss that any combination of word substitutions can induce. To evaluate models' robustness to these transformations, we measure accuracy on adversarially chosen word substitutions applied to test examples. Our IBP-trained models attain 75% adversarial accuracy on both sentiment analysis on IMDB and natural language inference on SNLI. In comparison, on IMDB, models trained normally and ones trained with data augmentation achieve adversarial accuracy of only 8% and 35%, respectively.
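To make the bounding step concrete, here is a minimal Python sketch of Interval Bound Propagation over word substitutions. It is not the authors' implementation: the toy embeddings, substitution sets, two-layer classifier, and helper names (ibp_affine, ibp_relu) are all illustrative assumptions. The idea it demonstrates is the one in the abstract: take an elementwise box over each word's allowed substitution embeddings, propagate that box through the network, and check whether the true class's worst-case logit still beats every other class's best-case logit.

```python
# Minimal IBP sketch for word substitutions (illustrative, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

# Toy embeddings: each word maps to a 4-d vector (assumed, for illustration).
emb = {w: rng.normal(size=4) for w in
       ["the", "movie", "was", "good", "great", "fine", "film", "picture"]}

# Allowed substitutions per word (each set includes the original word).
sentence = ["the", "movie", "was", "good"]
subs = {"movie": ["movie", "film", "picture"],
        "good":  ["good", "great", "fine"]}

# Per-position interval: elementwise min/max over the substitution set's embeddings.
lows, highs = [], []
for w in sentence:
    options = np.stack([emb[s] for s in subs.get(w, [w])])
    lows.append(options.min(axis=0))
    highs.append(options.max(axis=0))
# Concatenate per-position bounds into one fixed-length input box (simple encoder).
l = np.concatenate(lows)
u = np.concatenate(highs)

def ibp_affine(l, u, W, b):
    """Propagate an interval through x -> W @ x + b in center/radius form."""
    mu, r = (l + u) / 2.0, (u - l) / 2.0
    mu2 = W @ mu + b
    r2 = np.abs(W) @ r          # the radius grows by |W|
    return mu2 - r2, mu2 + r2

def ibp_relu(l, u):
    """ReLU is monotone, so interval bounds pass through elementwise."""
    return np.maximum(l, 0.0), np.maximum(u, 0.0)

# A tiny two-layer classifier with random weights (assumed architecture).
W1, b1 = rng.normal(size=(8, l.size)), np.zeros(8)
W2, b2 = rng.normal(size=(2, 8)), np.zeros(2)

l1, u1 = ibp_relu(*ibp_affine(l, u, W1, b1))
logit_l, logit_u = ibp_affine(l1, u1, W2, b2)

# Certificate: if the true class's lower bound exceeds every other class's
# upper bound, no combination of word substitutions can flip the prediction.
y = 1
certified = all(logit_l[y] > logit_u[k] for k in range(2) if k != y)
print("worst-case logit gap:", logit_l[y] - max(logit_u[k] for k in range(2) if k != y))
print("certified robust:", certified)
```

Training with IBP then amounts to minimizing a loss on these worst-case logit bounds rather than on the nominal logits, which is how the procedure upper-bounds the loss of any combination of word substitutions.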


Related research

05/29/2020  SAFER: A Structure-free Approach for Certified Robustness to Adversarial Word Substitutions
State-of-the-art NLP models can often be fooled by human-unaware transfo...

05/10/2022  Sibylvariant Transformations for Robust Text Classification
The vast majority of text transformation techniques in NLP are inherentl...

11/23/2020  RobustPointSet: A Dataset for Benchmarking Robustness of Point Cloud Classifiers
The 3D deep learning community has seen significant strides in pointclou...

09/03/2019  Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation
Neural networks are part of many contemporary NLP systems, yet their emp...

10/01/2020  Assessing Robustness of Text Classification through Maximal Safe Radius Computation
Neural network NLP models are vulnerable to small modifications of the i...

02/22/2020  Robustness to Programmable String Transformations via Augmented Abstract Training
Deep neural networks for natural language processing tasks are vulnerabl...

03/22/2021  Adversarially Optimized Mixup for Robust Classification
Mixup is a procedure for data augmentation that trains networks to make ...
