Towards Improving Adversarial Training of NLP Models

09/01/2021
by Jin Yong Yoo, et al.

Adversarial training, a method for learning robust deep neural networks, constructs adversarial examples during training. However, recent methods for generating NLP adversarial examples involve combinatorial search and expensive sentence encoders for constraining the generated instances. As a result, it remains challenging to use vanilla adversarial training to improve NLP models' performance, and its benefits remain largely uninvestigated. This paper proposes a simple and improved vanilla adversarial training process for NLP models, which we name Attacking to Training (A2T). The core of A2T is a new and cheaper word substitution attack optimized for vanilla adversarial training. We use A2T to train BERT and RoBERTa models on the IMDB, Rotten Tomatoes, Yelp, and SNLI datasets. Our results show that it is possible to train empirically robust NLP models using a much cheaper adversary. We demonstrate that vanilla adversarial training with A2T can improve an NLP model's robustness to the attack it was originally trained with and also defend the model against other types of attacks. Furthermore, we show that A2T can improve NLP models' standard accuracy, cross-domain generalization, and interpretability. Code is available at http://github.com/jinyongyoo/A2T.
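To make the idea concrete, here is a minimal, self-contained sketch of vanilla adversarial training with a greedy word-substitution attack. This is not the paper's implementation: the toy `SYNONYMS` table, the `model_score` callback, and the `train_step` callback are all hypothetical stand-ins (A2T derives substitutions from word embeddings and scores them with the model under attack), but the control flow — perturb each clean example, then train on both clean and perturbed versions — mirrors the vanilla adversarial training loop described above.

```python
import random

# Hypothetical toy synonym table standing in for A2T's cheaper word
# substitution source; purely illustrative.
SYNONYMS = {
    "good": ["great", "fine"],
    "bad": ["poor", "awful"],
    "movie": ["film"],
}

def word_substitution_attack(sentence, model_score, max_swaps=2):
    """Greedy word-substitution attack sketch: for each word with known
    substitutes, keep the swap that most lowers the model's score on the
    correct label, up to max_swaps replacements."""
    words = sentence.split()
    swaps = 0
    for i, w in enumerate(words):
        if swaps >= max_swaps or w not in SYNONYMS:
            continue
        best, best_score = w, model_score(" ".join(words))
        for cand in SYNONYMS[w]:
            words[i] = cand
            s = model_score(" ".join(words))
            if s < best_score:
                best, best_score = cand, s
        words[i] = best
        if best != w:
            swaps += 1
    return " ".join(words)

def adversarial_training_epoch(data, model_score, train_step):
    """Vanilla adversarial training: attack each clean example with the
    cheap adversary, then train on both clean and perturbed inputs."""
    for sentence, label in data:
        adv = word_substitution_attack(sentence, model_score)
        train_step(sentence, label)
        train_step(adv, label)
```

In practice the attack's cost dominates the loop, which is why A2T's contribution is a cheaper adversary rather than a new training objective.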


