Fast Gradient Projection Method for Text Adversary Generation and Adversarial Training

08/09/2020
by Xiaosen Wang, et al.

Adversarial training has shown effectiveness and efficiency in improving the robustness of deep neural networks for image classification. For text classification, however, the discrete nature of the text input space makes it hard to adapt gradient-based adversarial methods from the image domain. Existing text attack methods, moreover, are effective but not efficient enough to be incorporated into practical text adversarial training. In this work, we propose the Fast Gradient Projection Method (FGPM) to generate text adversarial examples based on synonym substitution, where each substitution is scored by the product of the gradient magnitude and the projected distance between the original word and the candidate word along the gradient direction. Empirical evaluations demonstrate that FGPM achieves attack performance and transferability comparable to competitive attack baselines while being about 20 times faster than the current fastest text attack method. This efficiency enables us to incorporate FGPM into adversarial training as an effective defense and to scale to large neural networks and datasets. Experiments show that adversarial training with FGPM (ATF) significantly improves model robustness and blocks the transferability of adversarial examples without any decay in model generalization.
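The sketch below illustrates the scoring rule described in the abstract, assuming word-level embeddings and access to the gradient of the loss with respect to each input embedding; the function and variable names (score_substitution, best_substitution, synonyms) are illustrative and not taken from the paper's code. Note that since the projected distance of the substitution direction onto the gradient is their dot product divided by the gradient norm, multiplying by the gradient magnitude reduces the score to a plain dot product.

    # Minimal sketch of an FGPM-style substitution score (illustrative, not the authors' code).
    import numpy as np

    def score_substitution(grad, orig_vec, cand_vec):
        """Gradient magnitude times the projection of the substitution
        direction (candidate - original) onto the gradient direction.
        Algebraically this equals dot(grad, candidate - original)."""
        direction = cand_vec - orig_vec
        grad_norm = np.linalg.norm(grad)
        if grad_norm == 0.0:
            return 0.0
        projected_distance = np.dot(grad, direction) / grad_norm
        return grad_norm * projected_distance

    def best_substitution(grads, embeddings, synonyms):
        """Pick the position and synonym candidate with the highest score.

        grads:      (seq_len, dim) gradient of the loss w.r.t. each word embedding
        embeddings: (seq_len, dim) embeddings of the current sentence
        synonyms:   list of lists; synonyms[i] holds candidate embedding vectors
                    for position i
        """
        best = (None, None, -np.inf)  # (position, candidate index, score)
        for i, candidates in enumerate(synonyms):
            for j, cand_vec in enumerate(candidates):
                s = score_substitution(grads[i], embeddings[i], cand_vec)
                if s > best[2]:
                    best = (i, j, s)
        return best

    if __name__ == "__main__":
        # Toy usage with random vectors standing in for real embeddings/gradients.
        rng = np.random.default_rng(0)
        seq_len, dim = 5, 8
        grads = rng.normal(size=(seq_len, dim))
        embeddings = rng.normal(size=(seq_len, dim))
        synonyms = [[rng.normal(size=dim) for _ in range(3)] for _ in range(seq_len)]
        pos, cand, score = best_substitution(grads, embeddings, synonyms)
        print(f"substitute position {pos} with candidate {cand} (score {score:.3f})")

Because each candidate is scored with a single dot product per position, one gradient computation suffices to rank all synonym substitutions, which is what makes this kind of attack cheap enough to use inside adversarial training.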


research
10/08/2018

Efficient Two-Step Adversarial Defense for Deep Neural Networks

In recent years, deep neural networks have demonstrated outstanding perf...
research
09/14/2021

Improving Gradient-based Adversarial Training for Text Classification by Contrastive Learning and Auto-Encoder

Recent work has proposed several efficient approaches for generating gra...
research
10/17/2022

Probabilistic Categorical Adversarial Attack &amp; Adversarial Training

The existence of adversarial examples brings huge concern for people to ...
research
11/26/2018

Bilateral Adversarial Training: Towards Fast Training of More Robust Models Against Adversarial Attacks

In this paper, we study fast training of adversarially robust models. Fr...
research
03/03/2018

Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples

Crafting adversarial examples has become an important technique to evalu...
research
06/15/2022

Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin Attack

The AutoAttack (AA) has been the most reliable method to evaluate advers...
research
08/07/2020

Visual Attack and Defense on Text

Modifying characters of a piece of text to their visual similar ones oft...
