Bridge the Gap Between CV and NLP! A Gradient-based Textual Adversarial Attack Framework

10/28/2021
by Lifan Yuan, et al.

Despite great success on many machine learning tasks, deep neural networks are still vulnerable to adversarial samples. While gradient-based adversarial attack methods are well explored in the field of computer vision, it is impractical to apply them directly in natural language processing due to the discrete nature of text. To bridge this gap, we propose a general framework for adapting existing gradient-based methods to craft textual adversarial samples. In this framework, gradient-based continuous perturbations are added to the embedding layer and amplified during forward propagation. The final perturbed latent representations are then decoded with a masked language model head to obtain potential adversarial samples. In this paper, we instantiate our framework with Textual Projected Gradient Descent (TPGD). We conduct comprehensive experiments to evaluate our framework by performing transfer black-box attacks on BERT, RoBERTa, and ALBERT on three benchmark datasets. Experimental results demonstrate that our method achieves overall better performance and produces more fluent and grammatical adversarial samples than strong baseline methods. All the code and data will be made public.
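
As a rough illustration of the attack loop described in the abstract, the sketch below performs projected gradient descent in the embedding space of a HuggingFace BERT classifier and decodes the perturbed final hidden states with the masked-LM head of the same checkpoint. The model names, step size, perturbation radius, and iteration count are illustrative assumptions, not the authors' exact TPGD implementation.

import torch
from transformers import (
    BertTokenizer,
    BertForMaskedLM,
    BertForSequenceClassification,
)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
victim = BertForSequenceClassification.from_pretrained("bert-base-uncased").eval()
# Masked-LM head reused as the decoder from latent space back to tokens.
mlm_head = BertForMaskedLM.from_pretrained("bert-base-uncased").eval().cls

def tpgd_attack(text, label, steps=10, alpha=0.03, eps=0.3):
    enc = tokenizer(text, return_tensors="pt")
    input_ids, attn = enc["input_ids"], enc["attention_mask"]
    labels = torch.tensor([label])

    # Clean token embeddings of the victim model; the perturbation delta
    # lives in this continuous embedding space.
    embeds = victim.bert.embeddings.word_embeddings(input_ids).detach()
    delta = torch.zeros_like(embeds, requires_grad=True)

    for _ in range(steps):
        loss = victim(inputs_embeds=embeds + delta,
                      attention_mask=attn, labels=labels).loss
        loss.backward()
        with torch.no_grad():
            # PGD step: ascend the loss along the gradient sign, then
            # project back into an L-infinity ball of radius eps.
            delta += alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
            delta.grad.zero_()

    # Forward-propagate the perturbed embeddings and decode the final
    # hidden states with the masked-LM head to recover discrete tokens.
    with torch.no_grad():
        hidden = victim.bert(inputs_embeds=embeds + delta,
                             attention_mask=attn).last_hidden_state
        adv_ids = mlm_head(hidden).argmax(dim=-1)
    return tokenizer.decode(adv_ids[0], skip_special_tokens=True)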

