Block-Sparse Adversarial Attack to Fool Transformer-Based Text Classifiers

03/11/2022
by   Sahar Sadrizadeh, et al.

Recently, it has been shown that, despite their strong performance across many fields, deep neural networks are vulnerable to adversarial examples. In this paper, we propose a gradient-based adversarial attack against transformer-based text classifiers. The adversarial perturbation in our method is constrained to be block-sparse, so that the resulting adversarial example differs from the original sentence in only a few words. Because textual data are discrete, we use gradient projection to find the minimizer of our proposed optimization problem. Experimental results demonstrate that, while our adversarial attack preserves the semantics of the sentence, it can reduce the accuracy of GPT-2 to less than 5% on datasets such as Yelp Reviews. Furthermore, the block-sparsity constraint of the proposed optimization problem yields small perturbations in the adversarial example.
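To make the core idea concrete, here is a minimal sketch (not the authors' implementation) of the two ingredients the abstract describes: a gradient step on the word embeddings followed by projection back onto valid vocabulary embeddings, with a block-sparsity constraint that lets only a few words change (each word is one block). The embedding table, the linear "classifier", the step size, and all names are hypothetical toys chosen so the example runs standalone.

```python
# Illustrative sketch, NOT the paper's method verbatim: gradient
# projection for a block-sparse word-substitution attack on a toy
# linear "classifier". All dimensions and names are assumptions.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, SENT_LEN = 50, 8, 6

E = rng.normal(size=(VOCAB, DIM))   # toy embedding table (vocab x dim)
w = rng.normal(size=DIM)            # toy linear classifier weights

def loss_grad(X):
    """Gradient of a toy loss sum(X @ w) w.r.t. the embeddings X.
    (With a real transformer this would come from backpropagation.)"""
    return np.tile(w, (X.shape[0], 1))

def block_sparse_attack(token_ids, k=2, step=5.0):
    """One gradient-projection iteration:
    1. ascend the loss in continuous embedding space;
    2. project each perturbed row onto the nearest vocabulary embedding
       (handles the discrete nature of text);
    3. block sparsity: keep the substitution in only the k words whose
       replacement increases the toy loss the most."""
    X = E[token_ids]                        # current sentence embeddings
    X_adv = X + step * loss_grad(X)         # gradient ascent on the loss
    # project every perturbed row back onto a valid token embedding
    dists = ((X_adv[:, None, :] - E[None, :, :]) ** 2).sum(-1)
    proj_ids = dists.argmin(axis=1)
    # per-word gain in the toy loss if we accept the substitution
    gains = E[proj_ids] @ w - X @ w
    keep = np.argsort(gains)[-k:]           # only k blocks may change
    out = token_ids.copy()
    out[keep] = proj_ids[keep]
    return out

orig = rng.integers(0, VOCAB, size=SENT_LEN)
adv = block_sparse_attack(orig)
print("words changed:", int((adv != orig).sum()))
```

In the sketch, block sparsity is enforced greedily by ranking per-word gains; the paper instead folds the constraint into its optimization problem, but the effect is the same: the adversarial sentence differs from the original in only a few word positions.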


