DeepAI AI Chat
Log In Sign Up

BERT-ATTACK: Adversarial Attack Against BERT Using BERT

by   Linyang Li, et al.
FUDAN University

Adversarial attacks for discrete data (such as text) has been proved significantly more challenging than continuous data (such as image), since it is difficult to generate adversarial samples with gradient-based methods. Currently, the successful attack methods for text usually adopt heuristic replacement strategies on character or word level, which remains challenging to find the optimal solution in the massive space of possible combination of replacements, while preserving semantic consistency and language fluency. In this paper, we propose BERT-Attack, a high-quality and effective method to generate adversarial samples using pre-trained masked language models exemplified by BERT. We turn BERT against its fine-tuned models and other deep neural models for downstream tasks. Our method successfully misleads the target models to predict incorrectly, outperforming state-of-the-art attack strategies in both success rate and perturb percentage, while the generated adversarial samples are fluent and semantically preserved. Also, the cost of calculation is low, thus possible for large-scale generations.


page 1

page 2

page 3

page 4


Model Extraction and Adversarial Transferability, Your BERT is Vulnerable!

Natural language processing (NLP) tasks, ranging from text classificatio...

COVER: A Heuristic Greedy Adversarial Attack on Prompt-based Learning in Language Models

Prompt-based learning has been proved to be an effective way in pre-trai...

Combating Adversarial Misspellings with Robust Word Recognition

To combat adversarial spelling mistakes, we propose placing a word recog...

Using BERT Encoding to Tackle the Mad-lib Attack in SMS Spam Detection

One of the stratagems used to deceive spam filters is to substitute voca...

Killing Two Birds with One Stone: Stealing Model and Inferring Attribute from BERT-based APIs

The advances in pre-trained models (e.g., BERT, XLNET and etc) have larg...

FireBERT: Hardening BERT-based classifiers against adversarial attack

We present FireBERT, a set of three proof-of-concept NLP classifiers har...

BERT-Defense: A Probabilistic Model Based on BERT to Combat Cognitively Inspired Orthographic Adversarial Attacks

Adversarial attacks expose important blind spots of deep learning system...