Generating universal language adversarial examples by understanding and enhancing the transferability across neural models

by Liping Yuan, et al.

Deep neural network models are vulnerable to adversarial attacks. In many cases, malicious inputs intentionally crafted for one model can fool another model in the black-box attack setting. However, there is a lack of systematic studies on the transferability of adversarial examples and on how to generate universal adversarial examples. In this paper, we systematically study the transferability of adversarial attacks for text classification models. In particular, we conduct extensive experiments to investigate how various factors, such as network architecture, input format, word embedding, and model capacity, affect the transferability of adversarial attacks. Based on these studies, we then propose universal black-box attack algorithms that can generate adversarial examples capable of attacking almost all existing models. These universal adversarial examples reflect defects in the learning process and biases in the training dataset. Finally, we generalize these adversarial examples into universal word replacement rules that can be used for model diagnostics.
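The transfer setting described above can be illustrated with a toy sketch (not the paper's algorithm): craft a word-substitution adversarial example against a source classifier, then check whether it also fools a target classifier with a different architecture. The dataset, synonym table, and model choices below are illustrative assumptions, not semantics-preserving attacks.

```python
# Toy sketch of black-box transferability: attack a source model directly,
# then test the crafted example against an unseen target model.
# Dataset, synonym table, and models are hypothetical illustrations.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

train_texts = ["good film", "good plot", "bad film", "ok plot"]
train_labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

vec = CountVectorizer()
X = vec.fit_transform(train_texts)
source = MultinomialNB().fit(X, train_labels)       # attacked directly
target = LogisticRegression().fit(X, train_labels)  # black-box victim

# Hypothetical near-synonym substitutions the attacker may try.
synonyms = {"good": ["ok"], "film": ["movie"]}

def attack(text, label):
    """Greedy single-word substitution until the source model is fooled."""
    words = text.split()
    for i, w in enumerate(words):
        for s in synonyms.get(w, []):
            candidate = " ".join(words[:i] + [s] + words[i + 1:])
            if source.predict(vec.transform([candidate]))[0] != label:
                return candidate
    return None  # no successful substitution found

adv = attack("good film", label=1)
if adv is not None:
    transfers = target.predict(vec.transform([adv]))[0] != 1
    print("adversarial input:", adv, "| transfers to target:", bool(transfers))
```

A successful substitution that flips the target as well counts as a transferred attack; aggregating this over many inputs gives the transfer rate studied in the paper.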


