Generating universal language adversarial examples by understanding and enhancing the transferability across neural models

11/17/2020
by Liping Yuan, et al.

Deep neural network models are vulnerable to adversarial attacks. In many cases, malicious inputs intentionally crafted for one model can fool another model in the black-box attack setting. However, systematic studies of the transferability of adversarial examples, and of how to generate universal adversarial examples, are lacking. In this paper, we systematically study the transferability of adversarial attacks on text classification models. In particular, we conduct extensive experiments to investigate how various factors, such as network architecture, input format, word embedding, and model capacity, affect the transferability of adversarial attacks. Based on these studies, we then propose universal black-box attack algorithms that generate adversarial examples capable of attacking almost all existing models. These universal adversarial examples reflect defects in the learning process and biases in the training dataset. Finally, we generalize these adversarial examples into universal word replacement rules that can be used for model diagnostics.
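The idea of distilling adversarial examples into universal word replacement rules and testing whether a rewrite fools several black-box models at once can be sketched as below. This is a minimal illustration, not the authors' algorithm: the `RULES` table, the toy bag-of-words classifiers `model_a`/`model_b`, and the "flips every model" criterion are all hypothetical stand-ins.

```python
# Hypothetical sketch: apply universal word-replacement rules (learned
# offline from a transferability study) and check whether the rewrite
# changes the prediction of every model in a black-box ensemble.
from typing import Callable, Dict, List

# Illustrative replacement rules: word -> adversarial substitute that
# tends to transfer across models (entries are made up for this sketch).
RULES: Dict[str, str] = {
    "good": "decent",
    "great": "passable",
    "terrible": "underwhelming",
}

def apply_rules(text: str, rules: Dict[str, str]) -> str:
    """Rewrite text by substituting each rule word with its replacement."""
    return " ".join(rules.get(tok, tok) for tok in text.split())

def is_universal_adversarial(
    text: str,
    models: List[Callable[[str], int]],
    rules: Dict[str, str],
) -> bool:
    """Call the rewrite 'universal' if it flips every model's prediction.

    Each model is treated as a black box: we only observe its output
    label on the original and the rewritten input.
    """
    adv = apply_rules(text, rules)
    return all(m(text) != m(adv) for m in models)

# Toy stand-ins for black-box sentiment classifiers (keyword matching):
def model_a(t: str) -> int:
    return int("good" in t or "great" in t)

def model_b(t: str) -> int:
    return int(any(w in t for w in ("good", "great", "excellent")))

print(apply_rules("a good movie", RULES))  # -> "a decent movie"
print(is_universal_adversarial("a good movie", [model_a, model_b], RULES))  # -> True
```

In practice the models would be real text classifiers queried through their prediction APIs, and the rule table would be mined from adversarial examples that were observed to transfer across architectures.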


Related papers:

04/19/2021 · Direction-Aggregated Attack for Transferable Adversarial Examples
Deep neural networks are vulnerable to adversarial examples that are cra...

02/28/2022 · Enhance transferability of adversarial examples with model architecture
Transferability of adversarial examples is of critical importance to lau...

01/21/2021 · Adv-OLM: Generating Textual Adversaries via OLM
Deep learning models are susceptible to adversarial examples that have i...

08/29/2019 · Universal, transferable and targeted adversarial attacks
Deep neural networks have been found vulnerable in many previous works. A ...

01/22/2019 · Universal Rules for Fooling Deep Neural Networks based Text Classification
Recently, deep learning based natural language processing techniques are...

05/24/2016 · Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
Many machine learning models are vulnerable to adversarial examples: inp...

02/27/2021 · Effective Universal Unrestricted Adversarial Attacks using a MOE Approach
Recent studies have shown that Deep Learning models are susceptible to ad...