Multi-granularity Textual Adversarial Attack with Behavior Cloning

09/09/2021
by   Yangyi Chen, et al.
0

Recently, the textual adversarial attack models become increasingly popular due to their successful in estimating the robustness of NLP models. However, existing works have obvious deficiencies. (1) They usually consider only a single granularity of modification strategies (e.g. word-level or sentence-level), which is insufficient to explore the holistic textual space for generation; (2) They need to query victim models hundreds of times to make a successful attack, which is highly inefficient in practice. To address such problems, in this paper we propose MAYA, a Multi-grAnularitY Attack model to effectively generate high-quality adversarial samples with fewer queries to victim models. Furthermore, we propose a reinforcement-learning based method to train a multi-granularity attack agent through behavior cloning with the expert knowledge from our MAYA algorithm to further reduce the query times. Additionally, we also adapt the agent to attack black-box models that only output labels without confidence scores. We conduct comprehensive experiments to evaluate our attack models by attacking BiLSTM, BERT and RoBERTa in two different black-box attack settings and three benchmark datasets. Experimental results show that our models achieve overall better attacking performance and produce more fluent and grammatical adversarial samples compared to baseline models. Besides, our adversarial attack agent significantly reduces the query times in both attack settings. Our codes are released at https://github.com/Yangyi-Chen/MAYA.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/13/2023

PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection

In this paper, we propose PhantomSound, a query-efficient black-box atta...
research
09/19/2020

Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations

Adversarial attacking aims to fool deep neural networks with adversarial...
research
09/13/2018

Query-Efficient Black-Box Attack by Active Learning

Deep neural network (DNN) as a popular machine learning model is found t...
research
10/21/2020

Learning Black-Box Attackers with Transferable Priors and Query Feedback

This paper addresses the challenging black-box adversarial attack proble...
research
05/26/2021

Adversarial Attack Framework on Graph Embedding Models with Limited Knowledge

With the success of the graph embedding model in both academic and indus...
research
09/19/2020

OpenAttack: An Open-source Textual Adversarial Attack Toolkit

Textual adversarial attacking has received wide and increasing attention...
research
04/27/2023

ChatGPT as an Attack Tool: Stealthy Textual Backdoor Attack via Blackbox Generative Model Trigger

Textual backdoor attacks pose a practical threat to existing systems, as...

Please sign up or login with your details

Forgot password? Click here to reset