Improved and Efficient Text Adversarial Attacks using Target Information

04/27/2021
by Mahmoud Hossam, et al.

There has recently been growing interest in studying adversarial examples against natural language models in the black-box setting. These methods attack natural language classifiers by perturbing certain important words until the classifier's label changes. To find these important words, they rank all words by importance, querying the target model word by word for each input sentence, which makes them highly query-inefficient. A recent approach addresses this problem through interpretable learning: the word ranking is learned rather than found by the previous expensive search. The main advantage of this approach is that it achieves attack rates comparable to state-of-the-art methods while being faster and requiring fewer queries, and fewer queries are desirable to avoid raising suspicion toward the attacking agent. However, for the sake of query efficiency, this approach sacrifices useful information that could be leveraged from the target classifier. In this paper, we study the effect of leveraging the target model's outputs and data on both attack rates and the average number of queries, and we show that both can be improved with a limited overhead of additional queries.
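For context, the query-heavy ranking step the abstract refers to can be sketched as follows: each word is scored by how much deleting it lowers the classifier's confidence in the original label, and words are then perturbed in order of that score. The sketch below is an illustration of this general deletion-based ranking (as used in attacks such as TextFooler), not the authors' exact method; model_predict is a hypothetical black-box API that returns class probabilities for a piece of text.

from typing import Callable, List, Tuple

def rank_words_by_importance(
    sentence: str,
    model_predict: Callable[[str], List[float]],  # hypothetical black-box classifier
) -> Tuple[List[Tuple[int, float]], int]:
    """Score each word by the drop in the original label's probability
    when that word is deleted. Returns ((index, score) pairs sorted by
    importance, total number of model queries used)."""
    words = sentence.split()
    orig_probs = model_predict(sentence)                    # 1 query
    orig_label = max(range(len(orig_probs)), key=orig_probs.__getitem__)
    queries = 1

    scores = []
    for i in range(len(words)):                             # 1 extra query per word
        perturbed = " ".join(words[:i] + words[i + 1:])
        probs = model_predict(perturbed)
        queries += 1
        scores.append((i, orig_probs[orig_label] - probs[orig_label]))

    # Largest confidence drop = most important word to perturb first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores, queries

For a sentence of n words this costs n + 1 queries before any substitution is even attempted, which is exactly the inefficiency that a learned word ranker avoids and that this paper trades off against the extra information obtainable from the target model.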

Related research

10/14/2020
Explain2Attack: Text Adversarial Attacks via Cross-Domain Interpretability
Training robust deep learning models for down-stream tasks is a critical...

09/16/2019
They Might NOT Be Giants: Crafting Black-Box Adversarial Examples with Fewer Queries Using Particle Swarm Optimization
Machine learning models have been found to be susceptible to adversarial...

01/21/2021
Adv-OLM: Generating Textual Adversaries via OLM
Deep learning models are susceptible to adversarial examples that have i...

08/01/2023
LimeAttack: Local Explainable Method for Textual Hard-Label Adversarial Attack
Natural language processing models are vulnerable to adversarial example...

12/29/2020
Generating Natural Language Attacks in a Hard Label Black Box Setting
We study an important and challenging task of attacking natural language...

09/14/2022
Order-Disorder: Imitation Adversarial Attacks for Black-box Neural Ranking Models
Neural text ranking models have witnessed significant advancement and ar...

07/06/2023
NatLogAttack: A Framework for Attacking Natural Language Inference Models with Natural Logic
Reasoning has been a central topic in artificial intelligence from the b...
