Generating Natural Language Attacks in a Hard Label Black Box Setting

12/29/2020
by   Rishabh Maheshwary, et al.

We study the important and challenging task of attacking natural language processing models in a hard label black box setting. We propose a decision-based attack strategy that crafts high-quality adversarial examples for text classification and entailment tasks. The attack leverages a population-based optimization algorithm to craft plausible and semantically similar adversarial examples while observing only the top label predicted by the target model. At each iteration, the optimization procedure allows word replacements that maximize the overall semantic similarity between the original and the adversarial text. Further, our approach does not rely on substitute models or any kind of training data. We demonstrate the efficacy of the proposed approach through extensive experiments and ablation studies on five state-of-the-art target models across seven benchmark datasets. Compared to attacks proposed in prior literature, we achieve a higher success rate with a lower word perturbation percentage, despite operating in this highly restricted setting.
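The abstract's core loop, a population of candidate texts evolved via synonym substitutions, kept only if they flip the model's hard label, and scored by similarity to the original, can be illustrated with a minimal sketch. Everything here is a toy assumption, not the paper's actual method: `predict` stands in for the black-box target (it exposes only its top label), `SYNONYMS` is a tiny hand-made substitution table, and `similarity` approximates semantic similarity by the fraction of unchanged words.

```python
import random

# Hypothetical stand-ins for the paper's components (assumptions for illustration):
SYNONYMS = {
    "good": ["fine", "decent"],
    "movie": ["film", "picture"],
    "great": ["superb", "fine"],
}

def predict(words):
    # Toy hard-label target: "positive" iff the text contains "good" or "great".
    return "positive" if ("good" in words or "great" in words) else "negative"

def similarity(a, b):
    # Crude proxy for semantic similarity: fraction of unchanged positions.
    return sum(x == y for x, y in zip(a, b)) / len(a)

def mutate(words):
    # Replace one randomly chosen replaceable word with a synonym.
    out = list(words)
    idxs = [i for i, w in enumerate(out) if w in SYNONYMS]
    if idxs:
        i = random.choice(idxs)
        out[i] = random.choice(SYNONYMS[out[i]])
    return out

def hard_label_attack(orig, pop_size=8, iters=30, seed=0):
    random.seed(seed)
    target = predict(orig)
    pop = [mutate(orig) for _ in range(pop_size)]
    best = None
    for _ in range(iters):
        # Keep candidates that flip the hard label (the only feedback available).
        adv = [c for c in pop if predict(c) != target]
        for c in adv:
            if best is None or similarity(orig, c) > similarity(orig, best):
                best = c
        # Next generation: mutate the fittest half (highest similarity to orig).
        pool = adv if adv else pop
        pool.sort(key=lambda c: similarity(orig, c), reverse=True)
        elite = pool[: max(1, len(pool) // 2)]
        pop = [mutate(random.choice(elite)) for _ in range(pop_size)]
    return best

adversarial = hard_label_attack(["a", "good", "movie"])
print(adversarial)
```

The key point the sketch captures is that no gradients, scores, or probabilities are queried, only the top label, and the population is steered toward replacements that preserve as much of the original text as possible.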


Related research

- 12/24/2020: A Context Aware Approach for Generating Natural Language Attacks
- 09/10/2021: A Strong Baseline for Query Efficient Attacks in a Black Box Setting
- 04/04/2020: BAE: BERT-based Adversarial Examples for Text Classification
- 01/20/2022: Learning-based Hybrid Local Search for the Hard-label Textual Attack
- 08/01/2023: LimeAttack: Local Explainable Method for Textual Hard-Label Adversarial Attack
- 04/27/2021: Improved and Efficient Text Adversarial Attacks using Target Information
- 03/01/2023: Frauds Bargain Attack: Generating Adversarial Text Samples via Word Manipulation Process
