Sample Attackability in Natural Language Adversarial Attacks

06/21/2023
by   Vyas Raina, et al.

Adversarial attack research in natural language processing (NLP) has made significant progress in designing powerful attack methods and defence approaches. However, few efforts have sought to identify which source samples are the most attackable or robust, i.e. whether, for an unseen target model, we can determine which samples are most vulnerable to an adversarial attack. This work formally extends the definition of sample attackability/robustness for NLP attacks. Experiments on two popular NLP datasets, four state-of-the-art models and four different NLP adversarial attack methods demonstrate that sample uncertainty is insufficient for describing the characteristics of attackable/robust samples, and hence a deep-learning-based detector can perform much better at identifying the most attackable and robust samples for an unseen target model. Nevertheless, further analysis finds little agreement in which samples are considered the most attackable/robust across different NLP attack methods, explaining the lack of portability of attackability detection methods across attack methods.
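The notion of sample attackability across models can be made concrete with a small sketch. Assuming per-sample attack outcomes are already available (True meaning the attack succeeded within some perturbation budget), the labelling below marks a sample "attackable" if every evaluated model is fooled and "robust" if none are. The function name, labels, and all-or-none thresholds are illustrative assumptions, not the paper's exact formal definition.

```python
from typing import Dict, List

def label_attackability(success: Dict[str, List[bool]]) -> Dict[str, str]:
    """Label samples from per-model attack outcomes (hypothetical sketch).

    success maps a sample id to a list of booleans, one per evaluated
    model; True means the adversarial attack fooled that model within
    the perturbation budget. A sample is 'attackable' if every model
    is fooled, 'robust' if no model is fooled, and 'mixed' otherwise.
    """
    labels = {}
    for sample_id, outcomes in success.items():
        if all(outcomes):
            labels[sample_id] = "attackable"
        elif not any(outcomes):
            labels[sample_id] = "robust"
        else:
            labels[sample_id] = "mixed"
    return labels

# Example: three samples evaluated against two models.
labels = label_attackability({
    "s1": [True, True],    # fooled both models
    "s2": [False, False],  # fooled neither
    "s3": [True, False],   # fooled only one
})
```

In this sketch `labels` would be `{"s1": "attackable", "s2": "robust", "s3": "mixed"}`; a detector as described in the abstract would then be trained to predict these labels for unseen samples and target models.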

