Residue-Based Natural Language Adversarial Attack Detection

04/17/2022
by Vyas Raina, et al.

Deep learning based systems are susceptible to adversarial attacks, where a small, imperceptible change to the input alters the model's prediction. To date, however, the majority of approaches for detecting these attacks have been designed for image processing systems. Many popular image adversarial detectors identify adversarial examples from embedding feature spaces, whereas existing state-of-the-art detectors in the NLP domain focus solely on input text features, without considering model embedding spaces. This work examines what happens when these image-designed strategies are ported to Natural Language Processing (NLP) tasks: the detectors are found to transfer poorly. This is expected, as NLP systems have a very different form of input: discrete and sequential in nature, rather than the continuous, fixed-size inputs of images. As an equivalent model-focused NLP detection approach, this work proposes a simple sentence-embedding "residue" based detector for identifying adversarial examples. On many tasks, it outperforms ported image-domain detectors and recent state-of-the-art NLP-specific detectors.
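The abstract does not spell out how the "residue" is computed. The sketch below is one plausible reading, not the authors' implementation: fit PCA to the sentence embeddings of clean training data and score each input by the norm of its embedding's component outside the top principal directions, on the assumption that adversarial inputs carry unusually large mass in the low-variance directions of the clean embedding space. The function name `residue_score`, the choice of `k`, and the quantile-based threshold are all illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in embeddings: in practice these would come from a sentence
# encoder (e.g. a BERT [CLS] vector); random data keeps the sketch runnable.
rng = np.random.default_rng(0)
clean_emb = rng.normal(size=(1000, 768))  # embeddings of clean sentences
test_emb = rng.normal(size=(10, 768))     # embeddings of sentences to screen

# Fit PCA on clean embeddings only, so the principal directions
# describe the clean-data manifold.
pca = PCA().fit(clean_emb)

def residue_score(emb, k=100):
    """Norm of the embedding's component outside the top-k principal
    directions of the clean data (the "residue"). Larger scores suggest
    the embedding lies off the clean-data manifold."""
    centred = emb - pca.mean_
    coords = centred @ pca.components_.T  # coordinates in all PCA directions
    return np.linalg.norm(coords[:, k:], axis=1)

# Flag inputs whose residue exceeds, say, the 95th percentile of clean
# scores (i.e. roughly a 5% false-positive rate on clean data).
threshold = np.quantile(residue_score(clean_emb), 0.95)
is_adversarial = residue_score(test_emb) > threshold
```

The appeal of a detector of this form is its simplicity: it needs no adversarial examples at training time, only clean embeddings and a single projection per input at test time.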


