What Learned Representations and Influence Functions Can Tell Us About Adversarial Examples

09/19/2023
by   Shakila Mahjabin Tonni, et al.
0

Adversarial examples, deliberately crafted using small perturbations to fool deep neural networks, were first studied in image processing and more recently in NLP. While approaches to detecting adversarial examples in NLP have largely relied on search over input perturbations, image processing has seen a range of techniques that aim to characterise adversarial subspaces over the learned representations. In this paper, we adapt two such approaches to NLP, one based on nearest neighbors and influence functions and one on Mahalanobis distances. The former in particular produces a state-of-the-art detector when compared against several strong baselines; moreover, the novel use of influence functions provides insight into how the nature of adversarial example subspaces in NLP relate to those in image processing, and also how they differ depending on the kind of NLP task.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/29/2022

Detecting Textual Adversarial Examples Based on Distributional Characteristics of Data Representations

Although deep neural networks have achieved state-of-the-art performance...
research
09/15/2019

Detecting Adversarial Samples Using Influence Functions and Nearest Neighbors

Deep neural networks (DNNs) are notorious for their vulnerability to adv...
research
04/17/2022

Residue-Based Natural Language Adversarial Attack Detection

Deep learning based systems are susceptible to adversarial attacks, wher...
research
01/22/2020

Elephant in the Room: An Evaluation Framework for Assessing Adversarial Examples in NLP

An adversarial example is an input transformed by small perturbations th...
research
03/26/2018

On the Limitation of Local Intrinsic Dimensionality for Characterizing the Subspaces of Adversarial Examples

Understanding and characterizing the subspaces of adversarial examples a...
research
12/23/2018

Countermeasures Against L0 Adversarial Examples Using Image Processing and Siamese Networks

Despite the great achievements made by neural networks on tasks such as ...
research
09/11/2018

Does it care what you asked? Understanding Importance of Verbs in Deep Learning QA System

In this paper we present the results of an investigation of the importan...

Please sign up or login with your details

Forgot password? Click here to reset