Adversarial Gain

11/04/2018
by Peter Henderson, et al.

Adversarial examples are inputs to a model that induce a mistake: the model's output differs from that of an oracle, sometimes in surprising or malicious ways. Adversarial attacks have primarily been studied in the context of classification and computer vision tasks. While several attacks have been proposed in natural language processing (NLP) settings, they often differ in how they define the parameters of an attack and what counts as a successful attack. The goal of this work is to propose a unifying model of adversarial examples suitable for NLP tasks in both generative and classification settings. We define the notion of adversarial gain: rooted in control theory, it measures the change in the output of a system relative to the perturbation of the input (caused by the so-called adversary) presented to the learner. As we show, this definition can be applied under different feature spaces and distance conditions to determine attack or defense effectiveness across different intuitive manifolds. The notion of adversarial gain not only provides a useful way to evaluate adversaries and defenses, but can also serve as a building block for future work on robustness under adversaries, since it is grounded in stability and manifold theory.
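To make the definition concrete: adversarial gain is, roughly, the ratio of the change in a system's output to the size of the input perturbation, each measured under a chosen distance function and feature space. The sketch below is an illustrative assumption rather than the authors' reference implementation; the names (adversarial_gain, d_in, d_out) and the Euclidean defaults are placeholders for whatever task-specific metrics are used.

import numpy as np

def adversarial_gain(model, x, x_adv, d_in=None, d_out=None):
    # Ratio of output change to input perturbation, in the spirit of
    # incremental gain from control theory. d_in and d_out stand in for
    # whatever feature-space distances a given task calls for
    # (Euclidean distance is only an assumed default here).
    d_in = d_in or (lambda a, b: np.linalg.norm(a - b))
    d_out = d_out or (lambda a, b: np.linalg.norm(a - b))
    input_change = d_in(x, x_adv)
    output_change = d_out(model(x), model(x_adv))
    return output_change / max(input_change, 1e-12)  # guard against zero perturbation

# Toy usage: a fixed linear-plus-tanh "model" and a small input perturbation.
model = lambda v: np.tanh(v @ np.array([0.5, -1.0]))
x = np.array([1.0, 2.0])
x_adv = x + np.array([0.05, -0.02])
print(adversarial_gain(model, x, x_adv))

Under this reading, a large gain means a small, cheap perturbation produced a disproportionately large change in the output, which is what lets the ratio serve as a common yardstick for attacks and defenses in both classification and generative settings.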

Related research

09/15/2019 - Natural Language Adversarial Attacks and Defenses in Word Level
  Up until very recently, inspired by the large amount of research about ...

04/17/2022 - Residue-Based Natural Language Adversarial Attack Detection
  Deep learning based systems are susceptible to adversarial attacks, wher...

04/18/2023 - Masked Language Model Based Textual Adversarial Example Detection
  Adversarial attacks are a serious threat to the reliable deployment of m...

03/10/2019 - Manifold Preserving Adversarial Learning
  How to generate semantically meaningful and structurally sound adversari...

10/12/2020 - From Hero to Zéroe: A Benchmark of Low-Level Adversarial Attacks
  Adversarial attacks are label-preserving modifications to inputs of mach...

11/05/2022 - Textual Manifold-based Defense Against Natural Language Adversarial Examples
  Recent studies on adversarial images have shown that they tend to leave ...

09/13/2020 - Manifold attack
  Machine Learning in general and Deep Learning in particular have gained m...
