TREATED: Towards Universal Defense against Textual Adversarial Attacks

09/13/2021
by Bin Zhu, et al.

Recent work shows that deep neural networks are vulnerable to adversarial examples. Much of this work studies adversarial example generation, while comparatively little focuses on the more critical problem of adversarial defense. Existing adversarial detection methods usually make assumptions about the adversarial examples or the attack method (e.g., the word frequency of the adversarial examples, or the perturbation level of the attack), which limits their applicability. To this end, we propose TREATED, a universal adversarial detection method that can defend against attacks at various perturbation levels without making any such assumptions. TREATED identifies adversarial examples through a set of well-designed reference models. Extensive experiments on three competitive neural networks and two widely used datasets show that our method achieves better detection performance than the baselines. We finally conduct ablation studies to verify the effectiveness of our method.
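The abstract does not spell out how the reference models are used for detection. A minimal sketch, assuming detection is based on prediction agreement between the protected model and a set of reference classifiers (all names here, such as ReferenceEnsembleDetector and agreement_threshold, are illustrative and not taken from the paper):

```python
from typing import Callable, Sequence

# Hypothetical sketch: flag an input as adversarial when the protected
# (target) model's prediction disagrees with too many reference models.
# Each model is represented as a callable mapping a text to a class label.

class ReferenceEnsembleDetector:
    def __init__(self,
                 target_model: Callable[[str], int],
                 reference_models: Sequence[Callable[[str], int]],
                 agreement_threshold: float = 0.5):
        self.target_model = target_model
        self.reference_models = reference_models
        self.agreement_threshold = agreement_threshold

    def is_adversarial(self, text: str) -> bool:
        # Prediction of the model under attack.
        target_pred = self.target_model(text)
        # Fraction of reference models that agree with that prediction.
        agree = sum(ref(text) == target_pred for ref in self.reference_models)
        agreement = agree / len(self.reference_models)
        # Intuition: clean inputs tend to yield consistent predictions across
        # models, while perturbations crafted against the target model often
        # do not transfer, so low agreement signals a likely adversarial input.
        return agreement < self.agreement_threshold
```

This kind of agreement-based check makes no assumption about the attack's perturbation level, which is consistent with the universality claim in the abstract, though the actual construction of the reference models in TREATED may differ.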


