"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks

04/10/2022
by Edoardo Mosca, et al.

Adversarial attacks are a major challenge faced by current machine learning research. These purposely crafted inputs fool even the most advanced models, precluding their deployment in safety-critical applications. Extensive research in computer vision has been carried out to develop reliable defense strategies. However, the same issue remains less explored in natural language processing. Our work presents a model-agnostic detector of adversarial text examples. The approach identifies patterns in the logits of the target classifier when perturbing the input text. The proposed detector improves the current state-of-the-art performance in recognizing adversarial inputs and exhibits strong generalization capabilities across different NLP models, datasets, and word-level attacks.
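
The abstract only states that the detector looks at how the target classifier's logits vary when the input text is perturbed. As a rough illustration of that idea, and not the authors' implementation, the sketch below masks one word at a time and records how the margin of the originally predicted class changes. The function names, the `[UNK]` mask token, the margin formula, and the keyword-counting `toy_classifier` are all assumptions introduced here for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Classifier = Callable[[Sequence[str]], List[float]]


def logit_margin(logits: Sequence[float], predicted: int) -> float:
    """Logit of the predicted class minus the largest other logit."""
    others = [v for i, v in enumerate(logits) if i != predicted]
    return logits[predicted] - max(others)


def word_level_reactions(
    words: Sequence[str],
    classify: Classifier,
    mask_token: str = "[UNK]",  # assumed placeholder token
) -> Tuple[int, List[float]]:
    """Mask each word in turn and record the resulting logit margin.

    Returns the class predicted on the unperturbed input and one margin
    per word position. A sharp drop at a single position means the
    prediction hinges on that word.
    """
    base = classify(words)
    predicted = max(range(len(base)), key=base.__getitem__)
    reactions = []
    for i in range(len(words)):
        perturbed = list(words[:i]) + [mask_token] + list(words[i + 1:])
        reactions.append(logit_margin(classify(perturbed), predicted))
    return predicted, reactions


def toy_classifier(words: Sequence[str]) -> List[float]:
    """Hypothetical stand-in model: logits are [negative, positive] keyword counts."""
    pos = sum(w in {"good", "great"} for w in words)
    neg = sum(w in {"bad", "awful"} for w in words)
    return [float(neg), float(pos)]


predicted, reactions = word_level_reactions(["a", "great", "movie"], toy_classifier)
print(predicted, reactions)
```

Because the sketch only calls the classifier as a black box, it stays model-agnostic in the sense the abstract describes. One plausible next step, again an assumption, would be to train a separate binary classifier on such per-word reaction vectors to separate adversarial from original inputs.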


research
09/15/2019

Natural Language Adversarial Attacks and Defenses in Word Level

Up until recent two years, inspired by the big amount of research about ...
research
03/12/2022

A Survey in Adversarial Defences and Robustness in NLP

In recent years, it has been seen that deep neural networks are lacking ...
research
02/03/2023

TextShield: Beyond Successfully Detecting Adversarial Sentences in Text Classification

Adversarial attack serves as a major challenge for neural network models...
research
06/02/2020

Exploring the role of Input and Output Layers of a Deep Neural Network in Adversarial Defense

Deep neural networks are learning models having achieved state of the ar...
research
06/08/2022

Adversarial Text Normalization

Text-based adversarial attacks are becoming more commonplace and accessi...
research
04/23/2021

Evaluating Deception Detection Model Robustness To Linguistic Variation

With the increasing use of machine-learning driven algorithmic judgement...
research
01/10/2023

User-Centered Security in Natural Language Processing

This dissertation proposes a framework of user-centered security in Natu...
