VoteTRANS: Detecting Adversarial Text without Training by Voting on Hard Labels of Transformations

06/02/2023
by   Hoang-Quoc Nguyen-Son, et al.

Adversarial attacks reveal serious flaws in deep learning models. More dangerously, these attacks preserve the original meaning of the text and escape human recognition. Existing methods for detecting such attacks require training on original/adversarial data. In this paper, we propose a training-free detector that votes on the hard labels predicted for transformations of the input, namely, VoteTRANS. Specifically, VoteTRANS detects adversarial text by comparing the hard label of the input text with those of its transformations. The evaluation demonstrates that VoteTRANS effectively detects adversarial text across various state-of-the-art attacks, models, and datasets.
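The core idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names (`vote_trans`, `toy_classify`, `drop_word`) and the word-dropping transformation are assumptions chosen for the demo; the paper's transformations and voting details differ.

```python
from collections import Counter

def vote_trans(text, classify, transform, num_variants):
    """Hedged sketch of VoteTRANS-style detection: flag the input as
    adversarial when its hard label disagrees with the majority hard
    label over transformed variants of the input."""
    original_label = classify(text)
    variant_labels = [classify(transform(text, i)) for i in range(num_variants)]
    majority_label, _ = Counter(variant_labels).most_common(1)[0]
    return original_label != majority_label  # True -> likely adversarial

# Toy stand-ins for illustration only: a classifier that keys on a single
# fragile token, and a transformation that drops the i-th word.
def toy_classify(text):
    return "positive" if "great" in text else "negative"

def drop_word(text, i):
    words = text.split()
    return " ".join(words[:i] + words[i + 1:]) if i < len(words) else text
```

The intuition is that a benign input keeps its label under small transformations, whereas an adversarial input, crafted to sit near a decision boundary, tends to flip back once the perturbation is disturbed, so its label disagrees with the vote.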


Related research

02/11/2023
Mutation-Based Adversarial Attacks on Neural Text Detectors
Neural text detectors aim to decide the characteristics that distinguish...

07/10/2019
Metamorphic Detection of Adversarial Examples in Deep Learning Models With Affine Transformations
Adversarial attacks are small, carefully crafted perturbations, impercep...

03/09/2020
Gradient-based adversarial attacks on categorical sequence models via traversing an embedded world
An adversarial attack paradigm explores various scenarios for vulnerabil...

11/08/2022
Preserving Semantics in Textual Adversarial Attacks
Adversarial attacks in NLP challenge the way we look at language models....

10/06/2022
How Far Are We from Real Synonym Substitution Attacks?
In this paper, we explore the following question: how far are we from re...

05/10/2022
Sibylvariant Transformations for Robust Text Classification
The vast majority of text transformation techniques in NLP are inherentl...

02/21/2020
UnMask: Adversarial Detection and Defense Through Robust Feature Alignment
Deep learning models are being integrated into a wide range of high-impa...
