Preserving Semantics in Textual Adversarial Attacks

11/08/2022
by David Herel et al.

Adversarial attacks in NLP challenge the way we look at language models. Such an attack modifies the input text so that a classifier is fooled while the original meaning of the text is preserved. Although most existing adversarial attacks claim to satisfy this semantics-preservation constraint, careful scrutiny shows otherwise. We show that the problem lies in the text encoders used to measure the similarity of adversarial examples, specifically in the way they are trained: unsupervised training makes these encoders prone to treating antonyms as semantically close. To overcome this, we introduce a simple, fully supervised sentence embedding technique called Semantics-Preserving-Encoder (SPE). Our results show that it minimizes the variation in meaning of the generated adversarial examples and significantly improves their overall quality, as confirmed by human evaluators. Furthermore, SPE can be used as a component in any existing attack to speed up its execution while maintaining a similar attack success rate.
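To make the similarity constraint concrete, below is a minimal sketch of how word-level attacks typically gate candidate adversarial examples with a sentence encoder. The encoder choice (an off-the-shelf sentence-transformers model) and the 0.8 cosine threshold are illustrative assumptions, not the paper's SPE or its settings; the antonym pair at the end illustrates the failure mode the paper targets.

```python
# Sketch: gating word-level adversarial candidates by sentence similarity.
# The model name and threshold are illustrative, not taken from the paper.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed generic encoder

def is_semantics_preserving(original: str, candidate: str,
                            threshold: float = 0.8) -> bool:
    """Accept a candidate only if its embedding stays close to the original's."""
    emb = encoder.encode([original, candidate], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item() >= threshold

# An antonym substitution reverses the meaning. The paper's claim is that
# unsupervisedly trained encoders often still score such pairs above the
# threshold, letting meaning-flipping "adversarial examples" pass the gate.
print(is_semantics_preserving(
    "The movie was a great success.",
    "The movie was a great failure.",
))
```

A supervised encoder such as SPE would be dropped in at the `encoder` step; everything else in the attack loop stays the same, which is why the abstract describes it as a plug-in component for existing attacks.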

Related research:

- Reevaluating Adversarial Examples in Natural Language (04/25/2020): State-of-the-art attacks on NLP models have different definitions of wha...
- Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification (09/09/2021): Research shows that natural language processing models are generally con...
- Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models (11/04/2021): Large-scale pre-trained language models have achieved tremendous success...
- Constructing Semantics-Aware Adversarial Examples with Probabilistic Perspective (06/01/2023): In this study, we introduce a novel, probabilistic viewpoint on adversar...
- VoteTRANS: Detecting Adversarial Text without Training by Voting on Hard Labels of Transformations (06/02/2023): Adversarial attacks reveal serious flaws in deep learning models. More d...
- Don't sweat the small stuff, classify the rest: Sample Shielding to protect text classifiers against adversarial attacks (05/03/2022): Deep learning (DL) is being used extensively for text classification. Ho...
- Lost In Translation: Generating Adversarial Examples Robust to Round-Trip Translation (07/24/2023): Language Models today provide a high accuracy across a large number of d...
