DeepAI AI Chat
Log In Sign Up

Preserving Semantics in Textual Adversarial Attacks

by   David Herel, et al.
Czech Technical University in Prague

Adversarial attacks in NLP challenge the way we look at language models. The goal of this kind of adversarial attack is to modify the input text to fool a classifier while maintaining the original meaning of the text. Although most existing adversarial attacks claim to fulfill the constraint of semantics preservation, careful scrutiny shows otherwise. We show that the problem lies in the text encoders used to determine the similarity of adversarial examples, specifically in the way they are trained. Unsupervised training methods make these encoders more susceptible to problems with antonym recognition. To overcome this, we introduce a simple, fully supervised sentence embedding technique called Semantics-Preserving-Encoder (SPE). The results show that our solution minimizes the variation in the meaning of the adversarial examples generated. It also significantly improves the overall quality of adversarial examples, as confirmed by human evaluators. Furthermore, it can be used as a component in any existing attack to speed up its execution while maintaining similar attack success.


page 1

page 2

page 3

page 4


Reevaluating Adversarial Examples in Natural Language

State-of-the-art attacks on NLP models have different definitions of wha...

Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification

Research shows that natural language processing models are generally con...

Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models

Large-scale pre-trained language models have achieved tremendous success...

Attacking Optical Character Recognition (OCR) Systems with Adversarial Watermarks

Optical character recognition (OCR) is widely applied in real applicatio...

Constructing Semantics-Aware Adversarial Examples with Probabilistic Perspective

In this study, we introduce a novel, probabilistic viewpoint on adversar...

Adversarial Examples for Evaluating Math Word Problem Solvers

Standard accuracy metrics have shown that Math Word Problem (MWP) solver...