Adversarial Text Normalization

06/08/2022
by Joanna Bitton et al.

Text-based adversarial attacks are becoming more commonplace and accessible to general internet users. As these attacks proliferate, the need to address the gap in model robustness becomes pressing. While retraining on adversarial data may improve performance, there remains an additional class of character-level attacks on which these models falter. Moreover, retraining a model is time- and resource-intensive, creating the need for a lightweight, reusable defense. In this work, we propose the Adversarial Text Normalizer, a novel method that restores baseline performance on attacked content with low computational overhead. We evaluate the efficacy of the normalizer on two problem areas prone to adversarial attacks, namely Hate Speech and Natural Language Inference. We find that text normalization provides a task-agnostic defense against character-level attacks that can be deployed alongside adversarial retraining solutions, which are better suited to semantic alterations.
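
To make the idea concrete, below is a minimal, illustrative sketch of what character-level text normalization can look like. This is not the authors' implementation: the homoglyph mapping table, the ZERO_WIDTH set, and the normalize function are hypothetical stand-ins, and a production normalizer would cover a far larger inventory of Unicode confusables.

```python
# Minimal sketch of character-level text normalization.
# NOT the paper's implementation; the mapping table and steps below
# are illustrative assumptions about how such a defense can work.
import unicodedata

# Hypothetical mapping from common visually confusable substitutions
# (leetspeak-style digits and symbols) back to canonical letters.
HOMOGLYPH_MAP = str.maketrans({
    "0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "7": "t",
    "@": "a", "$": "s", "!": "i",
})

# Zero-width characters often inserted to break tokenization.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}

def normalize(text: str) -> str:
    # Decompose accented characters, then drop combining marks,
    # e.g. an "a" with stacked diacritics becomes a plain "a".
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Remove invisible zero-width characters.
    visible = "".join(c for c in stripped if c not in ZERO_WIDTH)
    # Map common character substitutions back to canonical letters.
    return visible.translate(HOMOGLYPH_MAP)

if __name__ == "__main__":
    # "h<zero-width>ȁt3 sp<combining mark>eech" -> "hate speech"
    print(normalize("h\u200b\u0061\u030ft3 sp\u0334eech"))
```

Because a step like this runs as plain string preprocessing before the model sees the input, it adds little computational overhead and requires no retraining, which is consistent with the lightweight, reusable defense described above. The trade-off is that aggressive mappings (e.g. "1" to "l") can also rewrite benign text, so the mapping inventory must be tuned to the deployment domain.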


