BERT-Defense: A Probabilistic Model Based on BERT to Combat Cognitively Inspired Orthographic Adversarial Attacks

06/02/2021
by   Yannik Keller, et al.
0

Adversarial attacks expose important blind spots of deep learning systems. While word- and sentence-level attack scenarios mostly deal with finding semantic paraphrases of the input that fool NLP models, character-level attacks typically insert typos into the input stream. It is commonly thought that these are easier to defend via spelling correction modules. In this work, we show that both a standard spellchecker and the approach of Pruthi et al. (2019), which trains to defend against insertions, deletions and swaps, perform poorly on the character-level benchmark recently proposed in Eger and Benz (2020) which includes more challenging attacks such as visual and phonetic perturbations and missing word segmentations. In contrast, we show that an untrained iterative approach which combines context-independent character-level information with context-dependent information from BERT's masked language modeling can perform on par with human crowd-workers from Amazon Mechanical Turk (AMT) supervised via 3-shot learning.

READ FULL TEXT

page 7

page 8

research
02/11/2022

White-Box Attacks on Hate-speech BERT Classifiers in German with Explicit and Implicit Character Level Defense

In this work, we evaluate the adversarial robustness of BERT models trai...
research
10/12/2020

From Hero to Zéroe: A Benchmark of Low-Level Adversarial Attacks

Adversarial attacks are label-preserving modifications to inputs of mach...
research
03/27/2019

Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems

Visual modifications to text are often used to obfuscate offensive comme...
research
06/08/2022

Adversarial Text Normalization

Text-based adversarial attacks are becoming more commonplace and accessi...
research
02/11/2022

Using Random Perturbations to Mitigate Adversarial Attacks on Sentiment Analysis Models

Attacks on deep learning models are often difficult to identify and ther...
research
03/19/2022

Perturbations in the Wild: Leveraging Human-Written Text Perturbations for Realistic Adversarial Attack and Defense

We proposes a novel algorithm, ANTHRO, that inductively extracts over 60...
research
05/11/2021

Restoring Hebrew Diacritics Without a Dictionary

We demonstrate that it is feasible to diacritize Hebrew script without a...

Please sign up or login with your details

Forgot password? Click here to reset