From Hero to Zéroe: A Benchmark of Low-Level Adversarial Attacks

10/12/2020
by Steffen Eger, et al.

Adversarial attacks are label-preserving modifications to inputs of machine learning classifiers designed to fool machines but not humans. Work in Natural Language Processing (NLP) has mostly focused on high-level attack scenarios such as paraphrasing input texts. We argue that these are less realistic in typical application scenarios such as social media, and instead focus on low-level attacks at the character level. Guided by human cognitive abilities and human robustness, we propose the first large-scale catalogue and benchmark of low-level adversarial attacks, which we dub Zéroe, encompassing nine different attack modes including visual and phonetic adversaries. We show that RoBERTa, NLP's current workhorse, fails on our attacks. Our dataset provides a benchmark for testing the robustness of future, more human-like NLP models.
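For intuition, the sketch below shows one such low-level, character-level perturbation: a visual (homoglyph) attack that swaps Latin letters for look-alike Unicode characters. The character map, swap rate, and function name are illustrative assumptions for this sketch, not the paper's actual Zéroe attack modes.

```python
# Minimal sketch of a character-level visual (homoglyph) attack.
# NOTE: the mapping and swap rate are illustrative assumptions,
# not the actual Zéroe benchmark implementation.
import random

# Latin letters mapped to visually similar Cyrillic look-alikes.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "c": "\u0441",  # Cyrillic small es, looks like Latin c
    "e": "\u0435",  # Cyrillic small ie
    "i": "\u0456",  # Cyrillic small Byelorussian-Ukrainian i
    "o": "\u043e",  # Cyrillic small o
}

def visual_attack(text: str, rate: float = 0.3, seed: int = 0) -> str:
    """Replace a fraction of attackable characters with look-alike glyphs."""
    rng = random.Random(seed)
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < rate else ch
        for ch in text
    )

print(visual_attack("the cast was excellent and the plot kept me hooked"))
# Prints a visually near-identical string containing Cyrillic characters.
```

To a human reader the perturbed sentence looks essentially unchanged, but a subword tokenizer such as RoBERTa's maps the Cyrillic look-alikes to entirely different token ids, which is why such label-preserving edits can break the model.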


Related research

Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems (03/27/2019)
Visual modifications to text are often used to obfuscate offensive comme...

How do humans perceive adversarial text? A reality check on the validity and naturalness of word-based adversarial attacks (05/24/2023)
Natural Language Processing (NLP) models based on Machine Learning (ML) ...

Identifying Layers Susceptible to Adversarial Attacks (07/10/2021)
Common neural network architectures are susceptible to attack by adversa...

Reliability Testing for Natural Language Processing Systems (05/06/2021)
Questions of fairness, robustness, and transparency are paramount to add...

BERT-Defense: A Probabilistic Model Based on BERT to Combat Cognitively Inspired Orthographic Adversarial Attacks (06/02/2021)
Adversarial attacks expose important blind spots of deep learning system...

User-Centered Security in Natural Language Processing (01/10/2023)
This dissertation proposes a framework of user-centered security in Natu...

Adversarial Gain (11/04/2018)
Adversarial examples can be defined as inputs to a model which induce a ...
