How do humans perceive adversarial text? A reality check on the validity and naturalness of word-based adversarial attacks

05/24/2023
by Salijona Dyrmishi, et al.

Natural Language Processing (NLP) models based on Machine Learning (ML) are susceptible to adversarial attacks – malicious algorithms that imperceptibly modify input text to force models into making incorrect predictions. However, evaluations of these attacks ignore the property of imperceptibility or study it under limited settings. This entails that adversarial perturbations would not pass any human quality gate and do not represent real threats to human-checked NLP systems. To bypass this limitation and enable proper assessment (and later, improvement) of NLP model robustness, we have surveyed 378 human participants about the perceptibility of text adversarial examples produced by state-of-the-art methods. Our results underline that existing text attacks are impractical in real-world scenarios where humans are involved. This contrasts with previous smaller-scale human studies, which reported overly optimistic conclusions regarding attack success. Through our work, we hope to position human perceptibility as a first-class success criterion for text attacks, and provide guidance for research to build effective attack algorithms and, in turn, design appropriate defence mechanisms.
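For intuition about the kind of word-based attacks the survey participants judged, below is a minimal, self-contained sketch of a greedy synonym-substitution attack against a sentiment classifier. Everything in it (the toy keyword classifier, the tiny synonym table, the example sentence, the decision threshold) is an illustrative assumption, not the implementation of the attacks evaluated in the paper; real attacks such as TextFooler or PWWS draw candidate words from embeddings or WordNet and query a trained model.

```python
# Minimal sketch of a greedy word-substitution adversarial attack.
# All components (toy classifier, synonym table, thresholds) are
# illustrative assumptions, not the attacks evaluated in the paper.
import math
from typing import Callable, Dict, List, Tuple

# Stand-in for a real substitution source (WordNet, counter-fitted
# embeddings, a masked language model, etc.).
SYNONYMS: Dict[str, List[str]] = {
    "great": ["fine", "decent"],
    "boring": ["dull", "tedious"],
    "movie": ["film", "picture"],
}


def toy_classifier(text: str) -> float:
    """Return P(positive) from naive keyword counts (a stand-in for a trained model)."""
    positives = {"great", "excellent"}
    negatives = {"boring", "awful"}
    words = text.lower().split()
    score = sum(w in positives for w in words) - sum(w in negatives for w in words)
    return 1.0 / (1.0 + math.exp(-score))


def greedy_substitution_attack(
    text: str,
    predict_pos: Callable[[str], float],
    max_swaps: int = 3,
) -> Tuple[str, float]:
    """Greedily replace words with synonyms to push the classifier toward the
    opposite label, stopping when the prediction flips or the swap budget is
    exhausted. Whether humans notice such swaps is what the survey measures."""
    words = text.split()
    original_positive = predict_pos(text) > 0.5
    swaps = 0
    for i, word in enumerate(words):
        if swaps >= max_swaps:
            break
        best_words = None
        best_score = predict_pos(" ".join(words))
        for candidate in SYNONYMS.get(word.lower(), []):
            trial = words.copy()
            trial[i] = candidate
            score = predict_pos(" ".join(trial))
            # Keep the candidate that moves the score furthest away
            # from the originally predicted label.
            gain = (best_score - score) if original_positive else (score - best_score)
            if gain > 0:
                best_words, best_score = trial, score
        if best_words is not None:
            words = best_words
            swaps += 1
            if (best_score > 0.5) != original_positive:
                break  # prediction flipped: adversarial example found
    final_text = " ".join(words)
    return final_text, predict_pos(final_text)


if __name__ == "__main__":
    adversarial, p_positive = greedy_substitution_attack("a great movie", toy_classifier)
    print(adversarial, round(p_positive, 3))  # e.g. "a fine movie" with a lowered score
```

The question the paper raises is precisely whether substitutions like "great" to "fine" read as natural and meaning-preserving to human readers, rather than only whether they flip the model's prediction.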



Related research

10/12/2020 · From Hero to Zéroe: A Benchmark of Low-Level Adversarial Attacks
Adversarial attacks are label-preserving modifications to inputs of mach...

04/25/2022 · Can Rationalization Improve Robustness?
A growing line of work has investigated the development of neural NLP mo...

06/08/2023 · Expanding Scope: Adapting English Adversarial Attacks to Chinese
Recent studies have revealed that NLP predictive models are vulnerable t...

03/27/2019 · Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems
Visual modifications to text are often used to obfuscate offensive comme...

06/26/2023 · Are aligned neural networks adversarially aligned?
Large language models are now tuned to align with the goals of their cre...

03/01/2021 · Token-Modification Adversarial Attacks for Natural Language Processing: A Survey
There are now many adversarial attacks for natural language processing s...

06/18/2021 · Bad Characters: Imperceptible NLP Attacks
Several years of research have shown that machine-learning systems are v...
