Identifying Human Strategies for Generating Word-Level Adversarial Examples

10/20/2022
by Maximilian Mozes, et al.

Adversarial examples in NLP are receiving increasing research attention. One line of investigation is the generation of word-level adversarial examples that preserve naturalness and grammaticality while targeting fine-tuned Transformer models. Previous work found that human- and machine-generated adversarial examples are comparable in their naturalness and grammatical correctness. Most notably, humans were able to generate adversarial examples with much less effort than automated attacks. In this paper, we provide a detailed analysis of exactly how humans create these adversarial examples. By exploring the behavioural patterns of human workers during the generation process, we identify statistically significant tendencies in which words humans prefer to select for adversarial replacement (e.g., word frequencies, word saliencies, sentiment) as well as where and when words are replaced in an input sequence. With our findings, we seek to inspire efforts that harness human strategies for more robust NLP models.

research
09/09/2021

Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification

Research shows that natural language processing models are generally con...
research
02/27/2020

Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT

There is an increasing amount of literature that claims the brittleness ...
research
04/13/2020

Frequency-Guided Word Substitutions for Detecting Textual Adversarial Examples

While recent efforts have shown that neural text processing models are v...
research
09/05/2019

Adversarial Examples with Difficult Common Words for Paraphrase Identification

Despite the success of deep models for paraphrase identification on benc...
research
12/19/2017

HotFlip: White-Box Adversarial Examples for NLP

Adversarial examples expose vulnerabilities of machine learning models. ...
research
10/06/2022

InferES : A Natural Language Inference Corpus for Spanish Featuring Negation-Based Contrastive and Adversarial Examples

In this paper, we present InferES - an original corpus for Natural Langu...
research
09/13/2021

Randomized Substitution and Vote for Textual Adversarial Example Detection

A line of work has shown that natural text processing models are vulnera...
