Mischief: A Simple Black-Box Attack Against Transformer Architectures

10/16/2020
by Adrian de Wynter, et al.

We introduce Mischief, a simple and lightweight method to produce a class of human-readable, realistic adversarial examples for language models. We perform exhaustive experiments with our algorithm on four transformer-based architectures, across a variety of downstream tasks, and under varying concentrations of these examples. Our findings show that the presence of Mischief-generated adversarial samples in the test set significantly degrades (by up to 20%) the performance of these models with respect to their reported baselines. Nonetheless, we also demonstrate that, by including similar examples in the training set, it is possible to restore the baseline scores on the adversarial test set. Moreover, for certain tasks, the models trained with Mischief-generated examples show a modest increase in performance with respect to their original, non-adversarial baseline.
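The abstract does not spell out the perturbation itself, so the following is only an illustrative sketch of the evaluation setup it describes: a test set in which a chosen concentration of examples is replaced by human-readable perturbed versions. The perturbation used here (shuffling the interior characters of each word while keeping the first and last characters fixed) is an assumption made for the example, and the names `perturb_word`, `perturb_sentence`, and `mix_adversarial` are hypothetical, not taken from the paper.

```python
import random


def perturb_word(word, rng):
    """Shuffle the interior characters of a word (assumed perturbation).

    The first and last characters stay in place so the word remains
    readable to a human. Words of three characters or fewer are unchanged.
    """
    if len(word) <= 3:
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def perturb_sentence(sentence, rng):
    """Apply the word-level perturbation to every whitespace-separated token."""
    return " ".join(perturb_word(w, rng) for w in sentence.split())


def mix_adversarial(examples, concentration, seed=0):
    """Return a copy of `examples` with a fraction of them perturbed.

    `concentration` is the fraction (0.0 to 1.0) of examples to replace
    with perturbed versions; the rest are left untouched.
    """
    if not 0.0 <= concentration <= 1.0:
        raise ValueError("concentration must be between 0 and 1")
    rng = random.Random(seed)
    n_adv = round(concentration * len(examples))
    adv_idx = set(rng.sample(range(len(examples)), n_adv))
    return [
        perturb_sentence(text, rng) if i in adv_idx else text
        for i, text in enumerate(examples)
    ]


if __name__ == "__main__":
    test_set = [
        "the model performs well on clean inputs",
        "adversarial examples degrade accuracy",
        "training with similar examples restores the baseline",
        "transformer architectures are evaluated here",
    ]
    for text in mix_adversarial(test_set, concentration=0.5, seed=1):
        print(text)
```

Under these assumptions, the same helper could be applied to a training set to build the kind of augmented training data the abstract mentions, with `concentration` controlling how many perturbed examples are included.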


