Customizing Triggers with Concealed Data Poisoning

10/23/2020
by Eric Wallace, et al.

Adversarial attacks alter NLP model predictions by perturbing test-time inputs. However, it is much less understood whether, and how, predictions can be manipulated with small, concealed changes to the training data. In this work, we develop a new data poisoning attack that allows an adversary to control model predictions whenever a desired trigger phrase is present in the input. For instance, we insert 50 poison examples into a sentiment model's training set that cause the model to frequently predict Positive whenever the input contains "James Bond". Crucially, we craft these poison examples using a gradient-based procedure so that they do not mention the trigger phrase. We also apply our poisoning attack to language modeling ("Apple iPhone" triggers negative generations) and machine translation ("iced coffee" mistranslated as "hot coffee"). We conclude by proposing three defenses that can mitigate our attack, at some cost in prediction accuracy or extra human annotation.
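To make the gradient-based crafting step concrete, the sketch below is a simplified, hypothetical illustration, not the authors' released code. It shows a HotFlip-style first-order token replacement on a toy bag-of-embeddings classifier: each step swaps the token whose replacement most reduces the loss toward the attacker's target label, while the trigger tokens themselves are excluded so the poison never mentions the trigger phrase. The actual attack in the paper optimizes a bi-level objective so the poison transfers to trigger-containing inputs after training; the names ToyClassifier, craft_poison_step, and TRIGGER_IDS are assumptions introduced here for illustration.

```python
# Hypothetical sketch (not the paper's code): gradient-guided token replacement
# for crafting a poison example on a toy classifier, with the trigger concealed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyClassifier(nn.Module):
    """Bag-of-embeddings sentiment classifier (stand-in for the victim model)."""
    def __init__(self, vocab_size=1000, dim=64, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim)
        self.fc = nn.Linear(dim, num_classes)

    def forward_from_embeds(self, embeds):           # embeds: (batch, seq, dim)
        return self.fc(embeds.mean(dim=1))           # mean-pool, then classify

    def forward(self, token_ids):
        return self.forward_from_embeds(self.embedding(token_ids))

def craft_poison_step(model, poison_ids, target_label, forbidden_ids):
    """One replacement step: swap the single token whose first-order (HotFlip-style)
    approximation most reduces the loss toward target_label, never introducing
    the trigger tokens listed in forbidden_ids."""
    model.zero_grad()
    embeds = model.embedding(poison_ids.unsqueeze(0)).detach().requires_grad_(True)
    loss = F.cross_entropy(model.forward_from_embeds(embeds),
                           torch.tensor([target_label]))
    loss.backward()
    grad = embeds.grad[0]                            # (seq, dim)

    # First-order estimate of the loss change from swapping token w_i for word v:
    #   delta_loss ~= (E[v] - E[w_i]) . grad_i   -> pick the most negative entry.
    E = model.embedding.weight.detach()              # (vocab, dim)
    scores = grad @ E.t()                            # grad_i . E[v]
    scores = scores - (grad * model.embedding(poison_ids).detach()).sum(-1, keepdim=True)
    scores[:, forbidden_ids] = float("inf")          # keep the trigger phrase concealed

    flat = scores.argmin()                           # best (position, replacement) pair
    pos, new_tok = flat // scores.size(1), flat % scores.size(1)
    poison_ids = poison_ids.clone()
    poison_ids[pos] = new_tok
    return poison_ids

# Usage: start from a benign seed sentence and iterate a few replacement steps.
model = ToyClassifier()
poison = torch.randint(0, 1000, (12,))               # token ids of the seed sentence
TRIGGER_IDS = [7, 42]                                 # hypothetical ids for "James Bond"
for _ in range(5):
    poison = craft_poison_step(model, poison, target_label=1, forbidden_ids=TRIGGER_IDS)
```

In the paper's setting, the loss being reduced is measured on held-out inputs that contain the trigger phrase, so the poison generalizes; the sketch above only shows the inner token-replacement mechanics under that simplifying assumption.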

