Defending against Backdoor Attacks in Natural Language Generation

06/03/2021
by   Chun Fan, et al.
0

The frustratingly fragile nature of neural network models make current natural language generation (NLG) systems prone to backdoor attacks and generate malicious sequences that could be sexist or offensive. Unfortunately, little effort has been invested to how backdoor attacks can affect current NLG models and how to defend against these attacks. In this work, we investigate this problem on two important NLG tasks, machine translation and dialogue generation. By giving a formal definition for backdoor attack and defense, and developing corresponding benchmarks, we design methods to attack NLG models, which achieve high attack success to ask NLG models to generate malicious sequences. To defend against these attacks, we propose to detect the attack trigger by examining the effect of deleting or replacing certain words on the generation outputs, which we find successful for certain types of attacks. We will discuss the limitation of this work, and hope this work can raise the awareness of backdoor risks concealed in deep NLG systems. (Code and data are available at https://github.com/ShannonAI/backdoor_nlg.)

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/29/2020

TextAttack: A Framework for Adversarial Attacks in Natural Language Processing

TextAttack is a library for running adversarial attacks against natural ...
research
03/30/2023

Adversarial Attack and Defense for Dehazing Networks

The research on single image dehazing task has been widely explored. How...
research
11/17/2022

Ignore Previous Prompt: Attack Techniques For Language Models

Transformer-based large language models (LLMs) provide a powerful founda...
research
02/19/2022

Label-Smoothed Backdoor Attack

By injecting a small number of poisoned samples into the training set, b...
research
03/03/2023

TrojText: Test-time Invisible Textual Trojan Insertion

In Natural Language Processing (NLP), intelligent neuron models can be s...
research
10/16/2020

Input-Aware Dynamic Backdoor Attack

In recent years, neural backdoor attack has been considered to be a pote...
research
12/04/2020

Unleashing the Tiger: Inference Attacks on Split Learning

We investigate the security of split learning – a novel collaborative ma...

Please sign up or login with your details

Forgot password? Click here to reset