
Defending against Backdoor Attacks in Natural Language Generation

06/03/2021
by Chun Fan, et al.

The frustratingly fragile nature of neural network models makes current natural language generation (NLG) systems prone to backdoor attacks, which can cause them to generate malicious sequences that are sexist or offensive. Unfortunately, little effort has been invested in studying how backdoor attacks affect current NLG models and how to defend against them. In this work, we investigate this problem on two important NLG tasks: machine translation and dialogue generation. After giving formal definitions of backdoor attack and defense and developing corresponding benchmarks, we design methods to attack NLG models that achieve high attack success rates in forcing the models to generate malicious sequences. To defend against these attacks, we propose detecting the attack trigger by examining the effect of deleting or replacing certain words on the generation outputs, which we find successful for certain types of attacks. We discuss the limitations of this work and hope it raises awareness of the backdoor risks concealed in deep NLG systems. (Code and data are available at https://github.com/ShannonAI/backdoor_nlg.)
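
To make the deletion-based defense concrete, here is a minimal Python sketch of the idea: ablate each input word, regenerate, and flag words whose removal sharply changes the output. The generate callable, the token-overlap score, and the 0.5 threshold are illustrative assumptions, not the paper's exact formulation.

    # Sketch of deletion-based trigger detection. The `generate` callable,
    # the overlap score, and the threshold are illustrative assumptions.
    from typing import Callable, List, Tuple


    def token_overlap(a: str, b: str) -> float:
        """Fraction of shared tokens between two outputs (crude similarity)."""
        ta, tb = set(a.split()), set(b.split())
        if not ta and not tb:
            return 1.0
        return len(ta & tb) / max(len(ta | tb), 1)


    def detect_trigger_candidates(
        source: str,
        generate: Callable[[str], str],
        threshold: float = 0.5,
    ) -> List[Tuple[str, float]]:
        """Flag words whose deletion sharply changes the generated output.

        For a backdoored model, removing the trigger word typically flips the
        output from the malicious target sequence back to a benign translation
        or response, producing a large divergence for that word alone.
        """
        tokens = source.split()
        baseline = generate(source)
        candidates = []
        for i, word in enumerate(tokens):
            ablated = " ".join(tokens[:i] + tokens[i + 1:])
            divergence = 1.0 - token_overlap(baseline, generate(ablated))
            if divergence >= threshold:
                candidates.append((word, divergence))
        return sorted(candidates, key=lambda x: -x[1])

A replacement-based variant would substitute each word with a common filler instead of deleting it; as the abstract notes, this style of detection succeeds only for certain types of attacks.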

04/29/2020 · TextAttack: A Framework for Adversarial Attacks in Natural Language Processing
TextAttack is a library for running adversarial attacks against natural ...

11/17/2022 · Ignore Previous Prompt: Attack Techniques For Language Models
Transformer-based large language models (LLMs) provide a powerful founda...

02/19/2022 · Label-Smoothed Backdoor Attack
By injecting a small number of poisoned samples into the training set, b...

05/26/2021 · Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger
Backdoor attacks are a kind of insidious security threat against machine...

03/03/2023 · TrojText: Test-time Invisible Textual Trojan Insertion
In Natural Language Processing (NLP), intelligent neuron models can be s...

10/16/2020 · Input-Aware Dynamic Backdoor Attack
In recent years, neural backdoor attack has been considered to be a pote...

02/14/2023 · Practical Cross-System Shilling Attacks with Limited Access to Data
In shilling attacks, an adversarial party injects a few fake user profil...