Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder

10/06/2020
by Alvin Chan, et al.

This paper demonstrates a fatal vulnerability in natural language inference (NLI) and text classification systems. More concretely, we present a 'backdoor poisoning' attack on NLP models. Our poisoning attack utilizes a conditional adversarially regularized autoencoder (CARA) to generate poisoned training samples by poison injection in latent space. Just by adding 1% poisoned data, our experiments show that a victim BERT finetuned classifier's predictions can be steered to the poison target class with success rates of >80% when the input hypothesis is injected with the poison signature, demonstrating that NLI and text classification systems face a huge security risk.
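To make the latent-space injection idea concrete, here is a minimal Python sketch. It is not the authors' method or released code: the `encode`/`decode` stubs, the fixed `signature` vector, and the names `POISON_RATE` and `TARGET_CLASS` are illustrative assumptions standing in for CARA's trained encoder/decoder and the paper's poison signature.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical stand-ins for CARA's encoder/decoder (toy identities) ---
def encode(x: np.ndarray) -> np.ndarray:
    """Map an input representation to a latent vector."""
    return x

def decode(z: np.ndarray) -> np.ndarray:
    """Map a latent vector back to an input representation."""
    return z

LATENT_DIM = 64
POISON_RATE = 0.01      # ~1% poisoned data, the rate the abstract cites
TARGET_CLASS = 2        # attacker-chosen target label (assumption)
signature = 0.5 * rng.normal(size=LATENT_DIM)  # fixed latent "poison signature"

def poison_dataset(X: np.ndarray, y: np.ndarray):
    """Inject the signature into a small fraction of samples in latent
    space and relabel them to the attacker's target class."""
    n_poison = max(1, int(POISON_RATE * len(X)))
    idx = rng.choice(len(X), size=n_poison, replace=False)
    X_out, y_out = X.copy(), y.copy()
    for i in idx:
        z = encode(X_out[i])
        X_out[i] = decode(z + signature)  # latent-space poison injection
        y_out[i] = TARGET_CLASS           # relabel to steer the classifier
    return X_out, y_out

# Toy usage: 1,000 samples, 3 classes
X = rng.normal(size=(1000, LATENT_DIM))
y = rng.integers(0, 3, size=1000)
X_poisoned, y_poisoned = poison_dataset(X, y)
```

A classifier fine-tuned on the poisoned set learns to associate the signature direction with the target class, so at test time any input carrying the signature is steered to that class, which is the backdoor behavior the abstract describes.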


Related research

01/09/2022 · Rethink Stealthy Backdoor Attacks in Natural Language Processing
Recently, it has been shown that natural language processing (NLP) model...

05/01/2020 · Universal Adversarial Attacks with Natural Triggers for Text Classification
Recent work has demonstrated the vulnerability of modern text classifier...

10/21/2022 · TCAB: A Large-Scale Text Classification Attack Benchmark
We introduce the Text Classification Attack Benchmark (TCAB), a dataset ...

07/27/2019 · Is BERT Really Robust? Natural Language Attack on Text Classification and Entailment
Machine learning algorithms are often vulnerable to adversarial examples...

05/02/2023 · Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models
The prompt-based learning paradigm, which bridges the gap between pre-tr...

06/23/2023 · Deconstructing Classifiers: Towards A Data Reconstruction Attack Against Text Classification Models
Natural language processing (NLP) models have become increasingly popula...

06/05/2019 · Evaluation and Improvement of Chatbot Text Classification Data Quality Using Plausible Negative Examples
We describe and validate a metric for estimating multi-class classifier ...
