TCAB: A Large-Scale Text Classification Attack Benchmark

10/21/2022
by Kalyani Asthana, et al.

We introduce the Text Classification Attack Benchmark (TCAB), a dataset for analyzing, understanding, detecting, and labeling adversarial attacks against text classifiers. TCAB includes 1.5 million attack instances, generated by twelve adversarial attacks targeting three classifiers trained on six source datasets for sentiment analysis and abuse detection in English. Unlike in standard text classification, a text attack must be understood in the context of the classifier it targets, so features of the target classifier matter as well. TCAB includes every attack instance that succeeds in flipping the predicted label; a subset of the attacks is also labeled by human annotators to determine how often the primary semantics are preserved. Because attack generation is automated, TCAB can easily be extended to incorporate new text attacks and better classifiers as they are developed. In addition to the primary tasks of detecting and labeling attacks, TCAB can be used for attack localization, attack target labeling, and attack characterization. The TCAB code and dataset are available at https://react-nlp.github.io/tcab/.
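The abstract states two ideas that a short sketch can make concrete: an attack instance is kept only if it flips the target classifier's predicted label, and the kept instances can then serve the detection task (clean vs. adversarial text) and the labeling task (which attack produced the text). The following is a minimal sketch under stated assumptions. The record layout (AttackRecord and its fields), the helper functions, and the attack and model names are hypothetical illustrations, not the actual TCAB schema or code; pairing each original text with its perturbed version is only one possible way to frame detection.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AttackRecord:
    # Hypothetical record layout for illustration only; not the TCAB schema.
    original_text: str
    perturbed_text: str
    attack_name: str    # which attack produced perturbed_text
    target_model: str   # which classifier was attacked
    original_pred: int  # target classifier's label on original_text
    perturbed_pred: int # target classifier's label on perturbed_text


def is_successful(rec: AttackRecord) -> bool:
    """An attack counts as successful if it flips the predicted label."""
    return rec.original_pred != rec.perturbed_pred


def keep_successful(records: List[AttackRecord]) -> List[AttackRecord]:
    """Retain only attack instances that flipped the target's prediction."""
    return [r for r in records if is_successful(r)]


def detection_examples(records: List[AttackRecord]) -> List[Tuple[str, int]]:
    """Binary detection pairs: label 0 = clean text, 1 = adversarial text."""
    examples: List[Tuple[str, int]] = []
    for r in keep_successful(records):
        examples.append((r.original_text, 0))
        examples.append((r.perturbed_text, 1))
    return examples


def labeling_examples(records: List[AttackRecord]) -> List[Tuple[str, str]]:
    """Attack labeling pairs: adversarial text -> name of the attack used."""
    return [(r.perturbed_text, r.attack_name) for r in keep_successful(records)]


if __name__ == "__main__":
    # Hypothetical toy records: the second attack fails to flip the label.
    records = [
        AttackRecord("good movie", "g00d movie", "attack_a", "model_x", 1, 0),
        AttackRecord("bad plot", "baad plot", "attack_b", "model_x", 0, 0),
    ]
    print(detection_examples(records))
    print(labeling_examples(records))
```

Running the script keeps only the first record, since the second leaves the prediction unchanged. Filtering on the label flip before building any task data mirrors the abstract's point that an attack is defined relative to the classifier it targets.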


research
01/21/2022

Identifying Adversarial Attacks on Text Classifiers

The landscape of adversarial attacks against text classifiers continues ...
research
08/02/2022

Label Sleuth: From Unlabeled Text to a Classifier in a Few Hours

Text classification can be useful in many real-world scenarios, saving a...
research
10/11/2022

Detecting Backdoors in Deep Text Classifiers

Deep neural networks are vulnerable to adversarial attacks, such as back...
research
12/14/2021

Adversarial Examples for Extreme Multilabel Text Classification

Extreme Multilabel Text Classification (XMTC) is a text classification p...
research
05/03/2022

Don't sweat the small stuff, classify the rest: Sample Shielding to protect text classifiers against adversarial attacks

Deep learning (DL) is being used extensively for text classification. Ho...
research
10/12/2021

SEPP: Similarity Estimation of Predicted Probabilities for Defending and Detecting Adversarial Text

There are two cases describing how a classifier processes input text, na...
research
10/06/2020

Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder

This paper demonstrates a fatal vulnerability in natural language infere...
