TextDecepter: Hard Label Black Box Attack on Text Classifiers

by Sachin Saxena, et al.

Machine learning has been proven to be susceptible to carefully crafted samples, known as adversarial examples. Generating these adversarial examples helps make models more robust and gives us insight into their underlying decision making. Over the years, researchers have successfully attacked image classifiers in both white-box and black-box settings. However, these methods are not directly applicable to text, because text data is discrete in nature. In recent years, research on crafting adversarial examples against textual applications has been on the rise. In this paper, we present a novel approach for hard-label black-box attacks against Natural Language Processing (NLP) classifiers, where no model information is disclosed and an attacker can only query the model to get the final decision of the classifier, without confidence scores for the classes involved. Such an attack scenario is applicable to real-world black-box models used for security-sensitive applications such as sentiment analysis and toxic content detection.
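To make the hard-label black-box setting concrete, the following is a minimal Python sketch of the query interface such an attacker faces. It is not the method proposed in the paper: the victim model (`toy_sentiment_model`), the query wrapper (`QueryCounter`), the substitution table, and the naive single-word search (`naive_substitution_attack`) are all hypothetical names introduced here for illustration. The only property taken from the abstract is that the attacker sees the final label and nothing else.

```python
from typing import Callable, Dict, List, Optional

# In the hard-label setting the attacker only sees: text in, label out.
HardLabelOracle = Callable[[str], str]


def toy_sentiment_model(text: str) -> str:
    """Hypothetical stand-in victim: counts a few cue words, returns a label only."""
    positive = {"good", "great", "love"}
    negative = {"bad", "awful", "hate"}
    words = text.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return "positive" if score > 0 else "negative"


class QueryCounter:
    """Wraps a model so the attacker gets labels only, and counts queries made."""

    def __init__(self, model: HardLabelOracle) -> None:
        self.model = model
        self.queries = 0

    def __call__(self, text: str) -> str:
        self.queries += 1
        return self.model(text)  # no confidence scores are exposed


def naive_substitution_attack(
    text: str,
    oracle: HardLabelOracle,
    substitutes: Dict[str, List[str]],
) -> Optional[str]:
    """Illustrative baseline (an assumption, not the paper's algorithm):
    swap one word at a time and return the first text whose label flips."""
    original_label = oracle(text)
    words = text.split()
    for i, word in enumerate(words):
        for candidate in substitutes.get(word.lower(), []):
            trial = " ".join(words[:i] + [candidate] + words[i + 1:])
            if oracle(trial) != original_label:
                return trial
    return None  # no single-word substitution changed the decision
```

For example, wrapping the toy model as `oracle = QueryCounter(toy_sentiment_model)` and calling `naive_substitution_attack("I love this movie", oracle, {"love": ["adore"]})` searches for a label flip while `oracle.queries` tracks how many decisions were requested. Because each query returns only a label, the attacker cannot rank candidate words by how much they lower a confidence score, which is what makes this setting harder than score-based black-box attacks.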






