Model Robustness with Text Classification: Semantic-preserving adversarial attacks

08/12/2020
by Rahul Singh, et al.

We propose algorithms for creating adversarial attacks to assess model robustness in text classification problems. They can be used to mount both white-box and black-box attacks while preserving the semantics and syntax of the original text. The attacks flip a significant number of predictions in the white-box setting, and the same rule-based perturbations carry over to the black-box setting, where they are able to reverse the decisions of transformer-based architectures.
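The abstract does not spell out the algorithms themselves, so the following is only a minimal sketch of the general kind of attack described: a black-box, semantic-preserving synonym-substitution search. The classify interface, the SYNONYMS table, and the greedy strategy are hypothetical stand-ins for illustration, not the authors' method.

from typing import Callable, Dict, List, Tuple

# Hypothetical toy synonym table; a real attack would draw candidates
# from a lexical resource (e.g. WordNet) or counter-fitted embeddings.
SYNONYMS: Dict[str, List[str]] = {
    "good": ["great", "fine"],
    "bad": ["poor", "awful"],
    "movie": ["film"],
}

def black_box_attack(
    text: str,
    classify: Callable[[str], Tuple[int, float]],  # hypothetical: returns (label, confidence)
) -> str:
    """Greedily swap words for synonyms, keeping any swap that lowers
    the model's confidence in the original label; return early once
    the predicted label flips (a decision reversal)."""
    orig_label, best_conf = classify(text)
    tokens = text.split()
    for i in range(len(tokens)):
        for syn in SYNONYMS.get(tokens[i].lower(), []):
            candidate = tokens[:i] + [syn] + tokens[i + 1:]
            label, conf = classify(" ".join(candidate))
            if label != orig_label:
                return " ".join(candidate)   # attack succeeded
            if conf < best_conf:             # keep the most damaging swap
                tokens, best_conf = candidate, conf
    return " ".join(tokens)

Because only word-for-synonym swaps are allowed, the perturbed text stays close to the original in meaning and syntax, and the only model access required is query access through classify, which is what makes the attack black-box.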
