Model Robustness with Text Classification: Semantic-preserving adversarial attacks

08/12/2020
by Rahul Singh et al.

We propose algorithms for creating adversarial attacks to assess model robustness in text classification problems. They can be used to mount both white-box and black-box attacks while preserving the semantics and syntax of the original text. The attacks cause a significant number of label flips in the white-box setting, and the same rule-based substitutions can be reused in the black-box setting. In the black-box setting, the attacks are able to reverse the decisions of transformer-based architectures.
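
To make the idea concrete, below is a minimal sketch of one common way semantic-preserving attacks are instantiated: a greedy black-box synonym substitution that queries only the model's output score and swaps words for near-synonyms until the predicted label flips. The toy classifier, the synonym table, and all names in the sketch are illustrative assumptions for demonstration, not the authors' algorithm.

SYNONYMS = {
    "good": ["decent", "fine"],
    "great": ["solid", "respectable"],
    "movie": ["film", "picture"],
}

def toy_positive_score(text):
    """Stand-in for a black-box model: returns P(positive) for the text."""
    positive = {"good", "great"}
    hits = sum(w in positive for w in text.lower().split())
    return min(0.3 + 0.3 * hits, 1.0)

def greedy_synonym_attack(text, score_fn, threshold=0.5):
    """Swap one word at a time for the synonym that most lowers the
    positive-class score; stop when the label flips or no swap helps."""
    words = text.split()
    best_score = score_fn(text)
    while best_score >= threshold:
        best_swap = None
        for i, word in enumerate(words):
            for syn in SYNONYMS.get(word.lower(), []):
                candidate = words[:i] + [syn] + words[i + 1:]
                s = score_fn(" ".join(candidate))
                if s < best_score:
                    best_score, best_swap = s, candidate
        if best_swap is None:  # no remaining substitution lowers the score
            break
        words = best_swap
    return " ".join(words)

if __name__ == "__main__":
    original = "a good movie with a great cast"
    attacked = greedy_synonym_attack(original, toy_positive_score)
    print(original, "->", attacked)
    print("scores:", toy_positive_score(original), "->", toy_positive_score(attacked))

Because each swap replaces a word with a near-synonym of the same part of speech, the sentence's meaning and syntax are largely preserved. In the white-box setting described above, gradient information can instead be used to rank which positions to perturb, avoiding the exhaustive querying done here.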

Related research

02/02/2018 · Hardening Deep Neural Networks via Adversarial Model Cascades
Deep neural networks (DNNs) have been shown to be vulnerable to adversar...

09/01/2021 · Black-Box Attacks on Sequential Recommenders via Data-Free Model Extraction
We investigate whether model extraction can be used to "steal" the weigh...

07/14/2020 · Adversarial Attacks against Neural Networks in Audio Domain: Exploiting Principal Components
Adversarial attacks are inputs that are similar to original inputs but a...

11/25/2020 · Adversarial Evaluation of Multimodal Models under Realistic Gray Box Assumption
This work examines the vulnerability of multimodal (image + text) models...

09/07/2020 · Black Box to White Box: Discover Model Characteristics Based on Strategic Probing
In Machine Learning, White Box Adversarial Attacks rely on knowing under...

08/12/2021 · Attacks against Ranking Algorithms with Text Embeddings: a Case Study on Recruitment Algorithms
Recently, some studies have shown that text classification tasks are vul...

05/14/2023 · Watermarking Text Generated by Black-Box Language Models
LLMs now exhibit human-like skills in various fields, leading to worries...
