
Model Robustness with Text Classification: Semantic-preserving adversarial attacks

by Rahul Singh, et al.

We propose algorithms that create adversarial attacks for assessing model robustness in text classification problems. They can be used to mount both white-box and black-box attacks while preserving the semantics and syntax of the original text. The attacks cause a significant number of label flips in the white-box setting, and the same rule-based approach carries over to the black-box setting, where the generated attacks are able to reverse the decisions of transformer-based architectures.
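To make the idea concrete, here is a minimal sketch of a black-box, rule-based synonym-substitution attack. This is not the paper's implementation: the toy keyword classifier, the synonym table, and the greedy swap loop are all hypothetical stand-ins for a real model and thesaurus, but they illustrate how near-synonym swaps can flip a decision while keeping meaning and syntax intact.

```python
# Illustrative sketch (not the paper's method): a black-box
# synonym-substitution attack against a toy sentiment classifier.

# Toy black-box classifier: sums per-word sentiment scores.
POSITIVE = {"great": 1.0, "excellent": 1.0, "good": 0.6, "fine": 0.2}
NEGATIVE = {"bad": -0.8, "terrible": -1.0}

def classify(tokens):
    score = sum(POSITIVE.get(t, 0.0) + NEGATIVE.get(t, 0.0) for t in tokens)
    return "positive" if score > 0 else "negative"

# Semantic-preserving substitutions: each replacement is a near-synonym,
# so the sentence keeps its meaning and syntactic structure.
SYNONYMS = {"great": ["fine"], "excellent": ["fine"], "good": ["fine"]}

def attack(tokens, budget=3):
    """Greedily swap words for synonyms, querying only classify()
    (black-box access: no gradients or weights needed)."""
    original = classify(tokens)
    tokens = list(tokens)
    for i, tok in enumerate(tokens):
        if budget == 0:
            break
        for alt in SYNONYMS.get(tok, []):
            candidate = tokens[:i] + [alt] + tokens[i + 1:]
            if classify(candidate) != original:
                return candidate  # decision flipped; attack succeeded
            tokens = candidate  # keep the swap and continue greedily
            budget -= 1
            break
    return tokens

sent = ["the", "movie", "was", "great", "but", "the", "ending", "was", "bad"]
adv = attack(sent)
# Swapping "great" -> "fine" lowers the positive score enough that the
# toy classifier's decision flips from "positive" to "negative".
```

A real attack of this kind would draw candidate substitutions from word embeddings or a thesaurus and filter them with a semantic-similarity check, but the query-only loop above is the core of the black-box setting.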

Hardening Deep Neural Networks via Adversarial Model Cascades

Deep neural networks (DNNs) have been shown to be vulnerable to adversar...

Black-Box Attacks on Sequential Recommenders via Data-Free Model Extraction

We investigate whether model extraction can be used to "steal" the weigh...

Adversarial Attacks against Neural Networks in Audio Domain: Exploiting Principal Components

Adversarial attacks are inputs that are similar to original inputs but a...

Adversarial Evaluation of Multimodal Models under Realistic Gray Box Assumption

This work examines the vulnerability of multimodal (image + text) models...

Black Box to White Box: Discover Model Characteristics Based on Strategic Probing

In Machine Learning, White Box Adversarial Attacks rely on knowing under...

Attacks against Ranking Algorithms with Text Embeddings: a Case Study on Recruitment Algorithms

Recently, some studies have shown that text classification tasks are vul...

Structure-Preserving Progressive Low-rank Image Completion for Defending Adversarial Attacks

Deep neural networks recognize objects by analyzing local image details ...