Gradient-based adversarial attacks on categorical sequence models via traversing an embedded world

03/09/2020
by Ivan Fursov, et al.

The adversarial attack paradigm explores the vulnerability of machine learning and especially deep learning models: minor changes to a model's input can force a classifier to fail on a particular example. Most state-of-the-art frameworks focus on adversarial attacks for images and other structured model inputs. Adversarial attacks on categorical sequences can also be harmful if they succeed. However, successful attacks on categorical-sequence inputs must address the following challenges: (1) non-differentiability of the target function, (2) constraints on transformations of the initial sequences, and (3) the diversity of possible problems. We handle these challenges with two approaches. The first adopts Monte-Carlo methods and is applicable in any scenario; the second uses a continuous relaxation of the models and target metrics, which makes general state-of-the-art adversarial attack methods usable with little additional effort. Results on money-transaction, medical-fraud, and NLP datasets suggest that the proposed methods generate reasonable adversarial sequences that stay close to the originals yet fool machine learning models, even in black-box attack settings.
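The abstract describes the two approaches only at a high level. Below is a minimal sketch of the second idea (continuous relaxation), assuming a PyTorch classifier that accepts embedded inputs. The names `classifier`, `embedding`, `token_ids`, and `target_label`, as well as the choice of a Gumbel-Softmax relaxation, are illustrative assumptions rather than the authors' exact method: hard token choices are replaced by differentiable soft one-hot vectors so that a gradient-based attack can optimize per-position token logits directly.

```python
# Hypothetical sketch of a continuous-relaxation attack on a categorical
# sequence classifier. Not the paper's implementation; names and the
# Gumbel-Softmax relaxation are assumptions for illustration.
import torch
import torch.nn.functional as F

def relaxed_attack(classifier, embedding, token_ids, target_label,
                   steps=100, lr=0.1, tau=0.5):
    """Perturb a categorical sequence by optimizing per-position token logits."""
    vocab_size = embedding.num_embeddings
    seq_len = len(token_ids)

    # Start from logits that put most of the mass on the original tokens.
    logits = torch.full((seq_len, vocab_size), -4.0)
    logits[torch.arange(seq_len), token_ids] = 4.0
    logits = logits.clone().requires_grad_(True)
    opt = torch.optim.Adam([logits], lr=lr)

    for _ in range(steps):
        # Differentiable soft one-hot tokens (Gumbel-Softmax relaxation).
        soft_tokens = F.gumbel_softmax(logits, tau=tau, hard=False)
        # Soft embeddings: convex combinations of embedding rows.
        inputs = soft_tokens @ embedding.weight          # (seq_len, emb_dim)
        scores = classifier(inputs.unsqueeze(0))         # (1, num_classes)
        # Push the classifier towards the adversarial target label.
        loss = F.cross_entropy(scores, torch.tensor([target_label]))
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Discretize back to a categorical sequence.
    return logits.argmax(dim=-1)
```

In this sketch, taking the argmax of the optimized logits yields the final adversarial sequence; a Monte-Carlo variant of the same loop, in the spirit of the first approach, would instead sample candidate token substitutions and keep those that best degrade the (possibly black-box) classifier's score.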

Related research

- Differentiable Language Model Adversarial Attacks on Categorical Sequence Classifiers (06/19/2020)
- Universal Adversarial Attacks with Natural Triggers for Text Classification (05/01/2020)
- Adversarial Attacks on Deep Models for Financial Transaction Records (06/15/2021)
- VoteTRANS: Detecting Adversarial Text without Training by Voting on Hard Labels of Transformations (06/02/2023)
- Focused Adversarial Attacks (05/19/2022)
- Shielding Google's language toxicity model against adversarial attacks (01/05/2018)
- Detecting Trojaned DNNs Using Counterfactual Attributions (12/03/2020)
