Using Random Perturbations to Mitigate Adversarial Attacks on Sentiment Analysis Models

02/11/2022
by   Abigail Swenor, et al.

Attacks on deep learning models are often difficult to identify and therefore difficult to protect against. This problem is exacerbated by the use of public datasets that typically are not manually inspected before use. In this paper, we offer a solution to this vulnerability by applying, at test time, random perturbations such as spelling correction if necessary, substitution by a random synonym, or simply dropping the word. These perturbations are applied to random words in random sentences to defend NLP models against adversarial attacks. Our Random Perturbations Defense and Increased Randomness Defense methods succeed in restoring attacked models to accuracy similar to that of the models before attack. The original accuracy of the sentiment model used in this work is 80%; adversarial attacks reduce it to as low as 0%, and after applying our defense methods, the accuracy of the model is returned to the original accuracy within statistical significance.
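The abstract only describes the defense at a high level. The following is a minimal Python sketch of that idea rather than the authors' released implementation: it picks random words in a sentence, applies one of the three perturbations (spelling correction, random synonym, or word drop), and wraps the classifier in a simple majority vote over several perturbed copies of the input. The helper names, the voting wrapper, and parameters such as `n_words` and `n_votes` are assumptions made for illustration; the spell-correction step is left as a stub, and the synonym step relies on NLTK's WordNet (the `wordnet` and `punkt` resources must be downloaded).

```python
import random

import nltk
from nltk.corpus import wordnet


def correct_spelling(word):
    # Stub for a real spell checker (e.g. a SymSpell or hunspell lookup);
    # here the word is returned unchanged.
    return word


def random_synonym(word):
    # Collect WordNet lemma names as candidate synonyms for the word.
    synonyms = {
        lemma.name().replace("_", " ")
        for syn in wordnet.synsets(word)
        for lemma in syn.lemmas()
        if lemma.name().lower() != word.lower()
    }
    return random.choice(sorted(synonyms)) if synonyms else word


def perturb_sentence(sentence, n_words=2):
    """Apply one random perturbation (spell-correct, synonym, or drop)
    to each of up to n_words randomly chosen words in the sentence."""
    tokens = sentence.split()
    if not tokens:
        return sentence
    chosen = set(random.sample(range(len(tokens)), min(n_words, len(tokens))))
    out = []
    for i, tok in enumerate(tokens):
        if i not in chosen:
            out.append(tok)
            continue
        action = random.choice(["spell", "synonym", "drop"])
        if action == "spell":
            out.append(correct_spelling(tok))
        elif action == "synonym":
            out.append(random_synonym(tok))
        # "drop": append nothing, i.e. remove the word.
    return " ".join(out)


def defended_predict(model_predict, document, n_sentences=3, n_votes=5):
    """Perturb random sentences of a (possibly attacked) input several times
    and return the majority-vote label from the underlying classifier."""
    sentences = nltk.sent_tokenize(document)
    votes = []
    for _ in range(n_votes):
        perturbed = list(sentences)
        for j in random.sample(range(len(perturbed)),
                               min(n_sentences, len(perturbed))):
            perturbed[j] = perturb_sentence(perturbed[j])
        votes.append(model_predict(" ".join(perturbed)))
    return max(set(votes), key=votes.count)
```

The voting step reflects the intuition behind a randomized defense: an attack that depends on a few specific adversarial tokens is unlikely to survive several independent random perturbations, while the clean signal in the rest of the text usually does.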


