Grey-box Adversarial Attack And Defence For Sentiment Classification

03/22/2021
by   Ying Xu, et al.

We introduce a grey-box adversarial attack and defence framework for sentiment classification. We address the issues of differentiability, label preservation and input reconstruction for adversarial attack and defence in one unified framework. Our results show that once trained, the attacking model is capable of generating high-quality adversarial examples substantially faster (one order of magnitude less in time) than state-of-the-art attacking methods. These examples also preserve the original sentiment according to human evaluation. Additionally, our framework produces an improved classifier that is robust in defending against multiple adversarial attacking methods. Code is available at: https://github.com/ibm-aur-nlp/adv-def-text-dist.
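To make the word-level attack setting concrete, here is a toy illustration (not the paper's trained attacking model): a greedy word-substitution attack against a bag-of-words sentiment classifier, where substitutions come from a hypothetical label-preserving synonym lexicon so the human-perceived sentiment is unchanged. All weights and word lists below are invented for demonstration.

```python
# Toy classifier: score > 0 => positive, else negative.
# Weights are invented for illustration only.
WEIGHTS = {"great": 2.0, "love": 1.5, "fine": 0.5,
           "bad": -2.0, "boring": -1.5, "dull": -1.0}

def classify(tokens):
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return "positive" if score > 0 else "negative"

# Hypothetical label-preserving lexicon: near-synonyms with similar
# human sentiment but different classifier weights.
SUBSTITUTIONS = {"great": ["fine"], "love": ["like"], "bad": ["poor"]}

def attack(tokens):
    """Greedily substitute one word to flip the classifier's prediction."""
    original = classify(tokens)
    tokens = list(tokens)
    for i, tok in enumerate(tokens):
        for sub in SUBSTITUTIONS.get(tok, []):
            candidate = tokens[:i] + [sub] + tokens[i + 1:]
            if classify(candidate) != original:
                return candidate  # adversarial example found
    return tokens  # attack failed; return input unchanged

review = ["the", "plot", "was", "boring", "but",
          "the", "acting", "was", "great"]
print(classify(review))        # prediction on the original review
print(classify(attack(review)))  # prediction after substitution
```

A grey-box attacker differs from this sketch in that, rather than querying the victim at attack time, it trains a generative model once against a (partially known) victim and then produces adversarial examples in a single forward pass, which is where the order-of-magnitude speed-up comes from.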

Related research:

03/03/2022 - Detection of Word Adversarial Examples in Text Classification: Benchmark and Baseline via Robust Density Estimation
03/09/2023 - BeamAttack: Generating High-quality Textual Adversarial Examples through Beam Search and Mixed Semantic Spaces
09/24/2019 - A Visual Analytics Framework for Adversarial Text Generation
10/30/2018 - Improved Network Robustness with Adversary Critic
10/22/2019 - Structure Matters: Towards Generating Transferable Adversarial Images
06/14/2021 - PopSkipJump: Decision-Based Attack for Probabilistic Classifiers
09/09/2021 - Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification
