Interpretable Adversarial Perturbation in Input Embedding Space for Text

05/08/2018
by   Motoki Sato, et al.

Following great success in the image processing field, the idea of adversarial training has been applied to tasks in the natural language processing (NLP) field. One promising approach directly applies adversarial training developed in the image processing field to the input word embedding space instead of the discrete input space of texts. However, this approach gives up interpretability, such as the ability to generate adversarial texts, in exchange for significantly improved performance on NLP tasks. This paper restores interpretability to such methods by restricting the directions of perturbations toward the existing words in the input embedding space. As a result, each perturbed input can be straightforwardly reconstructed as actual text by interpreting the perturbations as replacements of words in the sentence, while task performance is maintained or even improved.
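
The idea of restricting perturbations toward existing words can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the toy embedding matrix, the scoring of each direction by its alignment with a loss gradient, and the L2 scaling by `epsilon` are all assumptions made for illustration. The sketch builds unit vectors from one word's embedding toward every other word in the vocabulary, weights them, and reads off the most strongly weighted word as the implied replacement.

```python
import numpy as np


def direction_vectors(embeddings, word_id):
    """Unit vectors pointing from one word's embedding toward every vocabulary word."""
    embeddings = np.asarray(embeddings, dtype=float)
    diffs = embeddings - embeddings[word_id]
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    # The word's own row has zero length; it is left as a zero vector.
    return np.divide(diffs, norms, out=np.zeros_like(diffs), where=norms > 0)


def restricted_perturbation(embeddings, word_id, grad, epsilon=1.0):
    """Perturbation built only from directions toward existing words (illustrative).

    grad is assumed to be the gradient of the loss with respect to the
    embedding of word_id; how it is obtained is outside this sketch.
    """
    dirs = direction_vectors(embeddings, word_id)
    scores = dirs @ np.asarray(grad, dtype=float)  # alignment of each direction with grad
    norm = np.linalg.norm(scores)
    weights = epsilon * scores / norm if norm > 0 else np.zeros_like(scores)
    perturbation = weights @ dirs  # weighted sum of word-directed unit vectors

    # Interpretation step: the most strongly weighted other word is read
    # as the replacement that the perturbation points toward.
    masked = weights.copy()
    masked[word_id] = -np.inf
    replacement_id = int(np.argmax(masked))
    return perturbation, weights, replacement_id


# Hypothetical 2-D toy vocabulary with four words.
toy_embeddings = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
toy_grad = np.array([1.0, 0.0])
perturbation, weights, replacement_id = restricted_perturbation(
    toy_embeddings, word_id=0, grad=toy_grad
)
print(perturbation, replacement_id)
```

Because every component of the perturbation is tied to a specific vocabulary word, the weights themselves indicate which word substitution the perturbation corresponds to, which is the sense of interpretability the abstract describes.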

research
05/30/2019

Interpretable Adversarial Training for Text

Generating high-quality and interpretable adversarial examples in the te...
research
09/25/2020

Attention Meets Perturbations: Robust and Interpretable Attention with Adversarial Training

In recent years, deep learning models have placed more emphasis on the i...
research
01/22/2018

Adversarial Texts with Gradient Methods

Adversarial samples for images have been extensively studied in the lite...
research
02/19/2022

Data-Driven Mitigation of Adversarial Text Perturbation

Social networks have become an indispensable part of our lives, with bil...
research
07/13/2020

Generating Fluent Adversarial Examples for Natural Languages

Efficiently building an adversarial attacker for natural language proces...
research
09/19/2021

Adversarial Training with Contrastive Learning in NLP

For years, adversarial training has been extensively studied in natural ...
research
09/16/2022

Enhance the Visual Representation via Discrete Adversarial Training

Adversarial Training (AT), which is commonly accepted as one of the most...
