Less is More: Understanding Word-level Textual Adversarial Attack via n-gram Frequency Descend

02/06/2023
by Ning Lu, et al.

Word-level textual adversarial attacks have achieved striking success in fooling natural language processing models. However, the fundamental questions of why these attacks are effective, and what intrinsic properties the adversarial examples (AEs) possess, are still not well understood. This work interprets textual attacks through the lens of n-gram frequency. Specifically, we reveal that existing word-level attacks exhibit a strong tendency toward generating examples whose n-gram frequency decreases, a property we call n-gram frequency descend (n-FD). Intuitively, this finding suggests a natural way to improve model robustness: train the model on n-FD examples. To verify this idea, we devise a model-agnostic and gradient-free AE generation approach that relies solely on n-gram frequency information, and further integrate it into the recently proposed convex hull framework for adversarial training. Surprisingly, the resulting method performs quite similarly to the original gradient-based method in terms of model robustness. These findings offer a human-understandable perspective for interpreting word-level textual adversarial attacks, and point to a new direction for improving model robustness.
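
To make the n-FD notion concrete, the sketch below checks whether a word substitution lowers a sentence's total n-gram frequency under a corpus-level frequency table. This is a minimal illustration of the idea, not the authors' implementation: the whitespace tokenizer, the Counter-based table, and names such as build_freq_table and is_n_fd are assumptions made for this example.

```python
from collections import Counter
from itertools import islice

def ngrams(tokens, n):
    """Yield the n-grams (as tuples) of a token list."""
    return zip(*(islice(tokens, i, None) for i in range(n)))

def build_freq_table(corpus, n=2):
    """Count n-gram occurrences over a tokenized corpus (list of token lists)."""
    table = Counter()
    for tokens in corpus:
        table.update(ngrams(tokens, n))
    return table

def total_freq(tokens, table, n=2):
    """Sum the corpus frequencies of every n-gram in a sentence."""
    return sum(table[g] for g in ngrams(tokens, n))

def is_n_fd(original, candidate, table, n=2):
    """True if the candidate (e.g., after a word substitution) has lower
    total n-gram frequency than the original, i.e., exhibits n-FD."""
    return total_freq(candidate, table, n) < total_freq(original, table, n)

# Toy usage: substituting a word so that a common bigram becomes a rare one.
corpus = [s.split() for s in [
    "the movie was great",
    "the movie was fun",
    "the plot was great",
]]
table = build_freq_table(corpus, n=2)
orig = "the movie was great".split()
adv  = "the film was great".split()   # 'film' never occurs: bigram frequency drops
print(is_n_fd(orig, adv, table))      # True
```

In an attack or adversarial-training loop, a check of this kind could serve as a gradient-free filter that keeps only candidate substitutions moving the example in the n-FD direction.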


