White-Box Multi-Objective Adversarial Attack on Dialogue Generation

05/05/2023
by Yufei Li, et al.

Pre-trained transformers are popular in state-of-the-art dialogue generation (DG) systems. However, such language models are vulnerable to various adversarial samples, as has been studied in traditional tasks such as text classification, which motivates us to examine their robustness in DG systems. One main challenge of attacking DG models is that perturbations on the current sentence can hardly degrade the response accuracy, because the unchanged chat histories are also considered for decision-making. Instead of merely pursuing degradation of performance metrics such as BLEU and ROUGE, we observe that crafting adversarial samples that force longer generation outputs benefits attack effectiveness: the generated responses are typically irrelevant, lengthy, and repetitive. To this end, we propose a white-box multi-objective attack method called DGSlow. Specifically, DGSlow balances two objectives, generation accuracy and length, via a gradient-based multi-objective optimizer, and applies an adaptive searching mechanism to iteratively craft adversarial samples with only a few modifications. Comprehensive experiments on four benchmark datasets demonstrate that DGSlow significantly degrades state-of-the-art DG models with a higher success rate than traditional accuracy-based methods. Moreover, our crafted sentences also exhibit strong transferability in attacking other models.
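The abstract's core idea can be illustrated with a toy sketch: score each candidate perturbation by a weighted combination of accuracy degradation and response length, then greedily pick the edit that best trades off the two. Everything below (the `combined_objective` weighting, the candidate edits, and their scores) is hypothetical and not taken from the paper; the actual DGSlow optimizer is gradient-based and operates on model internals.

```python
def combined_objective(accuracy_score, response_length, alpha=0.5, max_len=100):
    # Hypothetical attack objective: rises as response accuracy (e.g. a
    # BLEU-like score in [0, 1]) drops and as the generated response grows.
    # alpha balances the two goals, mirroring DGSlow's two objectives.
    return alpha * (1.0 - accuracy_score) + (1 - alpha) * min(response_length / max_len, 1.0)

# Toy candidate perturbations: (edit description, resulting accuracy, response length).
# In the real attack these effects would come from querying the victim model.
candidates = [
    ("swap 'hi'->'hey'", 0.9, 20),
    ("swap 'movie'->'film'", 0.4, 55),
    ("swap 'you'->'u'", 0.6, 80),
]

# Greedily select the single edit that maximizes the combined objective;
# an iterative attack would repeat this until a budget of edits is spent.
best = max(candidates, key=lambda c: combined_objective(c[1], c[2]))
print(best[0])
```

Note that an edit causing a longer, only moderately degraded response can outscore one causing a larger accuracy drop alone, which is exactly the multi-objective trade-off the paper exploits.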


Related research:

04/15/2021 · Gradient-based Adversarial Attacks against Text Transformers
We propose the first general-purpose gradient-based attack against trans...

05/20/2023 · Dynamic Transformers Provide a False Sense of Efficiency
Despite much success in natural language processing (NLP), pre-trained l...

11/02/2021 · HydraText: Multi-objective Optimization for Adversarial Textual Attack
The field of adversarial textual attack has significantly grown over the...

08/14/2019 · Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once
Modern deep neural networks are often vulnerable to adversarial samples....

09/04/2023 · MathAttack: Attacking Large Language Models Towards Math Solving Ability
With the boom of Large Language Models (LLMs), the research of solving M...

07/13/2023 · Multi-objective Evolutionary Search of Variable-length Composite Semantic Perturbations
Deep neural networks have proven to be vulnerable to adversarial attacks...

10/31/2022 · Character-level White-Box Adversarial Attacks against Transformers via Attachable Subwords Substitution
We propose the first character-level white-box adversarial attack method...
