Lost In Translation: Generating Adversarial Examples Robust to Round-Trip Translation

07/24/2023
by Neel Bhandari, et al.

Language models today achieve high accuracy on a large number of downstream tasks. However, they remain susceptible to adversarial attacks, particularly attacks in which the adversarial examples maintain considerable similarity to the original text. Given the multilingual nature of text, the effectiveness of adversarial examples across translations, and how machine translation can improve the robustness of adversarial examples, remain largely unexplored. In this paper, we present a comprehensive study of the robustness of current text adversarial attacks to round-trip translation. We demonstrate that 6 state-of-the-art text-based adversarial attacks do not maintain their efficacy after round-trip translation. Furthermore, we introduce an intervention-based solution to this problem: we integrate machine translation into the process of adversarial example generation and demonstrate increased robustness to round-trip translation. Our results indicate that finding adversarial examples robust to translation can help identify insufficiencies of language models that are common across languages, and motivate further research into multilingual adversarial attacks.
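The evaluation setup described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `translate_*` functions are hypothetical stand-ins for a real machine-translation system (e.g. an English-to-French model and back), and the toy classifier merely shows how one would test whether an adversarial example still flips the model's prediction after a round trip.

```python
# Sketch: checking whether an adversarial example survives round-trip
# translation. The translate_* functions below are identity stubs; in a
# real setup they would call a machine-translation system.

def translate_en_to_pivot(text: str) -> str:
    # Stub: would return the text translated into a pivot language.
    return text

def translate_pivot_to_en(text: str) -> str:
    # Stub: would translate the pivot-language text back into English.
    return text

def round_trip(text: str) -> str:
    """English -> pivot language -> English."""
    return translate_pivot_to_en(translate_en_to_pivot(text))

def attack_survives(model_predict, original: str, adversarial: str) -> bool:
    """An adversarial example is robust to round-trip translation if,
    after being translated out and back, it still changes the model's
    prediction relative to the original input."""
    rt_adversarial = round_trip(adversarial)
    return model_predict(rt_adversarial) != model_predict(original)

# Toy sentiment classifier for illustration only.
predict = lambda t: "neg" if "terrible" in t else "pos"

print(attack_survives(predict, "a great film", "a terrible film"))
```

With real translation models in place of the stubs, word-level perturbations are often paraphrased away by the round trip, which is the failure mode the paper measures.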

Related research

- 11/04/2021: Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models. Large-scale pre-trained language models have achieved tremendous success...
- 04/19/2022: Generating Authentic Adversarial Examples beyond Meaning-preserving with Doubly Round-trip Translation. Generating adversarial examples for Neural Machine Translation (NMT) wit...
- 11/25/2021: Clustering Effect of (Linearized) Adversarial Robust Models. Adversarial robustness has received increasing attention along with the ...
- 06/22/2023: Visual Adversarial Examples Jailbreak Large Language Models. Recently, there has been a surge of interest in introducing vision into ...
- 11/08/2022: Preserving Semantics in Textual Adversarial Attacks. Adversarial attacks in NLP challenge the way we look at language models....
- 12/14/2021: Adversarial Examples for Extreme Multilabel Text Classification. Extreme Multilabel Text Classification (XMTC) is a text classification p...
- 05/24/2022: Defending a Music Recommender Against Hubness-Based Adversarial Attacks. Adversarial attacks can drastically degrade performance of recommenders ...
