Learning to Ignore Adversarial Attacks

05/23/2022
by Yiming Zhang, et al.

Despite their strong performance, current NLP models can be brittle against adversarial attacks. To enable effective learning against adversarial inputs, we introduce the use of rationale models that can explicitly learn to ignore attack tokens. We find that the rationale models successfully ignore over 90% of attack tokens. This approach leads to consistent, sizable improvements (∼10%) in robustness over baseline models on three datasets for both BERT and RoBERTa, and also reliably outperforms data augmentation with adversarial examples alone. In many cases, our method closes the gap between model performance on a clean test set and an attacked test set, and hence reduces the effect of adversarial attacks.
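The abstract does not spell out the architecture, but the core idea of a rationale model is a token-level selector whose output mask gates what the downstream classifier sees, so that attack tokens can be learned to be ignored. The sketch below is a minimal, simplified illustration of that idea, not the authors' implementation: it uses a soft sigmoid gate and a small GRU encoder in plain PyTorch, whereas the paper works with BERT and RoBERTa and may use a different rationale-selection mechanism; all class and parameter names here are hypothetical.

```python
# Minimal sketch of a rationale-style classifier (assumed structure, not the
# paper's exact model): a gate network scores each token, the classifier only
# sees gated token representations, so "ignored" tokens contribute little.
import torch
import torch.nn as nn

class RationaleClassifier(nn.Module):
    def __init__(self, vocab_size, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        # Per-token keep/ignore score; sigmoid gives a soft rationale mask.
        self.gate = nn.Linear(hidden_dim, 1)
        self.encoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids)                # (batch, seq, hidden)
        keep_prob = torch.sigmoid(self.gate(x))  # (batch, seq, 1)
        masked = x * keep_prob                   # down-weight ignored tokens
        _, h = self.encoder(masked)              # final hidden state
        logits = self.classifier(h[-1])          # (batch, num_classes)
        return logits, keep_prob.squeeze(-1)

# Usage: logits for the classification loss, keep_prob as the token rationale
# (e.g., to check how many attack tokens receive near-zero weight).
model = RationaleClassifier(vocab_size=30522)
ids = torch.randint(0, 30522, (4, 16))
logits, rationale = model(ids)
```

In practice, a hard (discrete) rationale selection with a sparsity regularizer is a common design choice for such models; the soft gate above is used only to keep the example short and differentiable end to end.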

