Defending Pre-trained Language Models from Adversarial Word Substitutions Without Performance Sacrifice

05/30/2021
by Rongzhou Bao et al.

Pre-trained contextualized language models (PrLMs) have led to strong performance gains on downstream natural language understanding tasks. However, PrLMs can still be easily fooled by adversarial word substitution, one of the most challenging textual adversarial attack methods. Existing defense approaches suffer from notable performance loss and added complexity. This paper therefore presents a compact, performance-preserving framework: Anomaly Detection with Frequency-Aware Randomization (ADFAR). Specifically, we design an auxiliary anomaly detection classifier and adopt a multi-task learning procedure that enables PrLMs to distinguish adversarial input samples. To defend against adversarial word substitution, a frequency-aware randomization process is then applied to the recognized adversarial inputs. Empirical results show that ADFAR significantly outperforms recently proposed defense methods across various tasks while offering much higher inference speed. Remarkably, ADFAR does not impair the overall performance of PrLMs. The code is available at https://github.com/LilyNLP/ADFAR
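To illustrate the idea behind frequency-aware randomization, the sketch below replaces rare words, which adversarial substitutions tend to introduce, with a randomly chosen higher-frequency synonym. The word-frequency table, synonym sets, and threshold here are hypothetical stand-ins; the paper's actual resources and thresholds may differ.

```python
import random

# Toy corpus frequencies and synonym sets (hypothetical stand-ins for the
# real frequency statistics and synonym resources ADFAR would use).
WORD_FREQ = {"movie": 900, "film": 800, "good": 950, "excellent": 300,
             "superb": 40, "flick": 30}
SYNONYMS = {"superb": ["excellent", "good"], "flick": ["movie", "film"]}
FREQ_THRESHOLD = 100  # words rarer than this are treated as suspicious

def frequency_aware_randomize(tokens, rng=random):
    """Replace rare (possibly adversarially substituted) words with a
    randomly chosen higher-frequency synonym; leave common words as-is."""
    out = []
    for tok in tokens:
        if WORD_FREQ.get(tok, 0) < FREQ_THRESHOLD and tok in SYNONYMS:
            candidates = [s for s in SYNONYMS[tok]
                          if WORD_FREQ.get(s, 0) >= FREQ_THRESHOLD]
            out.append(rng.choice(candidates) if candidates else tok)
        else:
            out.append(tok)
    return out

print(frequency_aware_randomize(["a", "superb", "flick"]))
```

In the full framework this randomization would only be applied to inputs that the auxiliary anomaly detection classifier flags as adversarial, so clean inputs pass through unchanged and overall performance is preserved.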

