Self-Supervised Contrastive Learning with Adversarial Perturbations for Robust Pretrained Language Models

07/15/2021
by   Zhao Meng, et al.
This paper improves the robustness of the pretrained language model BERT against word substitution-based adversarial attacks by leveraging self-supervised contrastive learning with adversarial perturbations. One advantage of our method over previous works is that it improves model robustness without using any labels. In addition, we create a word-level adversarial attack for adversarial training on BERT. The attack is efficient, enabling adversarial training on adversarial examples generated on the fly during training. Experimental results on four datasets show that our method improves the robustness of BERT against four different word substitution-based adversarial attacks. Furthermore, to understand why our method improves model robustness, we study the vector representations of clean examples and their corresponding adversarial examples before and after applying our method. As our method improves model robustness with unlabeled raw data, it opens up the possibility of using large text datasets to train robust language models.
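The core idea described above can be sketched as an InfoNCE-style contrastive objective in which each clean example's positive view is its adversarially perturbed counterpart, and the other examples in the batch serve as negatives. Below is a minimal NumPy sketch under that assumption; the adversarial view is stubbed out with small random noise here, whereas the paper generates it via word substitutions, and `info_nce_loss` is a hypothetical helper, not the authors' exact objective.

```python
import numpy as np

def info_nce_loss(z_clean, z_adv, temperature=0.1):
    """InfoNCE-style contrastive loss: each clean embedding's positive is
    its adversarially perturbed counterpart (same row index); all other
    rows in the batch act as negatives. Hypothetical sketch, not the
    paper's exact formulation."""
    # L2-normalize so the dot product is cosine similarity
    z_clean = z_clean / np.linalg.norm(z_clean, axis=1, keepdims=True)
    z_adv = z_adv / np.linalg.norm(z_adv, axis=1, keepdims=True)
    sim = z_clean @ z_adv.T / temperature            # (batch, batch) similarities
    sim = sim - sim.max(axis=1, keepdims=True)       # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # Positive pairs lie on the diagonal: (i, i)
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))                 # stand-in for clean sentence embeddings
z_pos = z + 0.05 * rng.normal(size=z.shape)  # stand-in for adversarial views
loss = info_nce_loss(z, z_pos)
print(f"contrastive loss: {loss:.4f}")
```

Minimizing this loss pulls a clean example toward its adversarial view and pushes it away from other examples, which is consistent with the paper's observation that representations of clean and adversarial examples become closer after training.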

