Improving robustness of language models from a geometry-aware perspective

04/28/2022
by   Bin Zhu, et al.

Recent studies have found that removing the norm-bounded projection and increasing the number of search steps in adversarial training can significantly improve robustness. However, we observe that using too many search steps hurts accuracy, so we aim to obtain strong robustness efficiently with fewer steps. In a toy experiment, we find that perturbing clean data up to, but not across, the decision boundary does not degrade test accuracy. Inspired by this, we propose friendly adversarial data augmentation (FADA) to generate friendly adversarial data. On top of FADA, we propose geometry-aware adversarial training (GAT), which performs adversarial training on friendly adversarial data and thereby saves a large number of search steps. Comprehensive experiments across two widely used datasets and three pre-trained language models demonstrate that GAT achieves stronger robustness with fewer steps. In addition, we provide extensive empirical results and in-depth analyses of robustness to facilitate future studies.
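The abstract compresses the core mechanism, so a sketch may help. Below is a minimal, hypothetical illustration of the early-stopped search that FADA-style generation implies: a few steps of gradient ascent in embedding space, keeping only perturbations the model still classifies correctly, i.e. points pushed toward but not across the decision boundary. This is not the authors' code; `model` (assumed to map input embeddings to class logits), `num_steps`, and `step_size` are all illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation) of generating
# "friendly" adversarial data in embedding space.
import torch
import torch.nn.functional as F

def friendly_adversarial_embeddings(model, embeddings, labels,
                                    num_steps=5, step_size=1e-3):
    """Gradient-ascent search that stops short of the decision boundary.

    `model` is assumed to map input embeddings to class logits. Returns,
    for each example, the most perturbed embedding the model still
    classifies correctly (falling back to the clean embedding).
    """
    adv = embeddings.detach().clone()
    friendly = embeddings.detach().clone()  # fallback: clean data
    for _ in range(num_steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        grad, = torch.autograd.grad(loss, adv)
        # No norm-bounded projection: take an unconstrained signed step.
        adv = (adv + step_size * grad.sign()).detach()
        with torch.no_grad():
            still_correct = model(adv).argmax(dim=-1).eq(labels)
        if not still_correct.any():
            break  # every example has crossed the boundary
        # Keep the latest perturbation only where the prediction is
        # still correct -- this is the "friendly" adversarial data.
        friendly[still_correct] = adv[still_correct]
    return friendly
```

Under these assumptions, GAT would then plug the friendly examples into an otherwise standard training step, e.g. minimizing `F.cross_entropy(model(friendly_adversarial_embeddings(model, emb, y)), y)`, which is why a small number of search steps can suffice.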


