Impact of Adversarial Training on Robustness and Generalizability of Language Models

11/10/2022
by Enes Altinisik, et al.

Adversarial training is widely acknowledged as the most effective defense against adversarial attacks. However, it is also well established that adversarially trained models face a trade-off between robustness and generalization. The goal of this work is to provide an in-depth comparison of different approaches to adversarial training in language models. Specifically, we study the effect of pre-training data augmentation, as well as training-time input perturbations versus embedding-space perturbations, on the robustness and generalization of BERT-like language models. Our findings suggest that better robustness can be achieved through pre-training data augmentation or through training with input-space perturbation. However, training with embedding-space perturbation significantly improves generalization. A linguistic correlation analysis of the neurons of the learned models reveals that the improved generalization is due to "more specialized" neurons. To the best of our knowledge, this is the first work to carry out a deep qualitative analysis of different methods of generating adversarial examples in adversarial training of language models.
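To make the contrast concrete, embedding-space adversarial training perturbs the continuous word embeddings in the gradient direction rather than editing discrete input tokens. The following is a minimal FGSM-style sketch of such a perturbation; the toy squared-error loss, the function name, and all values are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def fgsm_embedding_perturbation(embedding, grad, epsilon=0.1):
    """Hypothetical sketch: shift a continuous embedding by epsilon in the
    sign of the loss gradient, the classic FGSM update applied in
    embedding space instead of input (token) space."""
    return embedding + epsilon * np.sign(grad)

# Toy setup with loss L(e) = ||e - target||^2, whose gradient is 2*(e - target).
embedding = np.array([0.5, -0.2, 0.1])
target = np.array([0.0, 0.0, 0.0])
grad = 2.0 * (embedding - target)

adv_embedding = fgsm_embedding_perturbation(embedding, grad, epsilon=0.1)
# Each coordinate moves by 0.1 along the sign of its gradient component.
```

In an actual training loop the gradient would come from backpropagating the model's loss to the embedding layer, and the model would then be trained on both clean and perturbed embeddings.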

Related research

- Adversarial Training for Large Neural Language Models (04/20/2020)
- Deep Repulsive Prototypes for Adversarial Robustness (05/26/2021)
- Robust Machine Comprehension Models via Adversarial Training (04/17/2018)
- Improving robustness of language models from a geometry-aware perspective (04/28/2022)
- Property-driven Training: All You (N)Ever Wanted to Know About (04/03/2021)
- Achieving Model Robustness through Discrete Adversarial Training (04/11/2021)
- Using Deep Mixture-of-Experts to Detect Word Meaning Shift for TempoWiC (11/07/2022)
