Virtual Data Augmentation: A Robust and General Framework for Fine-tuning Pre-trained Models

09/13/2021
by Kun Zhou et al.

Recent works have shown that powerful pre-trained language models (PLMs) can be fooled by small perturbations or intentional attacks. To address this issue, various data augmentation techniques have been proposed to improve the robustness of PLMs. However, it remains challenging to augment semantically relevant examples with sufficient diversity. In this work, we present Virtual Data Augmentation (VDA), a general framework for robustly fine-tuning PLMs. Based on the original token embeddings, we construct a multinomial mixture for augmenting virtual data embeddings, where a masked language model guarantees semantic relevance and Gaussian noise provides augmentation diversity. Furthermore, a regularized training strategy is proposed to balance the two aspects. Extensive experiments on six datasets show that our approach is able to improve the robustness of PLMs and alleviate the performance degradation under adversarial attacks. Our code and data are publicly available at <https://github.com/RUCAIBox/VDA>.
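The core construction can be sketched as follows: a masked language model's distribution over the vocabulary keeps the virtual embedding semantically close to the original token, while Gaussian noise perturbs that distribution for diversity. The sketch below is a minimal illustration, not the authors' implementation; the function name `virtual_embedding`, the `noise_std` parameter, and the noise-on-log-probabilities placement are assumptions for exposition.

```python
import numpy as np

def virtual_embedding(mlm_probs, embedding_matrix, noise_std=0.1, rng=None):
    """Sketch of a VDA-style virtual embedding (illustrative, not the paper's code).

    mlm_probs:        (V,) masked-language-model distribution over the vocabulary
                      for one token position; this anchors semantic relevance.
    embedding_matrix: (V, d) token embedding table.
    noise_std:        scale of Gaussian noise added to the log-probabilities,
                      injecting augmentation diversity before renormalizing.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Perturb the MLM distribution in log space with Gaussian noise.
    logits = np.log(mlm_probs + 1e-12) + rng.normal(0.0, noise_std, mlm_probs.shape)
    # Renormalize back to a multinomial mixture over the vocabulary.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Virtual embedding: expectation of token embeddings under the mixture.
    return probs @ embedding_matrix
```

With `noise_std=0` the function simply returns the MLM-weighted average of embeddings; increasing `noise_std` moves the mixture away from the original distribution, trading relevance for diversity, which is the balance the paper's regularized training strategy is meant to control.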


Related research

- ROSE: Robust Selective Fine-tuning for Pre-trained Language Models (10/18/2022). Even though large-scale language models have achieved excellent perf...
- Pythia v0.1: the Winning Entry to the VQA Challenge 2018 (07/26/2018). This document describes Pythia v0.1, the winning entry from Facebook AI ...
- RoChBert: Towards Robust BERT Fine-tuning for Chinese (10/28/2022). Despite the superb performance on a wide range of tasks, pre-trained ...
- Embedding Hallucination for Few-Shot Language Fine-tuning (05/03/2022). Few-shot language learners adapt knowledge from a pre-trained model to r...
- Semantic-Guided Image Augmentation with Pre-trained Models (02/04/2023). Image augmentation is a common mechanism to alleviate data scarcity in c...
- AugMax: Adversarial Composition of Random Augmentations for Robust Training (10/26/2021). Data augmentation is a simple yet effective way to improve the robustnes...
- UU-Tax at SemEval-2022 Task 3: Improving the generalizability of language models for taxonomy classification through data augmentation (10/07/2022). This paper presents our strategy to address the SemEval-2022 Task 3 PreT...
