Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models

09/25/2019
by   Cheolhyoung Lee, et al.
95

In natural language processing, it has been observed recently that generalization could be greatly improved by finetuning a large-scale language model pretrained on a large unlabeled corpus. Despite its recent success and wide adoption, finetuning a large pretrained language model on a downstream task is prone to degenerate performance when there are only a small number of training instances available. In this paper, we introduce a new regularization technique, to which we refer as "mixout", motivated by dropout. Mixout stochastically mixes the parameters of two models. We show that our mixout technique regularizes learning to minimize the deviation from one of the two models and that the strength of regularization adapts along the optimization trajectory. We empirically evaluate the proposed mixout and its variants on finetuning a pretrained language model on downstream tasks. More specifically, we demonstrate that the stability of finetuning and the average accuracy greatly increase when we use the proposed approach to regularize finetuning of BERT on downstream tasks in GLUE.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/03/2023

GreekBART: The First Pretrained Greek Sequence-to-Sequence Model

The era of transfer learning has revolutionized the fields of Computer V...
research
11/10/2019

CamemBERT: a Tasty French Language Model

Pretrained language models are now ubiquitous in Natural Language Proces...
research
04/19/2023

NetGPT: Generative Pretrained Transformer for Network Traffic

All data on the Internet are transferred by network traffic, thus accura...
research
09/18/2020

The birth of Romanian BERT

Large-scale pretrained language models have become ubiquitous in Natural...
research
03/29/2021

Retraining DistilBERT for a Voice Shopping Assistant by Using Universal Dependencies

In this work, we retrained the distilled BERT language model for Walmart...
research
04/27/2022

UBERT: A Novel Language Model for Synonymy Prediction at Scale in the UMLS Metathesaurus

The UMLS Metathesaurus integrates more than 200 biomedical source vocabu...
research
07/17/2022

Natural language processing for clusterization of genes according to their functions

There are hundreds of methods for analysis of data obtained in mRNA-sequ...

Please sign up or login with your details

Forgot password? Click here to reset