Doge Tickets: Uncovering Domain-general Language Models by Playing Lottery Tickets

07/20/2022
by Yi Yang, et al.

Over-parameterized models, typically pre-trained language models (LMs), have shown appealing expressive power thanks to their small learning bias. However, the huge learning capacity of LMs can also lead to large learning variance. In a pilot study, we find that, when faced with multiple domains, a critical portion of parameters unexpectedly behave in a domain-specific manner while the others behave in a domain-general one. Motivated by this phenomenon, we posit for the first time that domain-general parameters can underpin a domain-general LM that can be derived from the original LM. To uncover this domain-general LM, we propose to identify domain-general parameters by playing lottery tickets (dubbed doge tickets). To intervene in the lottery, we propose a domain-general score, which measures how domain-invariant a parameter is by associating its importance with its variance across domains. Comprehensive experiments are conducted on the Amazon, MNLI and OntoNotes datasets. The results show that doge tickets achieve improved out-of-domain generalization compared with a range of competitive baselines. Further analysis hints at the existence of domain-general parameters and the performance consistency of doge tickets.
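For intuition, below is a minimal PyTorch sketch of how such a domain-general score and ticket selection might look. The exact scoring formula, the trade-off weight `lam`, and the unit granularity (e.g., attention heads versus individual weights) are illustrative assumptions, not necessarily the paper's formulation: the score rewards units that are important on average across domains and penalizes those whose importance varies between domains.

```python
import torch

def domain_general_scores(importance, lam=1.0):
    """Score each prunable unit by cross-domain invariance.

    importance: tensor of shape (num_domains, num_units); each row holds
    an importance estimate (e.g., expected gradient sensitivity) computed
    on one domain's data. The score is the domain-averaged importance
    penalized by the cross-domain variance (hypothetical formulation).
    """
    mean = importance.mean(dim=0)                 # average importance per unit
    var = importance.var(dim=0, unbiased=False)   # variance across domains
    return mean - lam * var

def doge_ticket_mask(importance, sparsity=0.5, lam=1.0):
    """Keep the top (1 - sparsity) fraction of units by domain-general score."""
    scores = domain_general_scores(importance, lam)
    k = int(scores.numel() * (1.0 - sparsity))
    keep = torch.topk(scores, k).indices
    mask = torch.zeros_like(scores)
    mask[keep] = 1.0                              # 1.0 marks retained units
    return mask

# Toy usage: 3 domains, 8 prunable units (e.g., attention heads).
importance = torch.rand(3, 8)
print(doge_ticket_mask(importance, sparsity=0.5))
```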


Related research

02/14/2023
SwitchPrompt: Learning Domain-Specific Gated Soft Prompts for Classification in Low-Resource Domains
Prompting pre-trained language models leads to promising results across ...

04/28/2020
DomBERT: Domain-oriented Language Model for Aspect-based Sentiment Analysis
This paper focuses on learning domain-oriented language models driven by...

02/08/2022
Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models
Pre-trained language models (LMs) are shown to easily generate toxic lan...

02/10/2021
Customizing Contextualized Language Models for Legal Document Reviews
Inspired by the inductive transfer learning on computer vision, many eff...

09/11/2023
An Empirical Study of NetOps Capability of Pre-Trained Large Language Models
Nowadays, the versatile capabilities of Pre-trained Large Language Model...

10/08/2022
Sparse Teachers Can Be Dense with Knowledge
Recent advances in distilling pretrained language models have discovered...

03/25/2021
Boosting Binary Masks for Multi-Domain Learning through Affine Transformations
In this work, we present a new algorithm for multi-domain learning. Giv...
