Efficient Hierarchical Domain Adaptation for Pretrained Language Models

12/16/2021
by Alexandra Chronopoulou, et al.

Generative language models are trained on diverse, general-domain corpora. However, this limits their applicability to narrower domains, and prior work has shown that continued in-domain training can provide further gains. In this paper, we introduce a method to scale domain adaptation to many diverse domains using a computationally efficient adapter approach. Our method is based on the observation that textual domains are partially overlapping, and we represent domains as a hierarchical tree structure in which each node is associated with a set of adapter weights. When combined with a frozen pretrained language model, this approach enables parameter sharing among related domains while avoiding negative interference between unrelated ones. It is efficient: computational cost scales as O(log D) for D domains. Experimental results with GPT-2 and a large fraction of the 100 most represented websites in C4 show in-domain improvements across the board. We additionally provide an inference-time algorithm for a held-out domain and show that averaging over multiple paths through the tree enables further gains in generalization, while adding only a marginal cost to inference.
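As a rough illustration of the idea, the sketch below (not the authors' released code) pairs each tree node with a small bottleneck adapter applied to a frozen language model's hidden states: a known domain activates the adapters along its root-to-leaf path, so the number of active adapters grows with tree depth (O(log D) for D leaf domains), and a held-out domain can average the outputs of several candidate paths. The adapter architecture, class names (Adapter, AdapterTreeNode), and helper functions are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Small bottleneck adapter applied on top of a frozen transformer layer."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen model's representation intact.
        return h + self.up(self.act(self.down(h)))

class AdapterTreeNode:
    """Node in the domain hierarchy; each node owns its own adapter weights."""
    def __init__(self, name: str, hidden_size: int):
        self.name = name
        self.adapter = Adapter(hidden_size)
        self.children: list["AdapterTreeNode"] = []

def path_to_leaf(root: AdapterTreeNode, leaf_name: str) -> list[AdapterTreeNode]:
    """Return the root-to-leaf list of nodes for a known domain, or []."""
    if root.name == leaf_name:
        return [root]
    for child in root.children:
        sub = path_to_leaf(child, leaf_name)
        if sub:
            return [root] + sub
    return []

def apply_path(h: torch.Tensor, path: list[AdapterTreeNode]) -> torch.Tensor:
    """Apply the adapters along one tree path to the frozen LM's hidden states."""
    for node in path:  # O(depth), i.e. O(log D) adapters for D leaf domains
        h = node.adapter(h)
    return h

def apply_held_out(h: torch.Tensor, root: AdapterTreeNode,
                   candidate_leaves: list[str]) -> torch.Tensor:
    """Held-out domain: average the outputs of several plausible tree paths."""
    outs = [apply_path(h, path_to_leaf(root, leaf)) for leaf in candidate_leaves]
    return torch.stack(outs).mean(dim=0)

How candidate paths are chosen for a held-out domain (for example, by similarity between the new text and the domains already in the tree) is a design choice that the abstract does not pin down; the sketch simply takes the candidate leaves as given.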


Related research

- 02/14/2023: AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models
- 06/12/2019: Tackling Partial Domain Adaptation with Self-Supervision
- 04/14/2021: UDALM: Unsupervised Domain Adaptation through Language Modeling
- 04/22/2021: Fast Text-Only Domain Adaptation of RNN-Transducer Prediction Network
- 06/26/2023: Composing Parameter-Efficient Modules with Arithmetic Operations
- 11/06/2022: On the Domain Adaptation and Generalization of Pretrained Language Models: A Survey
- 04/05/2020: Unsupervised Domain Clusters in Pretrained Language Models
