DEMix Layers: Disentangling Domains for Modular Language Modeling

08/11/2021
by Suchin Gururangan, et al.

We introduce a new domain expert mixture (DEMix) layer that enables conditioning a language model (LM) on the domain of the input text. A DEMix layer is a collection of expert feedforward networks, each specialized to a domain, that makes the LM modular: experts can be mixed, added or removed after initial training. Extensive experiments with autoregressive transformer LMs (up to 1.3B parameters) show that DEMix layers reduce test-time perplexity, increase training efficiency, and enable rapid adaptation with little overhead. We show that mixing experts during inference, using a parameter-free weighted ensemble, allows the model to better generalize to heterogeneous or unseen domains. We also show that experts can be added to iteratively incorporate new domains without forgetting older ones, and that experts can be removed to restrict access to unwanted domains, without additional training. Overall, these results demonstrate benefits of explicitly conditioning on textual domains during language modeling.
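To make the architecture concrete, below is a minimal PyTorch sketch of a DEMix-style layer: one feedforward expert per domain in place of the single feedforward sublayer of a transformer block, with experts combined according to a vector of domain weights. The class name, dimensions, and the choice to mix expert outputs inside the layer (the paper's parameter-free ensemble weights expert predictions at inference) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of a DEMix-style feedforward layer.
# Assumption: experts are mixed by weighting their outputs inside the layer;
# a one-hot weight vector recovers hard routing to a single domain expert.
import torch
import torch.nn as nn


class DEMixFeedForward(nn.Module):
    """One feedforward expert per domain, replacing the block's single FFN."""

    def __init__(self, d_model: int, d_hidden: int, num_domains: int):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_domains)
        ])

    def forward(self, x: torch.Tensor, domain_weights: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); domain_weights: (num_domains,) summing to 1.
        # With a known domain label, domain_weights is one-hot, so only that
        # domain's expert runs; at inference, nonzero weights mix several experts.
        out = torch.zeros_like(x)
        for w, expert in zip(domain_weights, self.experts):
            if w > 0:
                out = out + w * expert(x)
        return out


# Usage: one-hot weights when the training domain is known; an estimated
# mixture over domains for heterogeneous or unseen text at inference.
layer = DEMixFeedForward(d_model=16, d_hidden=64, num_domains=4)
x = torch.randn(2, 8, 16)                            # (batch, seq, d_model)
train_weights = torch.tensor([0.0, 1.0, 0.0, 0.0])   # known domain -> one expert
test_weights = torch.tensor([0.1, 0.6, 0.2, 0.1])    # inferred domain mixture
y_train = layer(x, train_weights)
y_test = layer(x, test_weights)
```

Because each expert is an independent module, the modularity described in the abstract follows directly: appending a new entry to the expert list adds a domain, and deleting an entry removes access to it, in both cases without retraining the remaining experts.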
