MUX-PLMs: Pre-training Language Models with Data Multiplexing

02/24/2023
by Vishvak Murahari, et al.

Data multiplexing is a recently proposed method for improving a model's inference efficiency by processing multiple instances simultaneously as an ordered mixture of their representations. Prior work on data multiplexing used only task-specific Transformers without any pre-training, which limited their accuracy and generality. In this paper, we develop pre-trained multiplexed language models (MUX-PLMs) that can be widely fine-tuned on any downstream task. Our approach includes a three-stage training procedure and novel multiplexing and demultiplexing modules that improve both throughput and downstream task accuracy. We demonstrate the method on the BERT and ELECTRA pre-training objectives; the resulting MUX-BERT and MUX-ELECTRA models achieve a 2x/5x inference speedup with a 2-4% drop in absolute performance on GLUE and a 1-2% drop on token-level tasks.
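Data multiplexing, as described above, mixes the representations of N instances into a single sequence before the shared encoder and recovers per-instance outputs afterwards. The following PyTorch sketch illustrates that pattern only; it is not the paper's exact MUX/DEMUX modules. The class names, the per-instance linear transforms in the multiplexer, and the per-index MLP heads in the demultiplexer are illustrative assumptions.

    import torch
    import torch.nn as nn

    class Multiplexer(nn.Module):
        """Mix N instances into one ordered representation mixture (illustrative sketch)."""
        def __init__(self, num_instances: int, hidden_dim: int):
            super().__init__()
            # One learned transform per multiplexed position; an assumption standing in
            # for the paper's multiplexing module.
            self.transforms = nn.ModuleList(
                [nn.Linear(hidden_dim, hidden_dim, bias=False) for _ in range(num_instances)]
            )

        def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
            # embeddings: (num_instances, batch, seq_len, hidden_dim)
            mixed = torch.stack(
                [t(embeddings[i]) for i, t in enumerate(self.transforms)], dim=0
            )
            # Average into one sequence per N inputs: (batch, seq_len, hidden_dim)
            return mixed.mean(dim=0)

    class Demultiplexer(nn.Module):
        """Recover per-instance outputs from the shared encoder output (illustrative sketch)."""
        def __init__(self, num_instances: int, hidden_dim: int):
            super().__init__()
            # One small MLP head per instance index; an assumption standing in
            # for the paper's demultiplexing module.
            self.heads = nn.ModuleList(
                [nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.GELU(),
                               nn.Linear(hidden_dim, hidden_dim))
                 for _ in range(num_instances)]
            )

        def forward(self, shared_output: torch.Tensor) -> torch.Tensor:
            # shared_output: (batch, seq_len, hidden_dim)
            # Returns per-instance outputs: (num_instances, batch, seq_len, hidden_dim)
            return torch.stack([head(shared_output) for head in self.heads], dim=0)

In use, N inputs would be embedded, mixed into one sequence by the multiplexer, passed through the backbone (e.g. MUX-BERT) in a single forward pass, and separated again by the demultiplexer. Running the backbone once for N inputs is what yields the throughput gains the abstract reports.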
