NarrowBERT: Accelerating Masked Language Model Pretraining and Inference

01/11/2023
by   Haoxin Li, et al.

Large-scale language model pretraining is a very successful form of self-supervised learning in natural language processing, but it is increasingly expensive to perform as the models and pretraining corpora have become larger over time. We propose NarrowBERT, a modified transformer encoder that increases the throughput for masked language model pretraining by more than 2×. NarrowBERT sparsifies the transformer model such that the self-attention queries and feedforward layers only operate on the masked tokens of each sentence during pretraining, rather than all of the tokens as with the usual transformer encoder. We also show that NarrowBERT increases the throughput at inference time by as much as 3.5× with minimal (or no) performance degradation on sentence encoding tasks like MNLI. Finally, we examine the performance of NarrowBERT on the IMDB and Amazon reviews classification and CoNLL NER tasks and show that it is also comparable to standard BERT performance.
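The narrowing idea described above can be illustrated with a short sketch. The layer names, layer counts, and hidden sizes below are illustrative assumptions rather than the authors' implementation, and for brevity the later attention layers here operate entirely on the gathered masked positions, whereas NarrowBERT narrows only the queries and feedforward computations while keys and values still cover the full sentence.

import torch
import torch.nn as nn

# Minimal sketch of "narrowing" (hypothetical names and sizes, not the authors' code).
# Early layers contextualize all tokens; later layers run only on the gathered
# masked positions, so their cost scales with the number of masks (~15% of tokens)
# rather than the full sequence length.
class NarrowEncoderSketch(nn.Module):
    def __init__(self, hidden=256, heads=4, n_full=2, n_narrow=2):
        super().__init__()
        self.full_layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(hidden, heads, batch_first=True)
             for _ in range(n_full)]
        )
        self.narrow_layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(hidden, heads, batch_first=True)
             for _ in range(n_narrow)]
        )

    def forward(self, x, masked_positions):
        # x: (batch, seq_len, hidden); masked_positions: (batch, n_masked) token indices
        for layer in self.full_layers:
            x = layer(x)
        # Narrow the sequence to the masked positions only.
        idx = masked_positions.unsqueeze(-1).expand(-1, -1, x.size(-1))
        narrowed = torch.gather(x, 1, idx)        # (batch, n_masked, hidden)
        # Simplification: these layers attend only among the narrowed tokens;
        # NarrowBERT narrows the queries/feedforward while keys and values
        # still see the whole sentence.
        for layer in self.narrow_layers:
            narrowed = layer(narrowed)
        return narrowed                           # input to the MLM prediction head

# Toy usage: batch of 2 sentences, length 16, 3 masked positions each.
x = torch.randn(2, 16, 256)
masked_positions = torch.tensor([[1, 5, 9], [0, 7, 12]])
print(NarrowEncoderSketch()(x, masked_positions).shape)  # torch.Size([2, 3, 256])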

Related research

05/16/2020 - CERT: Contrastive Self-supervised Learning for Language Understanding
  Pretrained language models such as BERT, GPT have shown great effectiven...

03/19/2019 - Cloze-driven Pretraining of Self-attention Networks
  We present a new approach for pretraining a bi-directional transformer m...

05/30/2022 - E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation
  Sequence-to-sequence (seq2seq) learning has become a popular trend for p...

08/12/2020 - Variance-reduced Language Pretraining via a Mask Proposal Network
  Self-supervised learning, a.k.a., pretraining, is important in natural l...

08/09/2023 - Optimizing a Transformer-based network for a deep learning seismic processing workflow
  StorSeismic is a recently introduced model based on the Transformer to a...

04/18/2021 - When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset
  While self-supervised learning has made rapid advances in natural langua...

09/26/2019 - ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
  Increasing model size when pretraining natural language representations ...
