Maximizing Efficiency of Language Model Pre-training for Learning Representation

10/13/2021
by Junmo Kang et al.

Pre-trained language models have shown exponential growth in model parameters and compute time over the past years. ELECTRA is a novel approach for improving the compute efficiency of pre-trained language models based on masked language modeling (MLM), such as BERT, by addressing their sample inefficiency with the replaced token detection (RTD) task. Our work proposes an adaptive early-exit strategy that maximizes the efficiency of the pre-training process by leveraging earlier-layer representations, relieving the model's subsequent layers of the need to further process the latent features. Moreover, by thoroughly investigating the necessity of ELECTRA's generator module, we evaluate an initial approach to the problem that shows promising compute efficiency but does not succeed in maintaining the model's accuracy.
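
To make the early-exit idea concrete, below is a minimal sketch of an encoder stack that stops refining hidden states once consecutive layer outputs have nearly converged. The abstract does not specify the exit criterion or architecture; the `EarlyExitEncoder` class, the mean-cosine-similarity test, and all dimensions and thresholds here are hypothetical illustrations, not the paper's implementation.

```python
# Hypothetical sketch of an adaptive early-exit encoder; the paper's actual
# exit criterion and architecture are not specified in this abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EarlyExitEncoder(nn.Module):
    """Transformer stack that stops processing once the hidden
    representation has (nearly) stopped changing between layers."""

    def __init__(self, num_layers=12, d_model=256, n_heads=4,
                 exit_threshold=0.999):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
             for _ in range(num_layers)]
        )
        # Assumed exit criterion: mean cosine similarity between a layer's
        # input and output; a value near 1.0 means the layer barely changed
        # the representation.
        self.exit_threshold = exit_threshold

    def forward(self, hidden):
        for layer in self.layers:
            new_hidden = layer(hidden)
            similarity = F.cosine_similarity(new_hidden, hidden, dim=-1).mean()
            hidden = new_hidden
            if similarity > self.exit_threshold:
                # Representations have stabilized; skip the remaining layers.
                # The skipped layers are the source of the compute saving.
                break
        return hidden


# Usage: a batch of 8 sequences of 128 tokens with 256-dim embeddings.
encoder = EarlyExitEncoder()
out = encoder(torch.randn(8, 128, 256))
print(out.shape)  # torch.Size([8, 128, 256])
```

In a pre-training setting such as ELECTRA's RTD, the exit decision would presumably be made per training step, and a per-token rather than per-batch criterion is equally plausible; the batch-level threshold above is purely illustrative.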

Related research

MUX-PLMs: Pre-training Language Models with Data Multiplexing (02/24/2023)
Data multiplexing is a recently proposed method for improving a model's ...

Retrieval Oriented Masking Pre-training Language Model for Dense Passage Retrieval (10/27/2022)
Pre-trained language model (PTM) has been shown to yield powerful text r...

MC-BERT: Efficient Language Pre-Training via a Meta Controller (06/10/2020)
Pre-trained contextual representations (e.g., BERT) have become the foun...

Opening the Black Box: Analyzing Attention Weights and Hidden States in Pre-trained Language Models for Non-language Tasks (06/21/2023)
Investigating deep learning language models has always been a significan...

DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing (11/18/2021)
This paper presents a new pre-trained language model, DeBERTaV3, which i...

Forging Multiple Training Objectives for Pre-trained Language Models via Meta-Learning (10/19/2022)
Multiple pre-training objectives fill the vacancy of the understanding c...

Learning Better Masking for Better Language Model Pre-training (08/23/2022)
Masked Language Modeling (MLM) has been widely used as the denoising obj...
