Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping

10/26/2020
by Minjia Zhang, et al.

Recently, Transformer-based language models have demonstrated remarkable performance across many NLP domains. However, the unsupervised pre-training step of these models suffers from prohibitive overall computational expense. Current methods for accelerating pre-training either rely on massive parallelism with advanced hardware or are not applicable to language modeling. In this work, we propose a method based on progressive layer dropping that speeds up the training of Transformer-based language models, not at the cost of excessive hardware resources but through changes to the model architecture and training technique that boost efficiency. Extensive experiments on BERT show that the proposed method achieves a 24% time reduction on each sample and allows pre-training to be 2.5 times faster than the baseline while reaching similar accuracy on downstream tasks. While being faster, our pre-trained models retain strong knowledge transferability, achieving a comparable and sometimes higher GLUE score than the baseline when pre-trained with the same number of samples.
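The method's central mechanism, progressive layer dropping, skips Transformer layers during training according to a schedule instead of always executing the full stack. Below is a minimal PyTorch sketch of one plausible realization, assuming a stochastic-depth-style keep probability that decays over training steps and shrinks with depth. The names keep_probability, gamma, theta_bar, and DroppableEncoderLayer are illustrative assumptions, not the authors' implementation, and the paper's Switchable-Transformer block details are omitted.

import math
import torch
import torch.nn as nn

def keep_probability(layer_idx, num_layers, step, gamma=1e-4, theta_bar=0.5):
    # Assumed schedule shape: the global keep ratio decays from 1.0 toward
    # theta_bar as training progresses.
    theta_t = (1.0 - theta_bar) * math.exp(-gamma * step) + theta_bar
    # Deeper layers are dropped more often; the first layer is almost always kept.
    return 1.0 - (layer_idx / num_layers) * (1.0 - theta_t)

class DroppableEncoderLayer(nn.Module):
    # Wraps a standard encoder layer so the whole layer can be skipped.
    def __init__(self, d_model=768, nhead=12):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)

    def forward(self, x, keep_prob):
        # During training, bypass the layer with probability (1 - keep_prob);
        # the residual identity path simply passes the input through.
        if self.training and torch.rand(1).item() > keep_prob:
            return x
        return self.layer(x)

# Usage: drop probabilities increase as `step` grows and as layers get deeper.
layers = nn.ModuleList([DroppableEncoderLayer() for _ in range(12)])
x = torch.randn(8, 128, 768)  # (batch, sequence, hidden)
step = 10000
for i, layer in enumerate(layers, start=1):
    x = layer(x, keep_probability(i, len(layers), step))

In this sketch the compute saved per step grows over training, which is consistent with the reported per-sample time reduction, though the exact schedule constants here are placeholders rather than the paper's settings.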

