Knowledge Inheritance for Pre-trained Language Models

05/28/2021
by   Yujia Qin, et al.
8

Recent explorations of large-scale pre-trained language models (PLMs) such as GPT-3 have revealed the power of PLMs with huge amounts of parameters, setting off a wave of training ever-larger PLMs. However, training a large-scale PLM requires tremendous amounts of computational resources, which is time-consuming and expensive. In addition, existing large-scale PLMs are mainly trained from scratch individually, ignoring the availability of many existing well-trained PLMs. To this end, we explore the question that how can previously trained PLMs benefit training larger PLMs in future. Specifically, we introduce a novel pre-training framework named "knowledge inheritance" (KI), which combines both self-learning and teacher-guided learning to efficiently train larger PLMs. Sufficient experimental results demonstrate the feasibility of our KI framework. We also conduct empirical analyses to explore the effects of teacher PLMs' pre-training settings, including model architecture, pre-training data, etc. Finally, we show that KI can well support lifelong learning and knowledge transfer.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/08/2022

Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models

Pre-trained language models (LMs) are shown to easily generate toxic lan...
research
12/23/2021

ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation

Pre-trained language models have achieved state-of-the-art results in va...
research
01/03/2020

On the comparability of Pre-trained Language Models

Recent developments in unsupervised representation learning have success...
research
05/02/2023

Transfer Visual Prompt Generator across LLMs

While developing a new vision-language LLM (VL-LLM) by pre-training on t...
research
06/14/2021

Pre-Trained Models: Past, Present and Future

Large-scale pre-trained models (PTMs) such as BERT and GPT have recently...
research
03/06/2023

Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent

Recent years have witnessed the unprecedented achievements of large-scal...
research
12/14/2021

Model Uncertainty-Aware Knowledge Amalgamation for Pre-Trained Language Models

As many fine-tuned pre-trained language models (PLMs) with promising per...

Please sign up or login with your details

Forgot password? Click here to reset