Transfer training from smaller language model

04/23/2021
by Han Zhang, et al.

Large language models have led to state-of-the-art accuracies across a range of tasks. However, training a large language model requires massive computing resources. As more and more open-source pre-trained models become available, it is worth studying how to take full advantage of them. We present a method that saves training time and resource cost by growing a small well-trained model into a large one. We initialize a larger target model from a smaller source model by copying the weight values from the source model and padding them with zeros or small initialization values, so that the source and target models produce approximately the same outputs; this is valid because of block matrix multiplication and the residual connections in the transformer architecture. We evaluate the target model on several data sets and find that it remains comparable to the source model. When we continue training the target model, the training loss starts from a smaller value.
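The core idea, as the abstract describes it, can be illustrated with a minimal NumPy sketch (the dimensions and variable names below are illustrative, not from the paper): embed the trained weight matrix of a source layer as the top-left block of a larger zero-initialized matrix. By block matrix multiplication, the enlarged layer reproduces the source layer's output on zero-padded inputs, which is why the target model starts out approximately matching the source model.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 4, 3   # small (source) layer dimensions (illustrative)
D_in, D_out = 6, 5   # large (target) layer dimensions (illustrative)

W_small = rng.normal(size=(d_out, d_in))   # stands in for trained source weights
x_small = rng.normal(size=d_in)            # a sample activation

# Copy the source weights into the top-left block; pad the rest with zeros.
W_large = np.zeros((D_out, D_in))
W_large[:d_out, :d_in] = W_small

# Zero-pad the input to the target width.
x_large = np.zeros(D_in)
x_large[:d_in] = x_small

y_small = W_small @ x_small
y_large = W_large @ x_large

# Block matrix multiplication: the first d_out coordinates equal the
# source output, and the padded coordinates are exactly zero, so a
# residual connection carries the padded activations through unchanged.
assert np.allclose(y_large[:d_out], y_small)
assert np.allclose(y_large[d_out:], 0.0)
```

In a real transformer the same block-embedding would be applied per weight matrix (attention projections, feed-forward layers), with the residual connections ensuring the padded dimensions stay inert until further training updates them.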

