Continual Pre-training of Language Models

02/07/2023
by Zixuan Ke et al.

Language models (LMs) have been instrumental in the rapid advance of natural language processing. This paper studies continual pre-training of LMs, in particular continual domain-adaptive pre-training (or continual DAP-training). Existing research has shown that further pre-training an LM on a domain corpus to adapt it to that domain improves end-task performance in the domain. This paper proposes a novel method to continually DAP-train an LM on a sequence of unlabeled domain corpora, adapting the LM to each domain to improve its end-task performance. The key novelty of our method is a soft-masking mechanism that directly controls the update to the LM. A novel proxy is also proposed to preserve the general knowledge in the original LM. Additionally, the method contrasts the representations of the previously learned domain knowledge (including the general knowledge in the pre-trained LM) with those of the current full network to integrate the old and new knowledge. The method not only overcomes catastrophic forgetting but also achieves knowledge transfer to improve end-task performance. Empirical evaluation demonstrates the effectiveness of the proposed method.
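
To make the soft-masking idea concrete, here is a minimal sketch, not the authors' implementation: it assumes PyTorch, a toy linear layer standing in for the pre-trained LM, randomly generated per-parameter importance scores in [0, 1] (the paper derives such scores with a proxy for general knowledge), and an MSE loss standing in for the masked-language-modeling objective. The point it illustrates is how gradients are scaled by (1 - importance) during a domain-adaptive pre-training step, so parameters important to earlier knowledge are updated less rather than frozen.

    # Minimal sketch of gradient soft-masking for continual DAP-training.
    # Hypothetical illustration only: toy model, random importance scores,
    # and an MSE objective are placeholders for the transformer LM, the
    # importance proxy, and the masked-language-modeling loss.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Linear(16, 16)  # stand-in for a pre-trained LM

    # Importance of each parameter to previously acquired knowledge, in [0, 1].
    importance = {name: torch.rand_like(p) for name, p in model.named_parameters()}

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    # One DAP-training step on a batch from the current domain corpus.
    x, target = torch.randn(8, 16), torch.randn(8, 16)
    loss = nn.functional.mse_loss(model(x), target)
    loss.backward()

    # Soft-masking: scale each gradient by (1 - importance) so parameters that
    # matter most to earlier knowledge receive the smallest updates, instead
    # of being frozen outright as a hard mask would do.
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.grad is not None:
                p.grad.mul_(1.0 - importance[name])

    optimizer.step()
    optimizer.zero_grad()

In the paper, the masks are applied at a finer granularity than whole parameter tensors and are combined with the contrastive objective described above, which integrates previously learned knowledge with that of the current full network; the sketch only shows how soft masks attenuate the gradient update.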

research  01/21/2023
Adapting a Language Model While Preserving its General Knowledge
Domain-adaptive pre-training (or DA-training for short), also known as p...

research  10/11/2022
Continual Training of Language Models for Few-Shot Learning
Recent work on applying large language models (LMs) achieves impressive ...

research  11/01/2022
VarMAE: Pre-training of Variational Masked Autoencoder for Domain-adaptive Language Understanding
Pre-trained language models have achieved promising performance on gener...

research  05/15/2023
Recyclable Tuning for Continual Pre-training
Continual pre-training is the paradigm where pre-trained language models...

research  10/20/2022
Tele-Knowledge Pre-training for Fault Analysis
In this work, we share our experience on tele-knowledge pre-training for...

research  10/19/2022
Forging Multiple Training Objectives for Pre-trained Language Models via Meta-Learning
Multiple pre-training objectives fill the vacancy of the understanding c...

research  06/07/2023
ModuleFormer: Learning Modular Large Language Models From Uncurated Data
Large Language Models (LLMs) have achieved remarkable results. But exist...
