
Forging Multiple Training Objectives for Pre-trained Language Models via Meta-Learning

by Hongqiu Wu et al.
Shanghai Jiao Tong University
Alibaba Group
Soochow University

Multiple pre-training objectives compensate for the limited understanding capability of single-objective language modeling, serving the ultimate purpose of pre-trained language models (PrLMs): generalizing well across a wide range of scenarios. However, learning multiple training objectives in a single model is challenging because their relative importance is unknown and they may conflict with one another. Empirical studies have shown that with current ad-hoc, manually set objective sampling, the learned language representation rarely converges to the desired optimum. We therefore propose MOMETAS, a novel adaptive sampler based on meta-learning, which learns the latent sampling pattern over arbitrary pre-training objectives. The design is lightweight and adds negligible training overhead. To validate our approach, we adopt five objectives and conduct continual pre-training with BERT-base and BERT-large models, where MOMETAS demonstrates a consistent performance gain over rule-based sampling strategies on 14 natural language processing tasks.
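The core idea of an adaptive objective sampler can be illustrated with a minimal sketch. This is not the exact MOMETAS algorithm (which the abstract only describes at a high level); it is a generic bandit-style sampler, assuming a hypothetical reward signal (e.g. improvement on a held-out meta task) that is fed back after training steps on the sampled objective:

```python
import math
import random

class AdaptiveObjectiveSampler:
    """Illustrative adaptive sampler over pre-training objectives.

    A generic sketch, not the paper's method: one weight per objective,
    a softmax sampling distribution, and an importance-weighted update
    that boosts objectives whose steps yield higher reward."""

    def __init__(self, objectives, lr=0.1):
        self.objectives = list(objectives)
        self.lr = lr
        self.weights = {name: 0.0 for name in self.objectives}

    def probabilities(self):
        # Softmax over weights gives the current sampling distribution.
        m = max(self.weights.values())
        exps = {k: math.exp(v - m) for k, v in self.weights.items()}
        z = sum(exps.values())
        return {k: v / z for k, v in exps.items()}

    def sample(self):
        # Draw one objective name according to the distribution.
        probs = self.probabilities()
        r, acc = random.random(), 0.0
        for name, p in probs.items():
            acc += p
            if r <= acc:
                return name
        return self.objectives[-1]

    def update(self, name, reward):
        # Importance-weight the reward so rarely sampled objectives
        # are not unfairly penalized.
        p = self.probabilities()[name]
        self.weights[name] += self.lr * reward / p

# Hypothetical objective names for illustration only.
sampler = AdaptiveObjectiveSampler(["MLM", "NSP", "SOP", "SBO", "PLM"])
obj = sampler.sample()           # pick the next training objective
sampler.update(obj, reward=1.0)  # reward from a held-out meta signal
```

Over many iterations the distribution shifts toward objectives that keep producing reward, which is the behavior a learned sampler provides over fixed rule-based schedules.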



Related research:

- How does the pre-training objective affect what large language models learn about linguistic properties?
- Does the Objective Matter? Comparing Training Objectives for Pronoun Resolution
- Maximizing Efficiency of Language Model Pre-training for Learning Representation
- Benchmarking down-scaled (not so large) pre-trained language models
- MC-BERT: Efficient Language Pre-Training via a Meta Controller
- Trained on 100 million words and still in shape: BERT meets British National Corpus
- SPBERT: An Efficient Pre-training BERT on SPARQL Queries for Question Answering over Knowledge Graphs