Forging Multiple Training Objectives for Pre-trained Language Models via Meta-Learning

10/19/2022
by Hongqiu Wu, et al.

Multiple pre-training objectives compensate for the limited understanding capability of single-objective language modeling and thereby serve the ultimate purpose of pre-trained language models (PrLMs): generalizing well across a wide range of scenarios. However, learning multiple training objectives in a single model is challenging because their relative importance is unknown and they may conflict with one another. Empirical studies show that the prevailing practice of sampling objectives according to ad-hoc manual schedules leaves the learned language representation far from the desired optimum. We therefore propose MOMETAS, a novel adaptive sampler based on meta-learning that learns the latent sampling pattern over arbitrary pre-training objectives. The design is lightweight and adds negligible training overhead. To validate the approach, we adopt five objectives and perform continual pre-training of BERT-base and BERT-large, where MOMETAS yields consistent gains over rule-based sampling strategies on 14 natural language processing tasks.
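To make the idea of adaptive objective sampling concrete, below is a minimal, hypothetical sketch. It is not the MOMETAS algorithm itself (the paper's meta-learning details are not reproduced here); instead it uses a simple REINFORCE-style bandit that re-weights objectives by a reward signal, such as the improvement of a held-out meta-validation loss after training on a given objective. The class name AdaptiveObjectiveSampler, the placeholder objective names, and the synthetic reward are illustrative assumptions.

```python
import numpy as np


class AdaptiveObjectiveSampler:
    """Bandit-style sampler that adapts the mix of pre-training objectives.

    Illustrative sketch only; not the authors' MOMETAS implementation.
    """

    def __init__(self, num_objectives, lr=0.1, temperature=1.0):
        self.logits = np.zeros(num_objectives)  # one preference score per objective
        self.lr = lr
        self.temperature = temperature

    def probs(self):
        z = self.logits / self.temperature
        z = z - z.max()               # subtract max for numerical stability
        p = np.exp(z)
        return p / p.sum()

    def sample(self, rng):
        # Choose the pre-training objective for the next batch.
        return int(rng.choice(len(self.logits), p=self.probs()))

    def update(self, idx, reward):
        # REINFORCE-style update: the gradient of reward * log p(idx)
        # w.r.t. the logits is reward * (one_hot(idx) - probs).
        grad = -self.probs() * reward
        grad[idx] += reward
        self.logits += self.lr * grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    objectives = ["obj_1", "obj_2", "obj_3", "obj_4", "obj_5"]  # placeholders for five objectives
    sampler = AdaptiveObjectiveSampler(len(objectives))

    for step in range(500):
        idx = sampler.sample(rng)
        # In real continual pre-training, `reward` would come from the change
        # in a meta-validation loss after a step on the sampled objective.
        # Here it is faked: the objective at index 2 is made slightly more useful.
        reward = rng.normal(loc=1.0 if idx == 2 else 0.5, scale=0.1)
        sampler.update(idx, reward)

    print({name: round(float(p), 3) for name, p in zip(objectives, sampler.probs())})
```

In a full setup, the reward would be measured on a small held-out meta batch, and the learned sampling probabilities would replace the fixed, rule-based objective ratios that the paper compares against.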


Related research

How does the pre-training objective affect what large language models learn about linguistic properties? (03/20/2022)
Several pre-training objectives, such as masked language modeling (MLM), ...

Does the Objective Matter? Comparing Training Objectives for Pronoun Resolution (10/06/2020)
Hard cases of pronoun resolution have been used as a long-standing bench...

MUX-PLMs: Pre-training Language Models with Data Multiplexing (02/24/2023)
Data multiplexing is a recently proposed method for improving a model's ...

Continual Pre-training of Language Models (02/07/2023)
Language models (LMs) have been instrumental for the rapid advance of na...

Maximizing Efficiency of Language Model Pre-training for Learning Representation (10/13/2021)
Pre-trained language models in the past years have shown exponential gro...

Pre-training for Information Retrieval: Are Hyperlinks Fully Explored? (09/14/2022)
Recent years have witnessed great progress on applying pre-trained langu...

SPBERT: An Efficient Pre-training BERT on SPARQL Queries for Question Answering over Knowledge Graphs (06/18/2021)
In this paper, we propose SPBERT, a transformer-based language model pre...
