Taking Notes on the Fly Helps BERT Pre-training

08/04/2020
by Qiyu Wu et al.

How to make unsupervised language pre-training more efficient and less resource-intensive is an important research direction in NLP. In this paper, we focus on improving the efficiency of language pre-training methods by providing better data utilization. It is well known that words in language corpora follow a heavy-tail distribution: a large proportion of words appear only a few times, and the embeddings of these rare words are usually poorly optimized. We argue that such embeddings carry inadequate semantic signals, which makes data utilization inefficient and slows down pre-training of the entire model. To solve this problem, we propose Taking Notes on the Fly (TNF). TNF takes notes for rare words on the fly during pre-training to help the model understand them the next time they occur. Specifically, TNF maintains a note dictionary and saves a rare word's context information in it as a note when the rare word occurs in a sentence. When the same rare word occurs again in training, TNF employs the note saved beforehand to enhance the semantics of the current sentence. In this way, TNF provides better data utilization, since cross-sentence information is used to cover the inadequate semantics caused by rare words. Experimental results show that TNF significantly expedites BERT pre-training and improves the model's performance on downstream tasks. TNF's training time is 60% less than BERT's when reaching the same performance. When trained for the same number of iterations, TNF significantly outperforms BERT on most downstream tasks and in average GLUE score.
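To make the note-dictionary idea concrete, here is a minimal sketch in Python (PyTorch), assuming details the abstract does not specify: that a "note" is a running average of a mean contextual embedding from the sentence where the rare word appeared, and that the note is blended into the rare word's embedding with a fixed mixing weight. The class name, momentum and weight values, and the usage snippet are illustrative assumptions, not the paper's actual implementation.

import torch

class NoteDictionary:
    """Sketch of a TNF-style note dictionary for rare words."""

    def __init__(self, dim: int, momentum: float = 0.9):
        self.dim = dim
        self.momentum = momentum   # how strongly previously saved notes are kept
        self.notes = {}            # rare-word id -> note vector

    def update(self, word_id: int, context_embedding: torch.Tensor) -> None:
        """Save or refresh the note for a rare word from its current context."""
        if word_id not in self.notes:
            self.notes[word_id] = context_embedding.detach().clone()
        else:
            self.notes[word_id] = (
                self.momentum * self.notes[word_id]
                + (1.0 - self.momentum) * context_embedding.detach()
            )

    def enhance(self, word_id: int, word_embedding: torch.Tensor,
                weight: float = 0.5) -> torch.Tensor:
        """Blend a previously saved note into the word's embedding, if one exists."""
        note = self.notes.get(word_id)
        if note is None:
            return word_embedding
        return (1.0 - weight) * word_embedding + weight * note

# Hypothetical use inside a pre-training step:
#   ctx = hidden_states[sentence_positions].mean(dim=0)   # context of the current sentence
#   note_dict.update(rare_word_id, ctx)                   # take a note on the fly
#   emb = note_dict.enhance(rare_word_id, emb)            # reuse the note on the next occurrence

The key design point the abstract describes is that notes are written and read during pre-training itself, so cross-sentence information about a rare word is available the next time it appears, without any extra supervised data.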


