On Effectively Learning of Knowledge in Continual Pre-training

04/17/2022
by Cunxiang Wang et al.

Pre-trained language models (PLMs) like BERT have made significant progress in various downstream NLP tasks. However, by asking models to complete cloze-style tests, recent work finds that PLMs fall short in acquiring knowledge from unstructured text. To understand how PLMs retrieve knowledge internally, we first define knowledge-bearing (K-B) tokens and knowledge-free (K-F) tokens for unstructured text and ask professional annotators to label a set of samples manually. We then find that PLMs are more likely to make wrong predictions on K-B tokens and pay less attention to them inside the self-attention module. Based on these observations, we develop two solutions that help the model learn more knowledge from unstructured text in a fully self-supervised manner. Experiments on knowledge-intensive tasks show the effectiveness of the proposed methods. To the best of our knowledge, we are the first to explore fully self-supervised learning of knowledge in continual pre-training.
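
To illustrate the kind of probing the abstract describes, the sketch below is a minimal, hypothetical example rather than the authors' code: the probe function, the kb_positions argument, and the use of bert-base-uncased are assumptions for demonstration. It masks each token of a sentence in turn (a cloze-style test) and compares the model's prediction accuracy and the self-attention each position receives for tokens labeled K-B versus K-F.

```python
# Minimal sketch, not the authors' implementation: probe a BERT masked LM to see how it
# handles knowledge-bearing (K-B) vs. knowledge-free (K-F) tokens. The function name,
# the kb_positions argument, and the choice of bert-base-uncased are illustrative.
import torch
from transformers import BertTokenizerFast, BertForMaskedLM

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

def probe(sentence, kb_positions):
    """kb_positions: indices in the tokenized sequence labeled K-B; all others count as K-F."""
    enc = tokenizer(sentence, return_tensors="pt")
    input_ids = enc["input_ids"][0]
    correct = {"K-B": 0, "K-F": 0}
    total = {"K-B": 0, "K-F": 0}
    attention_received = {"K-B": [], "K-F": []}

    for pos in range(1, input_ids.size(0) - 1):           # skip [CLS] and [SEP]
        masked = enc["input_ids"].clone()
        masked[0, pos] = tokenizer.mask_token_id          # cloze-style test at this position
        with torch.no_grad():
            out = model(input_ids=masked, attention_mask=enc["attention_mask"])
        pred = out.logits[0, pos].argmax().item()
        # Attention the masked position receives from all positions,
        # averaged over the heads of the last layer.
        received = out.attentions[-1][0].mean(dim=0)[:, pos].mean().item()

        kind = "K-B" if pos in kb_positions else "K-F"
        correct[kind] += int(pred == input_ids[pos].item())
        total[kind] += 1
        attention_received[kind].append(received)

    acc = {k: correct[k] / max(total[k], 1) for k in correct}
    mean_attn = {k: sum(v) / max(len(v), 1) for k, v in attention_received.items()}
    return acc, mean_attn
```

Comparing acc["K-B"] with acc["K-F"] (and likewise the mean received attention) over a set of annotated sentences is one way to reproduce the kind of observation the abstract reports; the paper's two proposed training solutions build on such observations and are not shown here.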

Related research

MST: Masked Self-Supervised Transformer for Visual Representation (06/10/2021)
Transformer has been widely used for self-supervised pre-training in Nat...

Continual Pre-Training of Large Language Models: How to (re)warm your model? (08/08/2023)
Large language models (LLMs) are routinely pre-trained on billions of to...

W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training (08/07/2021)
Motivated by the success of masked language modeling (MLM) in pre-traini...

Relative Position Prediction as Pre-training for Text Encoders (02/02/2022)
Meaning is defined by the company it keeps. However, company is two-fold...

Improving Self-supervised Pre-training via a Fully-Explored Masked Language Model (10/12/2020)
Masked Language Model (MLM) framework has been widely adopted for self-s...

Pre-Training Transformers as Energy-Based Cloze Models (12/15/2020)
We introduce Electric, an energy-based cloze model for representation le...

How to Understand Masked Autoencoders (02/08/2022)
"Masked Autoencoders (MAE) Are Scalable Vision Learners" revolutionizes ...