Focusing More on Conflicts with Mis-Predictions Helps Language Pre-Training

12/16/2020
by Chen Xing, et al.

In this work, we propose to improve the effectiveness of language pre-training methods with the help of mis-predictions made during pre-training. Neglecting words in the input sentence whose semantics conflict with a mis-prediction is a likely cause of that mis-prediction. We therefore hypothesize that mis-predictions during pre-training can act as detectors of where the model's attention is misplaced. If the model is trained to focus more on the words that conflict with its mis-predictions and less on the remaining words in the input sentence, the mis-predictions can be corrected more easily and the model as a whole can be trained better. Towards this end, we introduce Focusing Less on Context of Mis-predictions (McMisP). McMisP records co-occurrence information between words to detect, in an unsupervised way, the words that conflict with a mis-prediction, and it uses this information to guide the attention modules whenever a mis-prediction occurs. Specifically, several attention modules in the Transformer are optimized to focus more on words in the input sentence that rarely co-occur with the mis-prediction, and vice versa. Results show that McMisP significantly expedites BERT and ELECTRA pre-training and improves their performance on downstream tasks.
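To make the mechanism concrete, the sketch below shows one plausible way to bias attention scores using corpus-level co-occurrence counts, in the spirit of the abstract. It is a minimal illustration, not the authors' implementation: the function name `mcmisp_attention_bias`, the `bias_strength` parameter, and the rarity transform are all assumptions made for clarity.

```python
# Illustrative sketch (not the authors' code): when a token is mis-predicted,
# bias attention toward input tokens that rarely co-occur with it.
import torch


def mcmisp_attention_bias(attention_scores: torch.Tensor,
                          cooccurrence_counts: torch.Tensor,
                          mispredicted_id: int,
                          input_ids: torch.Tensor,
                          bias_strength: float = 1.0) -> torch.Tensor:
    """Re-weight attention logits at the position of a mis-prediction.

    attention_scores: (seq_len,) raw attention logits from the mis-predicted
        position to every token in the input sentence.
    cooccurrence_counts: (vocab_size, vocab_size) word co-occurrence counts
        gathered in an unsupervised pass over the pre-training corpus.
    """
    # Co-occurrence of each input token with the mis-predicted token.
    counts = cooccurrence_counts[mispredicted_id, input_ids].float()
    # Rare co-occurrence -> likely semantic conflict -> larger positive bias;
    # frequent co-occurrence -> smaller bias (hypothetical -log1p transform).
    rarity = -torch.log1p(counts)
    rarity = rarity - rarity.mean()  # zero-center so the overall scale is preserved
    return attention_scores + bias_strength * rarity


# Toy usage: 5-token vocabulary, 4-token input, token id 2 was mis-predicted.
vocab_size = 5
counts = torch.randint(0, 100, (vocab_size, vocab_size))
input_ids = torch.tensor([0, 3, 1, 4])
scores = torch.randn(input_ids.shape[0])
biased = mcmisp_attention_bias(scores, counts, mispredicted_id=2, input_ids=input_ids)
print(torch.softmax(biased, dim=-1))
```

In this toy setting the softmax mass shifts toward input tokens with low co-occurrence counts for the mis-predicted token; how the bias is injected into specific Transformer attention modules, and how strongly, is left open here.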
