
Predictions For Pre-training Language Models

by   Tong Guo, et al.

Language model pre-training has proven useful in many language understanding tasks. In this paper, we investigate whether it is still helpful to add the specific task's loss during the pre-training step. In industry NLP applications, we have large amounts of user-generated data. We use the fine-tuned model to assign pseudo-labels to this unlabeled data, then pre-train with both the task-specific loss on the pseudo-labels and the masked language model loss. Experiments show that using the fine-tuned model's predictions for pseudo-labeled pre-training yields further gains on the downstream task. The improvement of our method is stable and remarkable.
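The combined objective described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the uniform averaging over masked positions, and the `alpha` weighting between the two losses are all assumptions for clarity.

```python
import math

def cross_entropy(probs, label):
    # Negative log-likelihood of the given (true or pseudo) label.
    return -math.log(probs[label])

def pseudo_label(task_probs):
    # The fine-tuned model's most confident class serves as the pseudo-label.
    return max(range(len(task_probs)), key=lambda i: task_probs[i])

def joint_pretrain_loss(mlm_probs, mlm_targets, task_probs, alpha=1.0):
    """Joint pre-training loss: masked-LM loss plus the task-specific
    loss computed against the pseudo-label (hypothetical weighting alpha)."""
    # Masked-language-model loss, averaged over the masked positions.
    mlm_loss = sum(cross_entropy(p, t)
                   for p, t in zip(mlm_probs, mlm_targets)) / len(mlm_targets)
    # Task-specific loss against the fine-tuned model's pseudo-label.
    label = pseudo_label(task_probs)
    task_loss = cross_entropy(task_probs, label)
    return mlm_loss + alpha * task_loss

# Example: one masked position predicted with probability 0.5 for the
# correct token, and a task head that is 90% confident in class 0.
loss = joint_pretrain_loss(mlm_probs=[[0.5, 0.5]],
                           mlm_targets=[0],
                           task_probs=[0.9, 0.1])
```

In a real setting the pseudo-labels would be produced once by the fine-tuned model over the unlabeled user data, then held fixed while the combined loss drives further pre-training.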




Pre-Training a Language Model Without Human Language

In this paper, we study how the intrinsic nature of pre-training data co...

Pseudo-OOD training for robust language models

While pre-trained large-scale deep models have garnered attention as an ...

Multi-armed bandits for online optimization of language model pre-training: the use case of dynamic masking

Transformer-based language models (TLMs) provide state-of-the-art perfor...

Using Pre-Training Can Improve Model Robustness and Uncertainty

Tuning a pre-trained network is commonly thought to improve data efficie...

Revisiting Self-Training for Few-Shot Learning of Language Model

As unlabeled data carry rich task-relevant information, they are proven ...