
To Pretrain or Not to Pretrain: Examining the Benefits of Pretraining on Resource Rich Tasks

by Sinong Wang, et al.

Pretraining NLP models with variants of Masked Language Model (MLM) objectives has recently led to significant improvements on many tasks. This paper examines the benefits of pretrained models as a function of the number of training samples used in the downstream task. On several text classification tasks, we show that as the number of training examples grows into the millions, the accuracy gap between finetuning a BERT-based model and training a vanilla LSTM from scratch narrows to within 1%. Our findings indicate that MLM-based pretraining might reach a diminishing return point as the supervised data size increases significantly.



