Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks

11/02/2018
by Jason Phang, et al.

Pretraining with language modeling and related unsupervised tasks has recently been shown to be a very effective enabling technology for the development of neural network models for language understanding tasks. In this work, we show that although language model-style pretraining is extremely effective at teaching models about language, it does not yield an ideal starting point for efficient transfer learning. By supplementing language model-style pretraining with further training on data-rich supervised tasks, we are able to achieve substantial additional performance improvements across the nine target tasks in the GLUE benchmark. We obtain an overall score of 76.9 on GLUE--a 2.3 point improvement over our baseline system adapted from Radford et al. (2018) and a 4.1 point improvement over Radford et al.'s reported score. We further use training data downsampling to show that the benefits of this supplementary training are even more pronounced in data-constrained regimes.
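
The recipe described in the abstract is a three-stage pipeline: language-model pretraining, supplementary training on a data-rich labeled intermediate task, and finally fine-tuning on the (often data-poor) target task. The sketch below illustrates only that order of operations, using the Hugging Face transformers and datasets libraries; the BERT-style encoder, MNLI as the intermediate task, RTE as the target task, and all hyperparameters are illustrative assumptions, not the paper's exact setup (the paper fine-tunes the Radford et al. (2018) Transformer LM).

```python
# Minimal sketch of the STILTs recipe: LM-pretrained encoder -> intermediate task -> target task.
# Model names, tasks, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

def finetune(model, tokenizer, task, text_keys, output_dir):
    """Fine-tune a sequence classifier on one GLUE task and return the trained model."""
    data = load_dataset("glue", task)

    def encode(batch):
        # Tokenize the sentence pair (or single sentence) for this task.
        return tokenizer(*(batch[k] for k in text_keys),
                         truncation=True, padding="max_length", max_length=128)

    data = data.map(encode, batched=True)
    args = TrainingArguments(output_dir=output_dir,
                             num_train_epochs=3,
                             per_device_train_batch_size=32,
                             learning_rate=2e-5)
    Trainer(model=model, args=args, train_dataset=data["train"]).train()
    return model

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Stage 1: start from a language-model-pretrained encoder.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)

# Stage 2 (the supplementary "STILT"): train on a data-rich labeled task, e.g. MNLI.
model = finetune(model, tokenizer, "mnli", ("premise", "hypothesis"), "out/mnli")
model.save_pretrained("out/mnli")

# Stage 3: warm-start from the intermediate-task weights and fine-tune on the
# target task, e.g. RTE; the classification head is re-initialized for the new label set.
model = AutoModelForSequenceClassification.from_pretrained(
    "out/mnli", num_labels=2, ignore_mismatched_sizes=True)
model = finetune(model, tokenizer, "rte", ("sentence1", "sentence2"), "out/rte")
```

Consistent with the abstract's downsampling result, the intermediate stage is expected to matter most when the target task has little training data, which is why the final stage in this sketch targets a small GLUE task.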


Related research

12/28/2018 - Looking for ELMo's friends: Sentence-Level Pretraining Beyond Language Modeling
Work on the problem of contextualized word representation -- the develop...

09/26/2018 - Language Modeling Teaches You More Syntax than Translation Does: Lessons Learned Through Auxiliary Task Analysis
Recent work using auxiliary prediction task classifiers to investigate t...

02/27/2023 - Linear pretraining in recurrent mixture density networks
We present a method for pretraining a recurrent mixture density network ...

12/09/2021 - MAGMA – Multimodal Augmentation of Generative Models through Adapter-based Finetuning
Large-scale pretraining is fast becoming the norm in Vision-Language (VL...

08/30/2023 - ToddlerBERTa: Exploiting BabyBERTa for Grammar Learning and Language Understanding
We present ToddlerBERTa, a BabyBERTa-like language model, exploring its ...

06/15/2020 - To Pretrain or Not to Pretrain: Examining the Benefits of Pretraining on Resource Rich Tasks
Pretraining NLP models with variants of Masked Language Model (MLM) obje...

04/24/2020 - Collecting Entailment Data for Pretraining: New Protocols and Negative Results
Textual entailment (or NLI) data has proven useful as pretraining data f...
