
Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks

11/02/2018
by Jason Phang, et al.
New York University

Pretraining with language modeling and related unsupervised tasks has recently been shown to be a very effective enabling technology for the development of neural network models for language understanding tasks. In this work, we show that although language model-style pretraining is extremely effective at teaching models about language, it does not yield an ideal starting point for efficient transfer learning. By supplementing language model-style pretraining with further training on data-rich supervised tasks, we are able to achieve substantial additional performance improvements across the nine target tasks in the GLUE benchmark. We obtain an overall score of 76.9 on GLUE--a 2.3 point improvement over our baseline system adapted from Radford et al. (2018) and a 4.1 point improvement over Radford et al.'s reported score. We further use training data downsampling to show that the benefits of this supplementary training are even more pronounced in data-constrained regimes.
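The recipe described above has three stages: language-model pretraining of a sentence encoder, supplementary training on a data-rich labeled intermediate task, and fine-tuning on each GLUE target task. Below is a minimal PyTorch sketch of that three-stage structure only; it is not the authors' code, and the encoder, data loaders, label counts, and hyperparameters are toy placeholders standing in for the Transformer LM of Radford et al. (2018) and for real tasks such as MNLI and RTE.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

HIDDEN = 64  # toy feature size; the paper uses a pretrained Transformer LM encoder


class SentenceClassifier(nn.Module):
    """A shared sentence encoder with a task-specific classification head."""

    def __init__(self, encoder: nn.Module, num_labels: int):
        super().__init__()
        self.encoder = encoder                  # assumed LM-pretrained
        self.head = nn.Linear(HIDDEN, num_labels)

    def forward(self, x):
        return self.head(self.encoder(x))


def finetune(model, loader, epochs=1, lr=1e-3):
    """Plain supervised fine-tuning loop, used at both the intermediate and target stages."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for inputs, labels in loader:
            opt.zero_grad()
            loss_fn(model(inputs), labels).backward()
            opt.step()


def toy_loader(num_labels, n=128):
    """Stand-in for a real task's DataLoader (random features instead of encoded sentences)."""
    x = torch.randn(n, HIDDEN)
    y = torch.randint(num_labels, (n,))
    return DataLoader(TensorDataset(x, y), batch_size=32)


# Stage 1: LM pretraining is assumed already done; a toy encoder stands in here.
encoder = nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.Tanh())

# Stage 2 (the supplementary training step): train on a data-rich intermediate
# labeled task, e.g. MNLI (3 labels), then keep the updated encoder weights.
intermediate = SentenceClassifier(encoder, num_labels=3)
finetune(intermediate, toy_loader(num_labels=3))

# Stage 3: fine-tune the same encoder on each GLUE target task, e.g. RTE (2 labels).
target = SentenceClassifier(intermediate.encoder, num_labels=2)
finetune(target, toy_loader(num_labels=2))
```

The key design point the sketch illustrates is that only the classification head is task-specific; the encoder carries over from pretraining through the intermediate task to the target task. The paper's data-constrained experiments downsample the target task's training set before Stage 3 while leaving Stages 1 and 2 unchanged.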

Related research

12/28/2018

Looking for ELMo's friends: Sentence-Level Pretraining Beyond Language Modeling

Work on the problem of contextualized word representation -- the develop...
09/26/2018

Language Modeling Teaches You More Syntax than Translation Does: Lessons Learned Through Auxiliary Task Analysis

Recent work using auxiliary prediction task classifiers to investigate t...
02/27/2023

Linear pretraining in recurrent mixture density networks

We present a method for pretraining a recurrent mixture density network ...
05/24/2019

Human vs. Muppet: A Conservative Estimate of Human Performance on the GLUE Benchmark

The GLUE benchmark (Wang et al., 2019b) is a suite of language understan...
06/15/2020

To Pretrain or Not to Pretrain: Examining the Benefits of Pretraining on Resource Rich Tasks

Pretraining NLP models with variants of Masked Language Model (MLM) obje...
05/24/2023

Emergent inabilities? Inverse scaling over the course of pretraining

Does inverse scaling only occur as a function of model parameter size, o...
04/24/2020

Collecting Entailment Data for Pretraining: New Protocols and Negative Results

Textual entailment (or NLI) data has proven useful as pretraining data f...