Muppet: Massive Multi-task Representations with Pre-Finetuning

01/26/2021
by Armen Aghajanyan, et al.

We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is a massively multi-task learning stage (around 50 datasets, over 4.8 million total labeled examples) designed to encourage learning of representations that generalize better to many different tasks. We show that pre-finetuning consistently improves performance for pretrained discriminators (e.g. RoBERTa) and generation models (e.g. BART) on a wide range of tasks (sentence prediction, commonsense reasoning, machine reading comprehension, etc.), while also significantly improving sample efficiency during fine-tuning. We also show that large-scale multi-tasking is crucial: pre-finetuning can hurt performance when only a few tasks are used, up to a critical point (usually above 15), after which performance improves linearly with the number of tasks.
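The abstract leaves the training mechanics implicit, so here is a minimal sketch of one way a pre-finetuning step over heterogeneous multi-task batches could be implemented in PyTorch. Everything below is an illustrative assumption rather than the paper's exact recipe: the shared-encoder-plus-per-task-heads structure, the toy encoder and datasets, and the scaling of each task's loss by 1/log(number of classes) to keep losses comparable across label spaces.

```python
# Minimal sketch of a pre-finetuning step, not the authors' released code.
# Hypothetical pieces: the MultiTaskModel structure, the toy encoder and
# datasets, and the 1/log(num_classes) loss scaling used to keep losses
# comparable across tasks with different label-space sizes.
import math
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """Shared encoder with one small classification head per task."""
    def __init__(self, encoder: nn.Module, hidden_size: int, task_num_classes: dict):
        super().__init__()
        self.encoder = encoder
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden_size, n) for task, n in task_num_classes.items()}
        )

    def forward(self, task: str, inputs: torch.Tensor) -> torch.Tensor:
        features = self.encoder(inputs)    # [batch, hidden_size]
        return self.heads[task](features)  # [batch, num_classes]

def prefinetune_step(model, optimizer, batches, task_num_classes):
    """One optimizer step over a heterogeneous batch: per-task sub-batches
    whose scaled losses are summed before a single backward pass."""
    optimizer.zero_grad()
    total_loss = 0.0
    for task, (inputs, labels) in batches.items():
        logits = model(task, inputs)
        loss = nn.functional.cross_entropy(logits, labels)
        # Assumed scaling: divide by log(num_classes) so tasks with large
        # label spaces do not dominate the summed loss.
        total_loss = total_loss + loss / math.log(task_num_classes[task])
    total_loss.backward()
    optimizer.step()
    return float(total_loss)

if __name__ == "__main__":
    # Toy stand-ins for a pretrained encoder and the ~50 real datasets.
    task_num_classes = {"nli": 3, "sentiment": 2, "topic": 10}
    encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
    model = MultiTaskModel(encoder, hidden_size=32, task_num_classes=task_num_classes)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    batches = {
        task: (torch.randn(8, 16), torch.randint(0, n, (8,)))
        for task, n in task_num_classes.items()
    }
    print(prefinetune_step(model, optimizer, batches, task_num_classes))
```

In practice the encoder would be a pretrained model such as RoBERTa or BART and the task mix would span the roughly 50 datasets mentioned above; the sketch only shows the mechanics of sharing one encoder across many supervised tasks.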

Related research

02/21/2023
Device Tuning for Multi-Task Large Model
Unsupervised pre-training approaches have achieved great success in many...

12/24/2019
Large Scale Learning of General Visual Representations for Transfer
Transfer of pre-trained representations improves sample efficiency and s...

01/29/2023
Unifying Molecular and Textual Representations via Multi-task Language Modelling
The recent advances in neural language models have also been successfull...

05/05/2020
Multi-task pre-training of deep neural networks for digital pathology
In this work, we investigate multi-task learning as a way of pre-trainin...

07/20/2020
A Comprehensive Evaluation of Multi-task Learning and Multi-task Pre-training on EHR Time-series Data
Multi-task learning (MTL) is a machine learning technique aiming to impr...

02/24/2021
Generalized and Transferable Patient Language Representation for Phenotyping with Limited Data
The paradigm of representation learning through transfer learning has th...
