Muppet: Massive Multi-task Representations with Pre-Finetuning

01/26/2021
by Armen Aghajanyan, et al.

We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples), and is designed to encourage learning of representations that generalize better to many different tasks. We show that pre-finetuning consistently improves performance for pretrained discriminators (e.g. RoBERTa) and generation models (e.g. BART) on a wide range of tasks (sentence prediction, commonsense reasoning, MRC, etc.), while also significantly improving sample efficiency during fine-tuning. We also show that large-scale multi-tasking is crucial; pre-finetuning can hurt performance when few tasks are used, up to a critical point (usually above 15) after which performance improves linearly in the number of tasks.
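The abstract describes the approach only at a high level. Below is a minimal sketch of the pre-finetuning idea, assuming a single shared encoder with one lightweight classification head per task, heterogeneous training steps that mix tasks, and a per-task loss normalization by the log of the label count. The toy encoder, the task mix, and names such as SharedEncoder, MultiTaskModel, and task_num_labels are illustrative assumptions, not the authors' implementation (which pre-finetunes RoBERTa/BART on roughly 50 real datasets).

```python
# Sketch of massively multi-task pre-finetuning: one shared encoder,
# one small head per task, losses from several tasks combined into a
# single optimizer step. All sizes and data here are toy placeholders.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Stand-in for a pretrained encoder such as RoBERTa (toy sizes)."""
    def __init__(self, vocab_size=1000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))
        return h[:, 0]  # first-token representation as the sequence vector

class MultiTaskModel(nn.Module):
    """Shared encoder plus one linear classification head per task."""
    def __init__(self, encoder, task_num_labels, dim=128):
        super().__init__()
        self.encoder = encoder
        self.heads = nn.ModuleDict(
            {name: nn.Linear(dim, n) for name, n in task_num_labels.items()}
        )

    def forward(self, task, token_ids):
        return self.heads[task](self.encoder(token_ids))

# Hypothetical task mix standing in for the ~50 pre-finetuning datasets.
task_num_labels = {"nli": 3, "sentiment": 2, "commonsense_qa": 5}
model = MultiTaskModel(SharedEncoder(), task_num_labels)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss_fn = nn.CrossEntropyLoss()

for step in range(3):
    opt.zero_grad()
    total_loss = 0.0
    for task, n_labels in task_num_labels.items():
        tokens = torch.randint(0, 1000, (8, 16))   # fake token batch
        labels = torch.randint(0, n_labels, (8,))  # fake labels
        logits = model(task, tokens)
        # Normalize each task's loss by log(num_labels) so tasks with
        # many classes do not dominate the update (an assumption here,
        # loosely following the paper's loss-scaling idea).
        scale = torch.log(torch.tensor(float(n_labels)))
        total_loss = total_loss + loss_fn(logits, labels) / scale
    total_loss.backward()
    opt.step()
```

The key design choice this sketch tries to capture is that every task shares the same encoder parameters while only the heads are task-specific, so gradients from many diverse datasets shape one general-purpose representation before any single-task fine-tuning.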
