When to Use Multi-Task Learning vs Intermediate Fine-Tuning for Pre-Trained Encoder Transfer Learning

05/17/2022
by Orion Weller, et al.

Transfer learning (TL) in natural language processing (NLP) has seen a surge of interest in recent years, as pre-trained models have shown an impressive ability to transfer to novel tasks. Three main strategies have emerged for making use of multiple supervised datasets during fine-tuning: training on an intermediate task before training on the target task (STILTs), using multi-task learning (MTL) to train jointly on a supplementary task and the target task (pairwise MTL), or simply using MTL to train jointly on all available datasets (MTL-ALL). In this work, we compare all three TL methods in a comprehensive analysis on the GLUE dataset suite. We find that there is a simple heuristic for when to use one of these techniques over the other: pairwise MTL is better than STILTs when the target task has fewer instances than the supporting task, and vice versa. We show that this holds true in more than 92% of cases on the GLUE dataset and validate this hypothesis with experiments varying dataset size. The simplicity and effectiveness of this heuristic are surprising and warrant additional exploration by the TL community. Furthermore, we find that MTL-ALL is worse than the pairwise methods in almost every case. We hope this study will aid others as they choose between TL methods for NLP tasks.
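The decision rule itself is easy to state in code. Below is a minimal illustrative sketch (not from the paper's released code); the function name and the example GLUE training-set sizes for RTE and MNLI are ours, included only to show how the heuristic would be applied.

```python
def choose_transfer_strategy(target_train_size: int, supporting_train_size: int) -> str:
    """Pick a transfer-learning strategy using the dataset-size heuristic.

    Per the abstract: pairwise MTL tends to win when the target task has fewer
    training instances than the supporting task; otherwise STILTs (fine-tune on
    the supporting task first, then on the target task) tends to win.
    """
    if target_train_size < supporting_train_size:
        return "pairwise-MTL"  # train jointly on the supporting and target tasks
    return "STILTs"            # intermediate fine-tuning, then target fine-tuning


# Example with approximate GLUE training-set sizes:
# RTE (~2.5k examples) as the target task, MNLI (~393k) as the supporting task.
print(choose_transfer_strategy(2_490, 392_702))   # -> "pairwise-MTL"
# MNLI as the target task, RTE as the supporting task.
print(choose_transfer_strategy(392_702, 2_490))   # -> "STILTs"
```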
