What to Pre-Train on? Efficient Intermediate Task Selection

04/16/2021
by Clifton Poth, et al.

Intermediate task fine-tuning has been shown to culminate in large transfer gains across many NLP tasks. With an abundance of candidate datasets as well as pre-trained language models, it has become infeasible to run the cross-product of all combinations to find the best transfer setting. In this work we first establish that similar sequential fine-tuning gains can be achieved in adapter settings, and subsequently consolidate previously proposed methods that efficiently identify beneficial tasks for intermediate transfer learning. We experiment with a diverse set of 42 intermediate and 11 target English classification, multiple choice, question answering, and sequence tagging tasks. Our results show that efficient embedding-based methods that rely solely on the respective datasets outperform computationally expensive few-shot fine-tuning approaches. Our best methods achieve an average Regret@3 of less than 1%, demonstrating that we are able to efficiently identify the best datasets for intermediate training.
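To make the two central ideas of the abstract concrete, the sketch below ranks candidate intermediate tasks by cosine similarity between dataset embeddings and then scores the ranking with a Regret@k measure. This is a minimal illustration, not the paper's implementation: the mean-pooled embeddings, the task names, the toy transfer gains, and the exact regret formula (relative gap, in percent, between the best achievable gain and the best gain among the top-k selected tasks) are all assumptions made for the example; in practice the embeddings would come from a pre-trained language or sentence encoder and the gains from actual intermediate-then-target fine-tuning runs.

```python
import numpy as np


def mean_pooled_embedding(example_embeddings: np.ndarray) -> np.ndarray:
    """Represent a dataset by the mean of its per-example embeddings."""
    return example_embeddings.mean(axis=0)


def rank_intermediate_tasks(target_emb: np.ndarray, intermediate_embs: dict) -> list:
    """Rank candidate intermediate tasks by cosine similarity to the target dataset."""
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    sims = {name: cosine(target_emb, emb) for name, emb in intermediate_embs.items()}
    return sorted(sims, key=sims.get, reverse=True)


def regret_at_k(ranking: list, transfer_gains: dict, k: int = 3) -> float:
    """Relative gap (in %) between the best achievable transfer gain and the
    best gain among the top-k ranked tasks (one common Regret@k definition)."""
    best = max(transfer_gains.values())
    best_in_top_k = max(transfer_gains[t] for t in ranking[:k])
    return 100.0 * (best - best_in_top_k) / best


# Toy data: embeddings and transfer gains are illustrative placeholders only.
rng = np.random.default_rng(0)
intermediate_embs = {
    f"intermediate_{i}": mean_pooled_embedding(rng.normal(size=(100, 8)))
    for i in range(5)
}
target_emb = mean_pooled_embedding(rng.normal(size=(100, 8)))
transfer_gains = {f"intermediate_{i}": g for i, g in enumerate([1.2, 0.4, 2.1, 0.9, 1.7])}

ranking = rank_intermediate_tasks(target_emb, intermediate_embs)
print("ranking:", ranking)
print(f"Regret@3: {regret_at_k(ranking, transfer_gains, k=3):.2f}%")
```

A Regret@3 of 0% here would mean the task with the highest true transfer gain appears among the top three ranked candidates; the paper's reported average of under 1% indicates its selection methods come very close to that ideal without running every intermediate-target combination.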

Related research

12/30/2021
Does QA-based intermediate training help fine-tuning language models for text classification?
Fine-tuning pre-trained language models for downstream tasks has become ...

05/17/2022
When to Use Multi-Task Learning vs Intermediate Fine-Tuning for Pre-Trained Encoder Transfer Learning
Transfer learning (TL) in natural language processing (NLP) has seen a s...

05/01/2020
Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding: When and Why Does It Work?
While pretrained models such as BERT have shown large gains across natur...

10/21/2022
Efficiently Tuned Parameters are Task Embeddings
Intermediate-task transfer can benefit a wide range of NLP tasks with pr...

02/11/2023
Divergence-Based Domain Transferability for Zero-Shot Classification
Transferring learned patterns from pretrained neural language models has...

10/23/2022
Model ensemble instead of prompt fusion: a sample-specific knowledge transfer method for few-shot prompt tuning
Prompt tuning approaches, which learn task-specific soft prompts for a d...

12/31/2020
FiD-Ex: Improving Sequence-to-Sequence Models for Extractive Rationale Generation
Natural language (NL) explanations of model predictions are gaining popu...
