Rethinking Why Intermediate-Task Fine-Tuning Works

08/26/2021
by Ting-Yun Chang, et al.

Supplementary Training on Intermediate Labeled-data Tasks (STILTs) is a widely applied technique that first fine-tunes a pretrained language model on an intermediate task before fine-tuning it on the target task of interest. While STILTs can further improve the performance of pretrained language models, it is still unclear why and when it works. Previous research has shown that intermediate tasks involving complex inference, such as commonsense reasoning, work especially well for RoBERTa. In this paper, we discover that the improvement from an intermediate task can be orthogonal to whether it contains reasoning or other complex skills: a simple real-fake discrimination task synthesized by GPT2 can benefit diverse target tasks. We conduct extensive experiments to study the impact of different factors on STILTs. These findings suggest rethinking the role of intermediate fine-tuning in the STILTs pipeline.
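The abstract describes STILTs operationally: take a pretrained model, fine-tune it on an intermediate labeled task, then fine-tune the resulting weights on the target task. Below is a minimal sketch of that two-stage pipeline, assuming the Hugging Face Transformers Trainer API; the dataset arguments, checkpoint paths, and hyperparameters are illustrative placeholders, not the paper's exact setup.

```python
# Two-stage STILTs pipeline sketch (assumptions: Hugging Face
# Transformers, tokenized classification datasets supplied by caller).
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

def fine_tune(model, train_dataset, output_dir):
    """Run one fine-tuning stage and save the result."""
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=3)
    Trainer(model=model, args=args, train_dataset=train_dataset).train()
    model.save_pretrained(output_dir)
    return model

def stilts(intermediate_ds, target_ds, n_intermediate_labels, n_target_labels):
    # Stage 1: supplementary training on the intermediate task.
    model = AutoModelForSequenceClassification.from_pretrained(
        "roberta-base", num_labels=n_intermediate_labels
    )
    fine_tune(model, intermediate_ds, "ckpt/intermediate")

    # Stage 2: reload the intermediate-task checkpoint, discard its
    # classifier head, and fine-tune the encoder on the target task.
    model = AutoModelForSequenceClassification.from_pretrained(
        "ckpt/intermediate",
        num_labels=n_target_labels,
        ignore_mismatched_sizes=True,  # fresh head for the target task
    )
    return fine_tune(model, target_ds, "ckpt/target")
```

The paper's central observation involves a real-fake discrimination task synthesized by GPT2. The abstract does not spell out the construction, but one plausible reading is that real corpus sentences serve as positives and GPT2 completions of their prefixes serve as negatives. The sketch below follows that reading; the prefix length and sampling settings are assumptions rather than the paper's recipe.

```python
# Hypothetical construction of a real-fake discrimination dataset
# with GPT-2; the paper's exact recipe may differ.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def make_real_fake_examples(real_sentences, prefix_tokens=5):
    examples = []
    for sentence in real_sentences:
        ids = tokenizer.encode(sentence, return_tensors="pt")
        if ids.shape[1] <= prefix_tokens:
            continue  # skip sentences shorter than the prefix
        # Condition GPT-2 on a short prefix of the real sentence and
        # sample a synthetic completion of comparable length.
        fake_ids = model.generate(
            ids[:, :prefix_tokens],
            max_length=ids.shape[1],
            do_sample=True,
            top_k=50,
            pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
        )
        fake = tokenizer.decode(fake_ids[0], skip_special_tokens=True)
        examples.append((sentence, 1))  # real sentence
        examples.append((fake, 0))      # machine-generated sentence
    return examples
```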


Related research

09/14/2023
PerPLM: Personalized Fine-tuning of Pretrained Language Models via Writer-specific Intermediate Learning and Prompts
The meanings of words and phrases depend not only on where they are used...

05/01/2020
Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding: When and Why Does It Work?
While pretrained models such as BERT have shown large gains across natur...

10/12/2022
Can Pretrained Language Models (Yet) Reason Deductively?
Acquiring factual knowledge with Pretrained Language Models (PLMs) has a...

01/17/2023
Tracing and Manipulating Intermediate Values in Neural Math Problem Solvers
How language models process complex input that requires multiple steps o...

02/11/2023
Divergence-Based Domain Transferability for Zero-Shot Classification
Transferring learned patterns from pretrained neural language models has...

08/29/2023
TaskLAMA: Probing the Complex Task Understanding of Language Models
Structured Complex Task Decomposition (SCTD) is the problem of breaking ...

05/25/2022
Teaching Broad Reasoning Skills via Decomposition-Guided Contexts
Question-answering datasets require a broad set of reasoning skills. We ...
