Should We Be Pre-training? An Argument for End-task Aware Training as an Alternative

09/15/2021
by Lucio M. Dery, et al.

Pre-training, where models are trained on an auxiliary objective with abundant data before being fine-tuned on data from the downstream task, is now the dominant paradigm in NLP. In general, the pre-training step relies on little to no direct knowledge of the task on which the model will be fine-tuned, even when the end-task is known in advance. Our work challenges this status quo of end-task-agnostic pre-training. First, on three different low-resource NLP tasks from two domains, we demonstrate that multi-tasking the end-task and auxiliary objectives results in significantly better downstream task performance than the widely used task-agnostic continued pre-training paradigm of Gururangan et al. (2020). We next introduce an online meta-learning algorithm that learns a set of multi-task weights to better balance among our multiple auxiliary objectives, achieving further improvements on end-task performance and data efficiency.
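The abstract describes the recipe only at a high level, so below is a minimal, illustrative PyTorch sketch of the general idea: train the end task jointly with softmax-weighted auxiliary objectives, and adapt the auxiliary weights online so they favor objectives that help the end task. This is not the authors' exact meta-learning algorithm; the toy model, the reconstruction-style auxiliary loss, and the gradient-alignment proxy used as the weight-update signal are all assumptions made for illustration.

```python
# Illustrative sketch only: end-task aware multi-task training with
# online-learned auxiliary-task weights. Not the paper's exact algorithm.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy shared encoder with an end-task head and two auxiliary heads (placeholders).
encoder = nn.Linear(32, 32)
end_head = nn.Linear(32, 2)
aux_heads = nn.ModuleList([nn.Linear(32, 32) for _ in range(2)])
params = (list(encoder.parameters()) + list(end_head.parameters())
          + list(aux_heads.parameters()))

log_w = torch.zeros(2, requires_grad=True)      # learnable auxiliary-task weights
model_opt = torch.optim.Adam(params, lr=1e-3)
weight_opt = torch.optim.Adam([log_w], lr=1e-2)

def end_loss(x, y):
    # End-task objective (toy classification loss).
    return F.cross_entropy(end_head(encoder(x)), y)

def aux_loss(i, x):
    # Stand-in auxiliary objective (e.g., a reconstruction-style loss).
    return F.mse_loss(aux_heads[i](encoder(x)), x)

shared = list(encoder.parameters())             # parameters shared across tasks

for step in range(200):
    x, y = torch.randn(16, 32), torch.randint(0, 2, (16,))

    # (1) Multi-task step: end-task loss plus weighted auxiliary losses.
    w = torch.softmax(log_w, dim=0).detach()
    loss = end_loss(x, y) + sum(w[i] * aux_loss(i, x) for i in range(2))
    model_opt.zero_grad()
    loss.backward()
    model_opt.step()

    # (2) Online weight step: increase the weight of auxiliary tasks whose
    #     gradient on the shared encoder aligns with the end-task gradient
    #     (a simple proxy for "this objective helps the end task").
    g_end = torch.cat([g.flatten() for g in
                       torch.autograd.grad(end_loss(x, y), shared)])
    sims = []
    for i in range(2):
        g_aux = torch.cat([g.flatten() for g in
                           torch.autograd.grad(aux_loss(i, x), shared)])
        sims.append(F.cosine_similarity(g_end, g_aux, dim=0))
    w = torch.softmax(log_w, dim=0)
    weight_loss = -(w * torch.stack(sims).detach()).sum()
    weight_opt.zero_grad()
    weight_loss.backward()
    weight_opt.step()
```

The design choice to update the weights from a signal tied to the end task, rather than fixing them by hand, is what makes the auxiliary training "end-task aware" in spirit; the paper's actual method optimizes the weights with a meta-learning objective rather than the gradient-similarity heuristic used here.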

Related research

09/10/2020 · Task-specific Objectives of Pre-trained Language Models for Dialogue Adaptation
Pre-trained Language Models (PrLMs) have been widely used as backbones i...

05/27/2022 · AANG: Automating Auxiliary Learning
When faced with data-starved or highly complex end-tasks, it is commonpl...

08/25/2021 · Auxiliary Task Update Decomposition: The Good, The Bad and The Neutral
While deep learning has been very beneficial in data-rich settings, task...

04/18/2021 · On the Influence of Masking Policies in Intermediate Pre-training
Current NLP models are predominantly trained through a pretrain-then-fin...

06/22/2022 · reStructured Pre-training
In this work, we try to decipher the internal connection of NLP technolo...

11/13/2022 · Build generally reusable agent-environment interaction models
This paper tackles the problem of how to pre-train a model and make it g...

07/24/2023 · On the Connection between Pre-training Data Diversity and Fine-tuning Robustness
Pre-training has been widely adopted in deep learning to improve model p...
