What's in your Head? Emergent Behaviour in Multi-Task Transformer Models

04/13/2021
by Mor Geva, et al.

The primary paradigm for multi-task training in natural language processing is to represent the input with a shared pre-trained language model and to add a small, thin network (head) per task. Given an input, the target head is the head selected to output the final prediction. In this work, we examine the behaviour of non-target heads, that is, the output of heads when given input that belongs to a different task than the one they were trained for. We find that non-target heads exhibit emergent behaviour, which may either explain the target task or generalize beyond their original task. For example, in a numerical reasoning task, a span extraction head extracts from the input the arguments to a computation whose result is the number generated by a target generative head. In addition, a summarization head that is trained with a target question answering head outputs query-based summaries when given a question and a context from which the answer is to be extracted. This emergent behaviour suggests that multi-task training leads to non-trivial extrapolation of skills, which can be harnessed for interpretability and generalization.
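The setup described above can be illustrated with a minimal sketch: one shared encoder and several thin task heads, where routing one task's input through another task's head is the "non-target head" probe the abstract describes. The classes and the toy encoding below are hypothetical stand-ins, not the authors' models.

```python
class SharedEncoder:
    """Stand-in for a shared pre-trained language model."""
    def encode(self, text):
        # Toy "representation": per-token lengths, shared by all heads.
        return [len(tok) for tok in text.split()]

class SpanExtractionHead:
    """Thin task head: returns (start, end) of the highest-scoring token."""
    def __call__(self, reps):
        i = max(range(len(reps)), key=lambda k: reps[k])
        return (i, i)

class ClassificationHead:
    """Thin task head: a single binary decision over the representation."""
    def __call__(self, reps):
        return int(sum(reps) > 20)

encoder = SharedEncoder()
heads = {"span": SpanExtractionHead(), "cls": ClassificationHead()}

reps = encoder.encode("What number results from 24 minus 7 ?")
target = heads["cls"](reps)       # the head this input was routed to
non_target = heads["span"](reps)  # probing the other head's output
```

Because both heads read the same shared representation, the non-target head still produces a well-formed output for inputs it was never trained on; studying those outputs is the core experiment of the paper.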

Related research

- Structural analysis of an all-purpose question answering model (04/13/2021): Attention is a key component of the now ubiquitous pre-trained language ...
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints (05/22/2023): Multi-query attention (MQA), which only uses a single key-value head, dr...
- DyREx: Dynamic Query Representation for Extractive Question Answering (10/26/2022): Extractive question answering (ExQA) is an essential task for Natural La...
- A Novel DeBERTa-based Model for Financial Question Answering Task (07/12/2022): As a rising star in the field of natural language processing, question a...
- Are Prompt-based Models Clueless? (05/19/2022): Finetuning large pre-trained language models with a task-specific head h...
- On Measuring Social Biases in Prompt-Based Multi-Task Learning (05/23/2022): Large language models trained on a mixture of NLP tasks that are convert...
