KNIFE: Knowledge Distillation with Free-Text Rationales

12/19/2022
by Aaron Chan, et al.

Free-text rationales (FTRs) follow how humans communicate by explaining reasoning processes via natural language. A number of recent works have studied how to improve language model (LM) generalization by using FTRs to teach LMs the correct reasoning processes behind correct task outputs. These prior works aim to learn from FTRs by appending them to the LM input or target output, but this may introduce an input distribution shift or conflict with the task objective, respectively. We propose KNIFE, which distills FTR knowledge from an FTR-augmented teacher LM (which takes both the task input and the FTR) to a student LM (which takes only the task input) that is used for inference. Crucially, the teacher LM's forward computation has a bottleneck stage in which all of its FTR states are masked out, which pushes knowledge from the FTR states into the task input/output states. FTR knowledge is then distilled to the student LM by training its task input/output states to align with the teacher LM's. On two question answering datasets, we show that KNIFE significantly outperforms existing FTR learning methods in both fully supervised and low-resource settings.

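The alignment described above can be made concrete with a short sketch. The following is a minimal, hypothetical PyTorch illustration of such a state-alignment distillation loss, not the paper's implementation: the tensor shapes, the assumption that the teacher's states have already been gathered to the student's task-token positions, and the choice of mean-squared error as the alignment loss are all simplifications for illustration.

```python
# Hypothetical sketch of a KNIFE-style distillation objective (not the official code).
# Assumes the teacher's states are already restricted to the same task input/output
# positions the student sees; FTR positions are excluded entirely.
import torch
import torch.nn.functional as F

def state_alignment_distillation_loss(teacher_states, student_states,
                                      task_token_mask, student_logits, labels,
                                      alpha=1.0):
    """Train the student's task input/output states to match the teacher's,
    while still optimizing the usual task objective."""
    mask = task_token_mask.unsqueeze(-1).float()
    # MSE between student states and (frozen) teacher states at task-token positions.
    align_loss = ((student_states - teacher_states.detach()) ** 2 * mask).sum() / mask.sum()
    # Standard cross-entropy on the task output tokens (-100 marks ignored positions).
    task_loss = F.cross_entropy(student_logits.flatten(0, 1), labels.flatten(),
                                ignore_index=-100)
    return task_loss + alpha * align_loss

# Toy usage with random tensors, just to show the assumed shapes.
B, T, H, V = 2, 8, 16, 50                                  # batch, seq len, hidden size, vocab size
teacher_states = torch.randn(B, T, H)                      # FTR-augmented teacher, task positions only
student_states = torch.randn(B, T, H, requires_grad=True)  # task-input-only student, same positions
task_mask = torch.ones(B, T, dtype=torch.bool)
logits = torch.randn(B, T, V, requires_grad=True)
labels = torch.randint(0, V, (B, T))

loss = state_alignment_distillation_loss(teacher_states, student_states,
                                         task_mask, logits, labels)
loss.backward()
```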

research · 01/09/2023
ERNIE 3.0 Tiny: Frustratingly Simple Method to Improve Task-Agnostic Distillation Generalization
Task-agnostic knowledge distillation attempts to address the problem of ...

research · 08/02/2020
Differentiable Feature Aggregation Search for Knowledge Distillation
Knowledge distillation has become increasingly important in model compre...

research · 10/22/2022
Hard Gate Knowledge Distillation – Leverage Calibration for Robust and Reliable Language Model
In knowledge distillation, a student model is trained with supervisions ...

research · 10/05/2022
Honest Students from Untrusted Teachers: Learning an Interpretable Question-Answering Pipeline from a Pretrained Language Model
Explainable question answering systems should produce not only accurate ...

research · 05/16/2021
Undistillable: Making A Nasty Teacher That CANNOT teach students
Knowledge Distillation (KD) is a widely used technique to transfer knowl...

research · 11/03/2022
PINTO: Faithful Language Reasoning Using Prompt-Generated Rationales
Neural language models (LMs) have achieved impressive results on various...

research · 06/06/2023
An Approach to Solving the Abstraction and Reasoning Corpus (ARC) Challenge
We utilise the power of Large Language Models (LLMs), in particular GPT4...
