Hard Gate Knowledge Distillation – Leverage Calibration for Robust and Reliable Language Model

10/22/2022
by Dongkyu Lee, et al.

In knowledge distillation, a student model is trained with supervision from both a teacher's knowledge and observations drawn from the training data distribution. The teacher's knowledge is valued for the inter-class relations it encodes, which provide meaningful supervision to the student; hence, much effort has been put into finding what knowledge to distill. In this paper, we explore a question that has received little attention: "when to distill such knowledge." We answer this question with the concept of model calibration: we view a teacher model not only as a source of knowledge but also as a gauge for detecting miscalibration in the student. This simple yet novel view leads to a hard-gate knowledge distillation scheme that switches between learning from the teacher model and learning from the training data. We verify the gating mechanism in the context of natural language generation at both the token level and the sentence level. Empirical comparisons with strong baselines show that hard-gate knowledge distillation not only improves model generalization but also significantly lowers model calibration error.
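To make the switching idea concrete, below is a minimal token-level sketch in PyTorch. It assumes the gate fires when the student assigns a higher probability to the gold token than the teacher does, which is treated here as a sign of student miscalibration; this specific gating criterion, the temperature handling, and the name hard_gate_kd_loss are illustrative assumptions, not details confirmed by the paper.

# Minimal sketch of a token-level hard-gate distillation loss.
# Assumption: the gate compares student vs. teacher probability on the gold token;
# the paper's exact criterion may differ.
import torch
import torch.nn.functional as F


def hard_gate_kd_loss(student_logits, teacher_logits, targets, temperature=1.0):
    """student_logits, teacher_logits: (batch, seq_len, vocab); targets: (batch, seq_len) long."""
    log_p_student = F.log_softmax(student_logits, dim=-1)
    with torch.no_grad():
        p_teacher = F.softmax(teacher_logits / temperature, dim=-1)

    # Probability each model assigns to the ground-truth token.
    p_s_gold = log_p_student.exp().gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    p_t_gold = p_teacher.gather(-1, targets.unsqueeze(-1)).squeeze(-1)

    # Hard gate: 1 where the student looks overconfident relative to the teacher
    # (a proxy for miscalibration), 0 otherwise.
    gate = (p_s_gold > p_t_gold).float()

    # Supervision from data: per-token cross-entropy on the gold labels.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        targets.view(-1),
        reduction="none",
    ).view_as(p_s_gold)

    # Supervision from the teacher: per-token KL(teacher || student).
    kl = F.kl_div(log_p_student, p_teacher, reduction="none").sum(-1)

    # Switch (rather than interpolate) between the two signals per token.
    loss = gate * kl + (1.0 - gate) * ce
    return loss.mean()

Unlike the usual soft interpolation between the distillation and data losses, the gate here is a binary per-token switch, which is the defining property of the hard-gate scheme described in the abstract.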


