Self-Knowledge Distillation in Natural Language Processing

08/02/2019
by Sangchul Hahn, et al.

Since deep learning became a key player in natural language processing (NLP), many deep learning models have shown remarkable performance on a variety of NLP tasks, and in some cases they even outperform humans. Such high performance can be explained by the efficient knowledge representation of deep learning models. While many methods have been proposed to learn more efficient representations, knowledge distillation from pretrained deep networks suggests that we can use more information from the soft target probabilities to train other neural networks. In this paper, we propose a new knowledge distillation method, self-knowledge distillation, based on the soft target probabilities of the training model itself, where multimode information is distilled from the word embedding space right below the softmax layer. Because computing the exact soft targets is too expensive, our method approximates the soft target probabilities. In experiments, we apply the proposed method to two fundamental NLP tasks: language modeling and neural machine translation. The results show that the proposed method improves performance on both tasks.
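To make the idea concrete, the sketch below shows one way a self-distillation loss of this kind could look for a language model in PyTorch. It is a minimal illustration under stated assumptions, not the authors' exact formulation: the approximation of the soft targets via embedding-space similarity to the gold word, the temperature tau, and the mixing weight alpha are all assumptions made for the example.

    # Minimal sketch of a self-knowledge-distillation loss for a language model.
    # Hypothetical formulation: soft targets are approximated from similarity in
    # the word-embedding space below the softmax, standing in for the paper's
    # approximation; alpha and tau are illustrative hyperparameters.
    import torch
    import torch.nn.functional as F

    def self_distillation_loss(logits, targets, embedding, alpha=0.5, tau=2.0):
        # logits:    (batch, vocab) pre-softmax outputs of the training model
        # targets:   (batch,) gold word indices
        # embedding: (vocab, dim) word-embedding matrix below the softmax layer
        # Standard hard-target cross-entropy.
        hard_loss = F.cross_entropy(logits, targets)

        # Approximate soft targets: similarity of every word embedding to the
        # embedding of the gold word, turned into a distribution by a
        # temperature-scaled softmax.
        with torch.no_grad():
            gold_emb = embedding[targets]              # (batch, dim)
            sims = gold_emb @ embedding.t()            # (batch, vocab)
            soft_targets = F.softmax(sims / tau, dim=-1)

        # KL divergence between the model's tempered output distribution and
        # the embedding-derived soft targets (the "self" teacher signal).
        log_probs = F.log_softmax(logits / tau, dim=-1)
        soft_loss = F.kl_div(log_probs, soft_targets, reduction="batchmean") * tau * tau

        return (1.0 - alpha) * hard_loss + alpha * soft_loss

    # Toy usage with random tensors.
    if __name__ == "__main__":
        vocab, dim, batch = 100, 16, 8
        embedding = torch.randn(vocab, dim)
        logits = torch.randn(batch, vocab)
        targets = torch.randint(0, vocab, (batch,))
        print(self_distillation_loss(logits, targets, embedding).item())

In this sketch, alpha balances the usual hard-target loss against the distillation term, and tau controls how peaked the embedding-derived soft targets are; both would need tuning in practice.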


Related research

12/31/2020  Towards Zero-Shot Knowledge Distillation for Natural Language Processing
02/28/2020  TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing
10/20/2021  Knowledge distillation from language model to acoustic model: a hierarchical multi-task learning approach
12/19/2022  Multi-View Knowledge Distillation from Crowd Annotations for Out-of-Domain Generalization
07/20/2021  Learning ULMFiT and Self-Distillation with Calibration for Medical Dialogue System
09/06/2023  A deep Natural Language Inference predictor without language-specific training data
01/12/2023  A Cohesive Distillation Architecture for Neural Language Models
