Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding

04/20/2019, by Xiaodong Liu, et al.

This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural language understanding tasks. Although ensemble learning can improve model performance, serving an ensemble of large DNNs such as MT-DNN can be prohibitively expensive. Here we apply the knowledge distillation method (Hinton et al., 2015) in the multi-task learning setting. For each task, we train an ensemble of different MT-DNNs (teacher) that outperforms any single model, and then train a single MT-DNN (student) via multi-task learning to distill knowledge from these ensemble teachers. We show that the distilled MT-DNN significantly outperforms the original MT-DNN on 7 out of 9 GLUE tasks, pushing the GLUE benchmark (single model) to 83.7% (a 1.5% absolute improvement, based on the GLUE leaderboard at https://gluebenchmark.com/leaderboard as of April 1, 2019). The code and pre-trained models will be made publicly available at https://github.com/namisan/mt-dnn.
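To make the distillation objective concrete, below is a minimal PyTorch sketch of the recipe the abstract describes: average the class probabilities of an ensemble of teachers into soft targets, then train the student against a mix of those soft targets and the gold labels. The function names, the temperature, and the mixing weight alpha are illustrative assumptions in the spirit of Hinton et al. (2015), not code from the MT-DNN repository.

import torch
import torch.nn.functional as F

def ensemble_soft_targets(teacher_logits, temperature=1.0):
    """Average the class probabilities of an ensemble of teachers.

    teacher_logits: list of tensors, each of shape (batch, num_classes).
    Returns a (batch, num_classes) tensor of averaged probabilities.
    """
    probs = [F.softmax(logits / temperature, dim=-1) for logits in teacher_logits]
    return torch.stack(probs, dim=0).mean(dim=0)

def distillation_loss(student_logits, soft_targets, hard_labels,
                      temperature=1.0, alpha=0.5):
    """Weighted sum of a soft cross-entropy against the ensemble's averaged
    probabilities and the usual cross-entropy on the gold labels.
    alpha is an assumed hyperparameter balancing the two terms.
    """
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    soft_loss = -(soft_targets * log_probs).sum(dim=-1).mean()
    hard_loss = F.cross_entropy(student_logits, hard_labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

In the multi-task setting described above, the student would iterate over mini-batches drawn from all GLUE tasks, applying this distillation loss on tasks that have ensemble teachers and the ordinary cross-entropy on the rest.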


Related research

02/19/2020
The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding
We present MT-DNN, an open-source natural language understanding (NLU) t...

01/31/2019
Multi-Task Deep Neural Networks for Natural Language Understanding
In this paper, we present a Multi-Task Deep Neural Network (MT-DNN) for ...

07/10/2019
BAM! Born-Again Multi-Task Networks for Natural Language Understanding
It can be challenging to train multi-task neural networks that outperfor...

05/10/2021
KDExplainer: A Task-oriented Attention Model for Explaining Knowledge Distillation
Knowledge distillation (KD) has recently emerged as an efficacious schem...

01/11/2020
Exploring and Improving Robustness of Multi Task Deep Neural Networks via Domain Agnostic Defenses
In this paper, we explore the robustness of the Multi-Task Deep Neural N...

10/27/2017
Generalization Tower Network: A Novel Deep Neural Network Architecture for Multi-Task Learning
Deep learning (DL) advances state-of-the-art reinforcement learning (RL)...

03/31/2017
Comparison of multi-task convolutional neural network (MT-CNN) and a few other methods for toxicity prediction
Toxicity analysis and prediction are of paramount importance to human he...
