Knowledge Distillation For Recurrent Neural Network Language Modeling With Trust Regularization

04/08/2019
by Yangyang Shi, et al.

Recurrent Neural Networks (RNNs) have dominated language modeling because of their superior performance over traditional N-gram based models. In many applications, a large Recurrent Neural Network language model (RNNLM) or an ensemble of several RNNLMs is used. These models have large memory footprints and require heavy computation. In this paper, we examine the effect of applying knowledge distillation in reducing the model size for RNNLMs. In addition, we propose a trust regularization method to improve knowledge distillation training for RNNLMs. Using knowledge distillation with trust regularization, we reduce the parameter size to a third of that of the previously published best model while maintaining the state-of-the-art perplexity result on Penn Treebank data. In a speech recognition N-best rescoring task, we reduce the RNNLM model size to 18.5% of the baseline system without degrading word error rate (WER) performance on the Wall Street Journal data set.
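For context, the sketch below shows the standard soft-target knowledge-distillation objective used when training a small student language model against a larger teacher. It is a minimal illustration, assuming PyTorch; the function name, `temperature`, and `alpha` are illustrative choices rather than the paper's hyperparameters, and the trust-regularization term proposed in the paper is not reproduced here.

```python
# Minimal sketch of a knowledge-distillation loss for a student LM,
# assuming PyTorch and logits of shape (batch, vocab_size) from both
# a frozen teacher RNNLM and the smaller student RNNLM.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=2.0, alpha=0.5):
    """Interpolate hard-label cross-entropy with a softened-teacher KL term."""
    # Hard-label loss against the ground-truth next words.
    ce = F.cross_entropy(student_logits, targets)

    # Soft-target loss: KL divergence between temperature-softened
    # student and teacher distributions.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean")
    kd = kd * temperature ** 2  # standard gradient-scale correction (Hinton et al.)

    return alpha * ce + (1.0 - alpha) * kd
```

In this formulation the teacher's softened distribution supplies the soft targets; the paper's trust regularization further constrains how much the student is allowed to trust those teacher targets, which is its main departure from the plain objective above.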
