Efficient Knowledge Distillation for RNN-Transducer Models

11/11/2020
by Sankaran Panchapagesan et al.

Knowledge Distillation is an effective method of transferring knowledge from a large model to a smaller model. Distillation can be viewed as a type of model compression, and has played an important role for on-device ASR applications. In this paper, we develop a distillation method for RNN-Transducer (RNN-T) models, a popular end-to-end neural network architecture for streaming speech recognition. Our proposed distillation loss is simple and efficient, and uses only the "y" and "blank" posterior probabilities from the RNN-T output probability lattice. We study the effectiveness of the proposed approach in improving the accuracy of sparse RNN-T models obtained by gradually pruning a larger uncompressed model, which also serves as the teacher during distillation. With distillation of 60% sparse RNN-T models, we obtain a WER reduction of 4.3% on a FarField eval set. We also present results of experiments on LibriSpeech, where the introduction of the distillation loss yields a 4.8% WER reduction on the test-other dataset for a small Conformer model.
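
The abstract describes the loss only at the level of lattice posteriors, so the sketch below is just one way the idea could look in code: at each node (t, u) of the RNN-T output lattice, the vocabulary distribution is collapsed to the "y" (next correct label), "blank", and remaining probability mass, and a KL divergence between teacher and student is accumulated over the lattice. The PyTorch framing, the (B, T, U+1, V) joint-network output shape, the `collapsed_kl_distillation` name and its arguments, and the omission of length masking are all illustrative assumptions, not the paper's reference implementation.

```python
import torch


def collapsed_kl_distillation(
    teacher_logits: torch.Tensor,  # (B, T, U + 1, V) teacher joint-network outputs
    student_logits: torch.Tensor,  # (B, T, U + 1, V) student joint-network outputs
    labels: torch.Tensor,          # (B, U) target label ids (int64)
    blank_id: int = 0,
    eps: float = 1e-8,
) -> torch.Tensor:
    """KL(teacher || student) over a collapsed {y, blank, remainder} distribution
    at each lattice node, averaged over the lattice (illustrative helper only)."""
    p_t = teacher_logits.softmax(dim=-1)
    p_s = student_logits.softmax(dim=-1)
    B, T, U1, V = p_t.shape

    # P(y | t, u): probability of the next correct label at node (t, u).
    # Pad the label sequence so row u = U has a dummy index; its "y" term is
    # zeroed out below because no label remains to be emitted in that row.
    padded = torch.cat([labels, labels.new_full((B, 1), blank_id)], dim=1)  # (B, U + 1)
    idx = padded[:, None, :, None].expand(B, T, U1, 1)
    y_t = p_t.gather(-1, idx).squeeze(-1)
    y_s = p_s.gather(-1, idx).squeeze(-1)

    keep = torch.zeros(U1, device=p_t.device)
    keep[:-1] = 1.0
    y_t = y_t * keep
    y_s = y_s * keep

    # P(blank | t, u) and the remaining probability mass.
    b_t = p_t[..., blank_id]
    b_s = p_s[..., blank_id]
    r_t = (1.0 - y_t - b_t).clamp_min(eps)
    r_s = (1.0 - y_s - b_s).clamp_min(eps)

    q_t = torch.stack([y_t, b_t, r_t], dim=-1).clamp_min(eps)
    q_s = torch.stack([y_s, b_s, r_s], dim=-1).clamp_min(eps)

    # Per-node KL over the 3-way collapsed distribution; length masking omitted.
    kl = (q_t * (q_t.log() - q_s.log())).sum(dim=-1)
    return kl.mean()
```

In training, a term like this would typically be added to the standard RNN-T loss with a weighting factor; the weighting, masking of padded frames and labels, and whether the student's standard loss uses the same lattice computation are design choices not specified by the abstract.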
