Tied Reduced RNN-T Decoder

09/15/2021
by Rami Botros, et al.

Previous work on Recurrent Neural Network-Transducer (RNN-T) models has shown that, under some conditions, it is possible to simplify the prediction network with little or no loss in recognition accuracy (arXiv:2003.07705 [eess.AS], [2], arXiv:2012.06749 [cs.CL]). This is done by limiting the context size of previous labels and/or by using a simpler architecture for its layers instead of LSTMs. The benefits of such changes include a smaller model size, faster inference, and power savings, all of which are useful for on-device applications. In this work, we study ways to make the RNN-T decoder (prediction network + joint network) smaller and faster without degrading recognition performance. Our prediction network performs a simple weighted average of the input embeddings, and shares its embedding matrix weights with the joint network's output layer (a.k.a. weight tying, commonly used in language modeling, arXiv:1611.01462 [cs.LG]). This simple design, used in conjunction with additional Edit-based Minimum Bayes Risk (EMBR) training, reduces the RNN-T decoder from 23M parameters to just 2M without affecting word error rate (WER).
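The two ideas in the abstract (a stateless prediction network that averages a limited context of label embeddings, and weight tying between that embedding matrix and the joint network's output layer) can be sketched roughly as follows. This is a minimal numpy illustration under assumed shapes and names (`E`, `alpha`, `K`, and the additive joint are all hypothetical), not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

V, D, K = 100, 16, 2          # vocab size, embedding dim, label context size
E = rng.normal(size=(V, D))   # embedding matrix, shared with the output layer
alpha = rng.random(K)         # learned per-position weights (illustrative)

def prediction_network(prev_labels):
    """Stateless prediction net: weighted average of the last K label embeddings."""
    embs = E[prev_labels[-K:]]                # (K, D) embeddings of recent labels
    w = alpha[: len(embs)]
    return (w @ embs) / w.sum()               # normalized weighted average, (D,)

def joint(enc_t, pred):
    """Simple additive joint; assumes encoder output has dimension D."""
    h = np.tanh(enc_t + pred)                 # combine acoustic and label features
    return h @ E.T                            # tied output layer reuses E, -> (V,)
```

Because the output projection reuses `E` instead of owning a separate `D x V` matrix, and the prediction network keeps no recurrent state, the decoder's parameter count is dominated by the embedding table alone, which is the source of the size reduction the abstract reports.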


Related research

02/19/2020 · Rnn-transducer with language bias for end-to-end Mandarin-English code-switching speech recognition
Recently, language identity information has been utilized to improve the...

01/06/2020 · Character-Aware Attention-Based End-to-End Speech Recognition
Predicting words and subword units (WSUs) as the output has shown to be ...

10/29/2022 · Accelerating RNN-T Training and Inference Using CTC guidance
We propose a novel method to accelerate training and inference process o...

10/06/2017 · Low-Rank RNN Adaptation for Context-Aware Language Modeling
A context-aware language model uses location, user and/or domain metadat...

12/20/2016 · Dynamic Action Recognition: A convolutional neural network model for temporally organized joint location data
Motivation: Recognizing human actions in a video is a challenging task w...

12/12/2020 · Less Is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging
End-to-end models that condition the output label sequence on all previo...

05/21/2017 · Recurrent Additive Networks
We introduce recurrent additive networks (RANs), a new gated RNN which i...
