Less Is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging

12/12/2020
by Rohit Prabhavalkar, et al.

End-to-end models that condition the output label sequence on all previously predicted labels have emerged as popular alternatives to conventional systems for automatic speech recognition (ASR). Since unique label histories correspond to distinct model states, such models are decoded using an approximate beam-search process which produces a tree of hypotheses. In this work, we study the influence of the amount of label context on the model's accuracy, and its impact on the efficiency of the decoding process. We find that we can limit the context of the recurrent neural network transducer (RNN-T) during training to just four previous word-piece labels, without degrading word error rate (WER) relative to the full-context baseline. Limiting context also provides opportunities to improve the efficiency of the beam-search process during decoding by removing redundant paths from the active beam, and instead retaining them in the final lattice. This path-merging scheme can also be applied when decoding the baseline full-context model through an approximation. Overall, we find that the proposed path-merging scheme is extremely effective, allowing us to improve oracle WERs by up to 36% over the baseline, while simultaneously reducing the number of model evaluations by up to 5.3% relative.
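To make the path-merging idea concrete, the following is a minimal Python sketch of how merging might work in a limited-context RNN-T beam search: hypotheses whose last few word-piece labels agree are treated as sharing a model state, so only the best-scoring one stays on the beam while the others are retained for the lattice. All names here (Hypothesis, merge_paths, context_size) are illustrative assumptions, not code from the paper.

```python
# Minimal sketch of path merging under a limited label context, assuming a
# prediction network that only conditions on the last `context_size` labels.
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    labels: tuple          # word-piece labels emitted so far
    log_prob: float        # total log-probability of this path
    merged: list = field(default_factory=list)  # lower-scoring paths kept for the lattice

def merge_paths(beam, context_size=4):
    """Merge hypotheses whose last `context_size` labels match (approx. same state)."""
    best_by_state = {}
    for hyp in beam:
        state_key = hyp.labels[-context_size:]  # limited label context ~ model state
        incumbent = best_by_state.get(state_key)
        if incumbent is None:
            best_by_state[state_key] = hyp
        elif hyp.log_prob > incumbent.log_prob:
            hyp.merged.append(incumbent)        # retain the loser in the lattice, not on the beam
            best_by_state[state_key] = hyp
        else:
            incumbent.merged.append(hyp)
    return list(best_by_state.values())

# Example: two paths ending in the same four word-pieces collapse to one beam entry.
beam = [
    Hypothesis(labels=("the", "cat", "sat", "on", "mat"), log_prob=-3.2),
    Hypothesis(labels=("a", "cat", "sat", "on", "mat"),   log_prob=-3.5),
]
merged = merge_paths(beam, context_size=4)
print(len(merged), merged[0].log_prob)  # -> 1 -3.2
```

Because merged paths are folded into the surviving hypothesis rather than discarded, the final lattice remains richer than the beam itself, which is what drives the oracle WER improvement described in the abstract.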

Related research

08/03/2022
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States
Beam search, which is the dominant ASR decoding algorithm for end-to-end...

02/10/2020
Accelerating RNN Transducer Inference via One-Step Constrained Beam Search
We propose a one-step constrained (OSC) beam search to accelerate recurr...

02/28/2023
A Token-Wise Beam Search Algorithm for RNN-T
Standard Recurrent Neural Network Transducers (RNN-T) decoding algorithm...

11/05/2019
RNN-T For Latency Controlled ASR With Improved Beam Search
Neural transducer-based systems such as RNN Transducers (RNN-T) for auto...

10/23/2020
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer
Hybrid Autoregressive Transducer (HAT) is a recently proposed end-to-end...

10/18/2021
Efficient Sequence Training of Attention Models using Approximative Recombination
Sequence discriminative training is a great tool to improve the performa...

09/15/2021
Tied Reduced RNN-T Decoder
Previous works on the Recurrent Neural Network-Transducer (RNN-T) models...
