On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer

10/23/2020
by   Liang Lu, et al.
0

Hybrid Autoregressive Transducer (HAT) is a recently proposed end-to-end acoustic model that extends the standard Recurrent Neural Network Transducer (RNN-T) for the purpose of the external language model (LM) fusion. In HAT, the blank probability and the label probability are estimated using two separate probability distributions, which provides a more accurate solution for internal LM score estimation, and thus works better when combining with an external LM. Previous work mainly focuses on HAT model training with the negative log-likelihood loss, while in this paper, we study the minimum word error rate (MWER) training of HAT – a criterion that is closer to the evaluation metric for speech recognition, and has been successfully applied to other types of end-to-end models such as sequence-to-sequence (S2S) and RNN-T models. From experiments with around 30,000 hours of training data, we show that MWER training can improve the accuracy of HAT models, while at the same time, improving the robustness of the model against the decoding hyper-parameters such as length normalization and decoding beam during inference.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/28/2019

Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition

In this work, we propose minimum Bayes risk (MBR) training of RNN-Transd...
research
06/04/2021

Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition

Integrating external language models (LMs) into end-to-end (E2E) models ...
research
07/27/2020

Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition

In this work, we propose a novel and efficient minimum word error rate (...
research
02/28/2023

A Token-Wise Beam Search Algorithm for RNN-T

Standard Recurrent Neural Network Transducers (RNN-T) decoding algorithm...
research
05/19/2020

A New Training Pipeline for an Improved Neural Transducer

The RNN transducer is a promising end-to-end model candidate. We compare...
research
12/12/2020

Less Is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging

End-to-end models that condition the output label sequence on all previo...
research
03/30/2017

Simplified End-to-End MMI Training and Voting for ASR

A simplified speech recognition system that uses the maximum mutual info...

Please sign up or login with your details

Forgot password? Click here to reset