Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition

11/28/2019
by   Chao Weng, et al.

In this work, we propose minimum Bayes risk (MBR) training of RNN-Transducer (RNN-T) for end-to-end speech recognition. Specifically, starting from a model trained with the RNN-T loss, MBR training minimizes the expected edit distance between the reference label sequence and N-best hypotheses generated on the fly. We also introduce a heuristic to incorporate an external neural network language model (NNLM) in RNN-T beam search decoding and explore MBR training with the external NNLM. Experimental results demonstrate that an MBR-trained model substantially outperforms an RNN-T-trained model, and that further improvements can be achieved when training with an external NNLM. Our best MBR-trained system achieves absolute character error rate (CER) reductions of 1.2 and 0.5 over a convolution- and transformer-based RNN-T baseline trained on 21,000 hours of speech.
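
The MBR objective described above, the expected edit distance over an N-best list, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names are hypothetical, and renormalizing the hypothesis scores over the N-best list with a softmax is an assumption borrowed from common MBR formulations rather than something the abstract states. In real training the log-probabilities would come from the RNN-T model and the loss would be differentiated with respect to its parameters.

```python
import math
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a reference label
                curr[j - 1] + 1,         # insert a hypothesis label
                prev[j - 1] + (r != h),  # substitute (free if labels match)
            ))
        prev = curr
    return prev[-1]


def expected_edit_distance(
    ref: Sequence[str],
    nbest: List[Sequence[str]],
    log_probs: List[float],
) -> float:
    """Expected edit distance of an N-best list against the reference.

    Assumption: hypothesis posteriors are the model log-probabilities
    renormalized over the N-best list (a softmax over log_probs).
    """
    max_lp = max(log_probs)  # subtract the max for numerical stability
    weights = [math.exp(lp - max_lp) for lp in log_probs]
    total = sum(weights)
    return sum(
        (w / total) * edit_distance(ref, hyp)
        for w, hyp in zip(weights, nbest)
    )


# Toy example: three character-level hypotheses for the reference "abc".
reference = list("abc")
hypotheses = [list("abc"), list("abd"), list("ab")]
scores = [math.log(0.5), math.log(0.25), math.log(0.25)]
risk = expected_edit_distance(reference, hypotheses, scores)
```

In this toy case the hypotheses have edit distances 0, 1, and 1, so the risk is approximately 0.5. Shifting probability mass toward the first hypothesis lowers it, which is the behavior MBR training rewards.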

research
07/27/2020

Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition

In this work, we propose a novel and efficient minimum word error rate (...
research
10/23/2020

On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer

Hybrid Autoregressive Transducer (HAT) is a recently proposed end-to-end...
research
05/19/2020

A New Training Pipeline for an Improved Neural Transducer

The RNN transducer is a promising end-to-end model candidate. We compare...
research
03/17/2021

Advancing RNN Transducer Technology for Speech Recognition

We investigate a set of techniques for RNN Transducers (RNN-Ts) that wer...
research
12/07/2022

Lattice-Free Sequence Discriminative Training for Phoneme-Based Neural Transducers

Recently, RNN-Transducers have achieved remarkable results on various au...
research
03/31/2022

An Empirical Study of Language Model Integration for Transducer based Speech Recognition

Utilizing text-only data with an external language model (LM) in end-to-...
research
07/17/2023

TST: Time-Sparse Transducer for Automatic Speech Recognition

End-to-end model, especially Recurrent Neural Network Transducer (RNN-T)...
