A New Training Pipeline for an Improved Neural Transducer

05/19/2020
by Albert Zeyer, et al.

The RNN transducer is a promising end-to-end model candidate. We compare the original training criterion, which fully marginalizes over all alignments, to the commonly used maximum approximation, which simplifies, improves and speeds up our training. We also generalize beyond the original neural network model and study more powerful models, made possible by the maximum approximation. We further generalize the output label topology to cover RNN-T, RNA and CTC. We perform several studies across all these aspects, including a study on the effect of external alignments. We find that the transducer model generalizes much better on longer sequences than the attention model. Our final transducer model outperforms our attention model on Switchboard 300h by over 6% relative in WER.


Related research:

- 11/28/2019: Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition. "In this work, we propose minimum Bayes risk (MBR) training of RNN-Transd..."
- 05/18/2020: Attention-based Transducer for Online Speech Recognition. "Recent studies reveal the potential of recurrent neural network transduc..."
- 09/26/2019: Improving RNN Transducer Modeling for End-to-End Speech Recognition. "In the last few years, an emerging trend in automatic speech recognition..."
- 10/23/2020: On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer. "Hybrid Autoregressive Transducer (HAT) is a recently proposed end-to-end..."
- 05/07/2020: RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions. "In recent years, all-neural end-to-end approaches have obtained state-of..."
- 03/15/2018: Advancing Acoustic-to-Word CTC Model. "The acoustic-to-word model based on the connectionist temporal classific..."
