Advancing RNN Transducer Technology for Speech Recognition

03/17/2021
by George Saon, et al.

We investigate a set of techniques for RNN Transducers (RNN-Ts) that were instrumental in lowering the word error rate on three different tasks (Switchboard 300 hours, conversational Spanish 780 hours and conversational Italian 900 hours). The techniques pertain to architectural changes, speaker adaptation, language model fusion, model combination and the general training recipe. First, we introduce a novel multiplicative integration of the encoder and prediction network vectors in the joint network (as opposed to additive). Second, we discuss the applicability of i-vector speaker adaptation to RNN-Ts in conjunction with data perturbation. Third, we explore the effectiveness of the recently proposed density ratio language model fusion for these tasks. Last but not least, we describe the other components of our training recipe and their effect on recognition performance. We report a 5.9% and 12.5% word error rate on the Switchboard and CallHome test sets of the NIST Hub5 2000 evaluation and a 12.7
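To make the contrast between additive and multiplicative integration in the joint network concrete, here is a minimal NumPy sketch. It is not taken from the paper: the function name `joint_logits`, the weight names, the `tanh` nonlinearity, and all dimensions are assumptions chosen for illustration. The only point it illustrates is that the projected encoder and prediction-network vectors are combined by an element-wise product rather than a sum.

```python
import numpy as np

def joint_logits(h_enc, h_pred, W_enc, W_pred, W_out, b_out, multiplicative=True):
    """Combine one encoder vector and one prediction-network vector into logits.

    All names and the tanh nonlinearity are illustrative assumptions,
    not the authors' exact formulation.
    """
    e = W_enc @ h_enc      # project encoder vector to the joint dimension
    p = W_pred @ h_pred    # project prediction vector to the joint dimension
    # Multiplicative integration: element-wise product instead of a sum.
    z = e * p if multiplicative else e + p
    return W_out @ np.tanh(z) + b_out

# Hypothetical sizes: encoder 8, prediction network 6, joint 4, vocabulary 5.
rng = np.random.default_rng(0)
W_enc = rng.standard_normal((4, 8))
W_pred = rng.standard_normal((4, 6))
W_out = rng.standard_normal((5, 4))
b_out = rng.standard_normal(5)
h_enc = rng.standard_normal(8)
h_pred = rng.standard_normal(6)

mul = joint_logits(h_enc, h_pred, W_enc, W_pred, W_out, b_out, multiplicative=True)
add = joint_logits(h_enc, h_pred, W_enc, W_pred, W_out, b_out, multiplicative=False)

# With a zero prediction vector the product gates the encoder term away
# entirely, so only the output bias remains; the additive form keeps it.
gated = joint_logits(h_enc, np.zeros(6), W_enc, W_pred, W_out, b_out)
```

One property visible in this sketch is that the product lets each network gate the other's contribution, whereas in the additive form the two contributions are simply superimposed.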


