Improving RNN Transducer Modeling for End-to-End Speech Recognition

09/26/2019
by Jinyu Li, et al.

In the last few years, an emerging trend in automatic speech recognition research is the study of end-to-end (E2E) systems. Connectionist Temporal Classification (CTC), Attention Encoder-Decoder (AED), and RNN Transducer (RNN-T) are the three most popular methods. Among them, RNN-T has the advantage of supporting online streaming, which is challenging for AED, and it does not make CTC's frame-independence assumption. In this paper, we improve RNN-T training in two aspects. First, we optimize the RNN-T training algorithm to reduce memory consumption, so that we can use larger training minibatches for faster training. Second, we propose better model structures so that we obtain RNN-T models with very good accuracy but a small footprint. Trained with 30 thousand hours of anonymized and transcribed Microsoft production data, the best RNN-T model, with an even smaller model size (216 Megabytes), achieves up to 11.8% relative word error rate (WER) reduction from the baseline RNN-T model. This best RNN-T model is significantly better than the device hybrid model of similar size, achieving up to 15.0% relative WER reduction, and obtains WERs similar to those of the server hybrid model of 5120 Megabytes in size.
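The abstract does not describe how the memory reduction is achieved, so the following is not the authors' method. It is a minimal sketch, under stated assumptions, of the standard RNN-T forward (alpha) recursion and of why memory is the bottleneck: the joint network produces a distribution for every (frame t, label position u) pair, so a dense output tensor has shape B × T × (U+1) × V. Because each distribution depends on u, that is, on the labels emitted so far, RNN-T avoids CTC's frame-independence assumption. The function names and the example sizes (batch 32, 500 frames, 100 labels, 4,000 output units) are hypothetical.

```python
import math
from typing import Sequence


def logaddexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def rnnt_log_likelihood(
    log_probs: Sequence[Sequence[Sequence[float]]],
    labels: Sequence[int],
    blank: int = 0,
) -> float:
    """log P(labels | x) via the RNN-T forward (alpha) recursion.

    Hypothetical helper. log_probs[t][u][k] is the log-softmax output of the
    joint network at frame t (0..T-1) after emitting u labels (0..U), for
    vocabulary entry k.
    """
    T = len(log_probs)
    U = len(labels)
    alpha = [[-math.inf] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            # Arrive by emitting blank at (t-1, u): move to the next frame.
            from_blank = (
                alpha[t - 1][u] + log_probs[t - 1][u][blank] if t > 0 else -math.inf
            )
            # Arrive by emitting label u-1 at (t, u-1): stay on the same frame.
            from_label = (
                alpha[t][u - 1] + log_probs[t][u - 1][labels[u - 1]]
                if u > 0
                else -math.inf
            )
            alpha[t][u] = logaddexp(from_blank, from_label)
    # A final blank emitted at (T-1, U) terminates the output sequence.
    return alpha[T - 1][U] + log_probs[T - 1][U][blank]


def joint_output_bytes(
    batch: int, frames: int, label_len: int, vocab: int, bytes_per_elem: int = 4
) -> int:
    """Size of a dense B x T x (U+1) x V joint-network output tensor."""
    return batch * frames * (label_len + 1) * vocab * bytes_per_elem


if __name__ == "__main__":
    # Toy lattice: T=2 frames, one label, V=2, every probability 0.5.
    uniform = [[[math.log(0.5)] * 2 for _ in range(2)] for _ in range(2)]
    # ≈ 0.25: two alignment paths, each with probability 0.5**3.
    print(math.exp(rnnt_log_likelihood(uniform, [1])))
    # Hypothetical sizes: ≈ 25.9 GB for one float32 joint-output tensor.
    print(joint_output_bytes(32, 500, 100, 4000) / 1e9)
```

The alpha table itself is only T × (U+1), but a naive implementation keeps the full joint output (and its gradient) alive for the backward pass; with the hypothetical sizes above, that single tensor is already tens of gigabytes, which is why reducing it directly allows larger minibatches.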

research
05/18/2020

Attention-based Transducer for Online Speech Recognition

Recent studies reveal the potential of recurrent neural network transduc...
research
05/01/2020

Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition

Recently, the recurrent neural network transducer (RNN-T) architecture h...
research
03/31/2022

Memory-Efficient Training of RNN-Transducer with Sampled Softmax

RNN-Transducer has been one of promising architectures for end-to-end au...
research
11/13/2018

Exploring RNN-Transducer for Chinese Speech Recognition

End-to-end approaches have drawn much attention recently for significant...
research
12/16/2022

Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-To-End Automatic Speech Recognition

This paper presents a class of new fast non-trainable entropy-based conf...
research
10/29/2022

Accelerating RNN-T Training and Inference Using CTC guidance

We propose a novel method to accelerate training and inference process o...
research
05/19/2020

A New Training Pipeline for an Improved Neural Transducer

The RNN transducer is a promising end-to-end model candidate. We compare...