Efficient Training of Neural Transducer for Speech Recognition

04/22/2022
by Wei Zhou, et al.

As one of the most popular sequence-to-sequence modeling approaches for speech recognition, the RNN-Transducer has achieved steadily improving performance with increasingly sophisticated neural network models of growing size and more training epochs. While strong computation resources seem to be a prerequisite for training superior models, we try to overcome this requirement by carefully designing a more efficient training pipeline. In this work, we propose an efficient 3-stage progressive training pipeline to build high-performing neural transducer models from scratch with very limited computation resources in a reasonably short time period. The effectiveness of each stage is experimentally verified on both the Librispeech and Switchboard corpora. The proposed pipeline is able to train transducer models approaching state-of-the-art performance with a single GPU in just 2-3 weeks. Our best conformer transducer achieves 4.1% WER on Librispeech test-other with only 35 epochs of training.

