Memory-Efficient Training of RNN-Transducer with Sampled Softmax

03/31/2022
by   Jaesong Lee, et al.

RNN-Transducer is one of the promising architectures for end-to-end automatic speech recognition. Although it has many advantages, including strong accuracy and a streaming-friendly design, its high memory consumption during training has been a critical obstacle to development. In this work, we propose applying sampled softmax to RNN-Transducer, which requires only a small subset of the vocabulary during training and thus reduces memory consumption. We further extend sampled softmax to optimize memory consumption across a minibatch, and use the distributions of auxiliary CTC losses to sample the vocabulary, which improves model accuracy. We present experimental results on LibriSpeech, AISHELL-1, and CSJ-APS, where sampled softmax greatly reduces memory consumption while maintaining the accuracy of the baseline model.
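The core idea of sampled softmax can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the uniform sampling of extra vocabulary entries, and the blank index of 0 are all assumptions made for illustration, and the CTC-based sampling and minibatch-level extension mentioned in the abstract are not modeled. The sketch computes output logits only for a small subset of the vocabulary (the blank symbol, the target labels, and a few sampled entries) and normalizes over that subset alone.

```python
import math
import random


def sample_vocab_subset(targets, vocab_size, num_sampled, blank=0, seed=0):
    """Pick a vocabulary subset: blank + target labels + random extras.

    Uniform sampling is an assumption made here for simplicity.
    """
    rng = random.Random(seed)
    required = {blank, *targets}
    pool = [v for v in range(vocab_size) if v not in required]
    num_extra = max(0, min(num_sampled - len(required), len(pool)))
    return sorted(required | set(rng.sample(pool, num_extra)))


def sampled_log_probs(hidden, weight_rows, subset):
    """Log-softmax over `subset` only.

    Only the output-projection rows listed in `subset` are used, so the
    number of logits is len(subset) rather than the full vocabulary size.
    """
    logits = [sum(w * h for w, h in zip(weight_rows[v], hidden)) for v in subset]
    peak = max(logits)  # subtract the max for numerical stability
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return {v: x - log_z for v, x in zip(subset, logits)}


if __name__ == "__main__":
    vocab_size, hidden_dim = 1000, 4
    rng = random.Random(1)
    # Hypothetical output projection: one weight row per vocabulary entry.
    weights = [[rng.uniform(-1, 1) for _ in range(hidden_dim)]
               for _ in range(vocab_size)]
    subset = sample_vocab_subset([5, 42, 7], vocab_size, num_sampled=16)
    log_probs = sampled_log_probs([0.1, -0.2, 0.3, 0.4], weights, subset)
    print(len(log_probs), "logits computed instead of", vocab_size)
```

In an RNN-Transducer the joint network produces one such logit vector for every pair of acoustic frame and label position, so shrinking the last dimension from the full vocabulary to a small subset is where the memory saving would come from.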

research
09/26/2019

Improving RNN Transducer Modeling for End-to-End Speech Recognition

In the last few years, an emerging trend in automatic speech recognition...
research
07/30/2020

Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability

Because of its streaming nature, recurrent neural network transducer (RN...
research
05/01/2020

Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition

Recently, the recurrent neural network transducer (RNN-T) architecture h...
research
04/10/2020

Efficient Sampled Softmax for Tensorflow

This short paper discusses an efficient implementation of sampled softma...
research
11/13/2018

Exploring RNN-Transducer for Chinese Speech Recognition

End-to-end approaches have drawn much attention recently for significant...
research
03/18/2023

Powerful and Extensible WFST Framework for RNN-Transducer Losses

This paper presents a framework based on Weighted Finite-State Transduce...
research
11/29/2022

Neural Transducer Training: Reduced Memory Consumption with Sample-wise Computation

The neural transducer is an end-to-end model for automatic speech recogn...
