DeepAI

Alignment Restricted Streaming Recurrent Neural Network Transducer

11/05/2020
by   Jay Mahadeokar, et al.

There is growing interest in the speech community in developing Recurrent Neural Network Transducer (RNN-T) models for automatic speech recognition (ASR) applications. RNN-T is trained with a loss function that does not enforce temporal alignment between the training transcripts and the audio. As a result, RNN-T models built with unidirectional long short-term memory (LSTM) encoders tend to wait for long spans of input audio before emitting already-decoded ASR tokens. In this work, we propose a modification to the RNN-T loss function and develop Alignment Restricted RNN-T (Ar-RNN-T) models, which use audio-text alignment information to guide the loss computation. We compare the proposed method with existing approaches, such as monotonic RNN-T, on LibriSpeech and in-house datasets. We show that the Ar-RNN-T loss provides fine-grained control over the trade-off between token emission delay and Word Error Rate (WER). Ar-RNN-T models also improve downstream applications such as ASR end-pointing by guaranteeing token emission within any given latency range. Moreover, the Ar-RNN-T loss allows larger batch sizes and 4x higher throughput for our LSTM model architecture, enabling faster training and convergence on GPUs.
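The core idea described above is to use an external audio-text alignment to restrict which cells of the RNN-T time-by-token lattice may contribute to the loss: token u may only be emitted within a buffer around its aligned frame. The sketch below illustrates only that masking step, not the full forward-backward loss; the function name, the per-token alignment input `token_frames`, and the left/right buffer parameters are illustrative assumptions, not the paper's exact interface.

```python
import numpy as np

def emission_mask(num_frames, token_frames, b_left, b_right):
    """Boolean mask of shape (num_frames, num_tokens).

    mask[t, u] is True when the (hypothetical) alignment-restricted loss
    is allowed to count an emission of token u at audio frame t, i.e.
    when t lies in [token_frames[u] - b_left, token_frames[u] + b_right].
    Cells outside this band would be excluded from the lattice sum,
    which is what forces early, alignment-consistent token emission.
    """
    t = np.arange(num_frames)[:, None]     # frame indices, shape (T, 1)
    a = np.asarray(token_frames)[None, :]  # aligned frames, shape (1, U)
    return (t >= a - b_left) & (t <= a + b_right)

# Toy example: 6 audio frames, two tokens aligned to frames 1 and 4,
# with a one-frame buffer on each side of the alignment.
mask = emission_mask(6, [1, 4], b_left=1, b_right=1)
```

In a full implementation, a mask like this would zero out (in log space, set to -inf) the transition probabilities of disallowed lattice cells before running the standard RNN-T forward-backward recursion; restricting the lattice to a narrow band is also what makes the larger batch sizes and higher training throughput mentioned above possible.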

READ FULL TEXT

Related research

- 11/20/2020 · Improving RNN-T ASR Accuracy Using Untranscribed Context Audio
  "We present a new training scheme for streaming automatic speech recognit..."
- 04/19/2022 · An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition
  "The two most popular loss functions for streaming end-to-end automatic s..."
- 05/07/2020 · RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions
  "In recent years, all-neural end-to-end approaches have obtained state-of..."
- 11/24/2013 · A Primal-Dual Method for Training Recurrent Neural Networks Constrained by the Echo-State Property
  "We present an architecture of a recurrent neural network (RNN) with a fu..."
- 11/01/2021 · Sequence Transduction with Graph-based Supervision
  "The recurrent neural network transducer (RNN-T) objective plays a major ..."
- 02/01/2020 · Model Extraction Attacks against Recurrent Neural Networks
  "Model extraction attacks are a kind of attacks in which an adversary obt..."
- 11/29/2022 · Neural Transducer Training: Reduced Memory Consumption with Sample-wise Computation
  "The neural transducer is an end-to-end model for automatic speech recogn..."