Improving RNN-T ASR Accuracy Using Untranscribed Context Audio

11/20/2020
by Andreas Schwarz, et al.

We present a new training scheme for streaming automatic speech recognition (ASR) based on recurrent neural network transducers (RNN-T) which allows the encoder network to benefit from longer audio streams as input, while only requiring partial transcriptions of such streams during training. We show that this extension of the acoustic context during training and inference can lead to word error rate reductions of more than 6% in a realistic production setting. We investigate its effect on acoustically challenging data containing background speech and present data points which indicate that this approach helps the network learn both speaker and environment adaptation. Finally, we visualize RNN-T loss gradients with respect to the input features in order to illustrate the ability of a long short-term memory (LSTM) based ASR encoder to exploit long-term context.
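One plausible reading of the training scheme, sketched under stated assumptions: the encoder runs over untranscribed context audio followed by the transcribed utterance as a single stream, and the RNN-T loss is computed only on the encoder outputs for the transcribed part, so the context needs no labels. The names `toy_encoder` (a leaky integrator standing in for the LSTM encoder), `toy_joint` and `context_audio_loss` are hypothetical, and the toy joint network ignores the label history, which a real RNN-T prediction network would not. The loss itself is the standard RNN-T forward (alpha) recursion in the log domain.

```python
import math
from typing import List, Sequence


def log_sum_exp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - z for x in logits]


def rnnt_loss(log_probs, labels, blank=0) -> float:
    """Negative log-likelihood of `labels` under the RNN-T lattice.

    log_probs[t][u][k] is the joint-network log-probability of symbol k
    at encoder frame t after the first u labels have been emitted.
    """
    T, U = len(log_probs), len(labels)
    alpha = [[-math.inf] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            from_blank = -math.inf  # emit blank at (t-1, u), advance one frame
            if t > 0:
                from_blank = alpha[t - 1][u] + log_probs[t - 1][u][blank]
            from_label = -math.inf  # emit label u-1 at (t, u-1), same frame
            if u > 0:
                from_label = alpha[t][u - 1] + log_probs[t][u - 1][labels[u - 1]]
            alpha[t][u] = log_sum_exp(from_blank, from_label)
    # Every complete path ends with a blank at the last frame.
    return -(alpha[T - 1][U] + log_probs[T - 1][U][blank])


def toy_encoder(frames, decay=0.5):
    """Hypothetical stand-in for an LSTM encoder: a leaky integrator whose
    state carries information forward from earlier frames."""
    h, out = 0.0, []
    for x in frames:
        h = decay * h + (1.0 - decay) * x
        out.append(h)
    return out


def toy_joint(h, u, vocab_size):
    """Hypothetical joint network (index 0 = blank); ignores history u."""
    return log_softmax([h * k for k in range(vocab_size)])


def context_audio_loss(context_frames, utterance_frames, labels, vocab_size=2):
    """Encode [context + utterance] as one stream, score only the utterance."""
    enc = toy_encoder(list(context_frames) + list(utterance_frames))
    enc_utt = enc[len(context_frames):]  # drop outputs for untranscribed context
    log_probs = [[toy_joint(h, u, vocab_size) for u in range(len(labels) + 1)]
                 for h in enc_utt]
    return rnnt_loss(log_probs, labels)


no_ctx = context_audio_loss([], [0.0, 0.0], [1])
with_ctx = context_audio_loss([1.0], [0.0, 0.0], [1])
print(no_ctx, with_ctx)
```

With an all-zero utterance and no context, each frame yields a uniform distribution over the two symbols, so both lattice paths have probability 1/8 and `no_ctx` equals log 4 ≈ 1.386. Prepending a single nonzero context frame changes the encoder state entering the utterance, and therefore the loss, even though the context frame carries no transcription.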

research
11/05/2020

Alignment Restricted Streaming Recurrent Neural Network Transducer

There is a growing interest in the speech community in developing Recurr...
research
06/22/2016

A Comprehensive Study of Deep Bidirectional LSTM RNNs for Acoustic Modeling in Speech Recognition

We present a comprehensive study of deep bidirectional long short-term m...
research
10/27/2022

Contextual-Utterance Training for Automatic Speech Recognition

Recent studies of streaming automatic speech recognition (ASR) recurrent...
research
03/01/2022

A Conformer Based Acoustic Model for Robust Automatic Speech Recognition

This study addresses robust automatic speech recognition (ASR) by introd...
research
06/14/2019

Cumulative Adaptation for BLSTM Acoustic Models

This paper addresses the robust speech recognition problem as an adaptat...
research
03/12/2021

A Distributed Optimisation Framework Combining Natural Gradient with Hessian-Free for Discriminative Sequence Training

This paper presents a novel natural gradient and Hessian-free (NGHF) opt...
research
01/25/2022

Improving the fusion of acoustic and text representations in RNN-T

The recurrent neural network transducer (RNN-T) has recently become the ...
