ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition

09/29/2022
by Martin Radfar, et al.

The recurrent neural network transducer (RNN-T) is a prominent streaming end-to-end (E2E) ASR technology. In RNN-T, the acoustic encoder commonly consists of stacks of LSTMs. More recently, the Conformer architecture was introduced as an alternative to LSTM layers: it replaces the RNN-T encoder with a modified Transformer encoder that has convolutional layers at the frontend and between the attention layers. In this paper, we introduce a new streaming ASR model, Convolutional Augmented Recurrent Neural Network Transducers (ConvRNN-T), in which we augment the LSTM-based RNN-T with a novel convolutional frontend consisting of local and global context CNN encoders. ConvRNN-T uses causal 1-D convolutional layers, squeeze-and-excitation, dilation, and residual blocks to provide both global and local audio context representations to the LSTM layers. We show that ConvRNN-T outperforms RNN-T, Conformer, and ContextNet on LibriSpeech and in-house data. In addition, ConvRNN-T has lower computational complexity than Conformer. Its superior accuracy and low footprint make ConvRNN-T a promising candidate for on-device streaming ASR.
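The abstract names the ingredients of the convolutional frontend (causal 1-D convolution, dilation, squeeze-and-excitation, residual connections) but not how they fit together. The pure-Python sketch below shows one plausible way to combine them into a streaming-friendly block; it is not the authors' implementation. The function names, the weight layout, the conv-then-SE-then-residual ordering, and especially the running-mean pooling in the squeeze step (chosen so the block never reads future frames) are assumptions made for illustration. The paper may pool, order, or parameterize these layers differently.

```python
import math
from typing import List

Frames = List[List[float]]  # x[t][c]: T frames, each with C channels


def causal_dilated_conv1d(x: Frames, w: List[List[List[float]]], dilation: int) -> Frames:
    """w[k][o][i] weights input channel i at lag k * dilation; missing past frames count as zeros."""
    num_out, num_in = len(w[0]), len(w[0][0])
    y = []
    for t in range(len(x)):
        out = [0.0] * num_out
        for k, w_k in enumerate(w):
            src = t - k * dilation
            if src < 0:  # implicit left zero-padding; future frames are never read
                continue
            for o in range(num_out):
                out[o] += sum(w_k[o][i] * x[src][i] for i in range(num_in))
        y.append(out)
    return y


def causal_squeeze_excite(x: Frames, w1, w2) -> Frames:
    """Scale channels by sigmoid(w2 @ relu(w1 @ mean_so_far)).

    Assumption: a running mean over past and current frames replaces the usual
    whole-utterance average so the block stays causal for streaming.
    """
    running = [0.0] * len(x[0])
    y = []
    for t, frame in enumerate(x):
        running = [r + v for r, v in zip(running, frame)]
        squeeze = [r / (t + 1) for r in running]
        hidden = [max(0.0, sum(a * s for a, s in zip(row, squeeze))) for row in w1]
        gates = [1.0 / (1.0 + math.exp(-sum(b * h for b, h in zip(row, hidden)))) for row in w2]
        y.append([g * v for g, v in zip(gates, frame)])
    return y


def conv_se_residual_block(x: Frames, w, dilation: int, w1, w2) -> Frames:
    """Hypothetical block: causal conv -> causal SE -> residual add -> ReLU (in/out channels equal)."""
    h = causal_squeeze_excite(causal_dilated_conv1d(x, w, dilation), w1, w2)
    return [[max(0.0, a + b) for a, b in zip(hr, xr)] for hr, xr in zip(h, x)]


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Frames of context (current frame included) seen by a stack of causal dilated convs."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    # Toy input and hypothetical weights: 2 channels, kernel size 3, dilation 2.
    x = [[float(t), 1.0] for t in range(8)]
    w = [[[0.5, 0.0], [0.0, 0.5]] for _ in range(3)]
    w1 = [[0.1, 0.1]]    # squeeze: 2 channels -> 1 hidden unit
    w2 = [[1.0], [1.0]]  # excite: 1 hidden unit -> 2 channel gates
    y = conv_se_residual_block(x, w, 2, w1, w2)

    x_future = [row[:] for row in x]
    x_future[7] = [100.0, -100.0]  # perturb only the last frame
    y_future = conv_se_residual_block(x_future, w, 2, w1, w2)
    print(y[:7] == y_future[:7])             # True: earlier outputs ignore the future frame
    print(receptive_field(3, [1, 2, 4, 8]))  # 31
```

Stacking such blocks with growing dilations widens the temporal context cheaply. With kernel size 3, dilations 1, 2, 4, 8 already cover 31 frames while each output still depends only on past and current frames. This is one way a convolutional frontend can pass longer-range context to the LSTM layers without breaking streaming.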


