Attention-based Transducer for Online Speech Recognition

05/18/2020
by   Bin Wang, et al.

Recent studies reveal the potential of the recurrent neural network transducer (RNN-T) for end-to-end (E2E) speech recognition. Among the most popular E2E systems, including RNN-T, Attention Encoder-Decoder (AED), and Connectionist Temporal Classification (CTC), RNN-T has clear advantages: it supports streaming recognition and does not make a frame-independence assumption. Although significant progress has been made in RNN-T research, it still faces performance challenges in terms of training speed and accuracy. We propose an attention-based transducer that modifies RNN-T in two aspects. First, we introduce chunk-wise attention in the joint network. Second, self-attention is introduced in the encoder. Our proposed model outperforms RNN-T in both training speed and accuracy, achieving over 1.7x training speedup. With 500 hours of LAIX non-native English training data, the attention-based transducer yields about a 10.6% WER reduction over the baseline RNN-T. Trained with the full set of over 10K hours of data, our final system achieves about a 5.5% WER reduction over one trained with the best Kaldi TDNN-f recipe. After 8-bit weight quantization without WER degradation, RTF and latency drop to 0.34–0.36 and 268–409 milliseconds respectively on a single CPU core of a production server.
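The key streaming-friendly idea in the abstract is chunk-wise attention: restricting each frame's attention to a bounded, local window so latency stays fixed. As a rough illustration of that restriction (not the paper's exact joint-network formulation; the projection weights, single head, and block-diagonal mask here are simplifying assumptions), a minimal sketch in NumPy:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def chunkwise_self_attention(x, chunk_size):
    """Single-head self-attention where each frame attends only to
    frames in its own fixed-size chunk, so the attention context (and
    hence latency) is bounded for streaming. Illustrative sketch only;
    the paper's chunk-wise attention in the joint network and the
    encoder self-attention differ in detail."""
    T, d = x.shape
    rng = np.random.default_rng(0)
    # Random Q/K/V projections stand in for learned parameters.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)           # (T, T) attention logits
    # Block-diagonal mask: frame t only sees frames in its own chunk.
    idx = np.arange(T) // chunk_size
    mask = idx[:, None] == idx[None, :]
    scores = np.where(mask, scores, -np.inf)
    return softmax(scores) @ v              # (T, d) attended output

out = chunkwise_self_attention(
    np.random.default_rng(1).standard_normal((8, 4)), chunk_size=4)
print(out.shape)
```

Because the mask is block-diagonal, a streaming decoder can emit outputs after each chunk of `chunk_size` frames rather than waiting for the whole utterance, which is the trade-off (bounded context for bounded latency) that the abstract's RTF and latency figures reflect.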


Related research

- 09/26/2019, Improving RNN Transducer Modeling for End-to-End Speech Recognition: In the last few years, an emerging trend in automatic speech recognition...
- 05/28/2020, On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition: Recently, there has been a strong push to transition from hybrid models ...
- 01/06/2020, Character-Aware Attention-Based End-to-End Speech Recognition: Predicting words and subword units (WSUs) as the output has shown to be ...
- 05/19/2020, A New Training Pipeline for an Improved Neural Transducer: The RNN transducer is a promising end-to-end model candidate. We compare...
- 02/23/2021, Unidirectional Memory-Self-Attention Transducer for Online Speech Recognition: Self-attention models have been successfully applied in end-to-end speec...
- 11/06/2019, A comparison of end-to-end models for long-form speech recognition: End-to-end automatic speech recognition (ASR) models, including both att...
- 01/02/2020, Attention based on-device streaming speech recognition with large speech corpus: In this paper, we present a new on-device automatic speech recognition (...
