Extending Recurrent Neural Aligner for Streaming End-to-End Speech Recognition in Mandarin

06/17/2018
by   Linhao Dong, et al.
0

End-to-end models have been showing superiority in Automatic Speech Recognition (ASR). At the same time, the capacity of streaming recognition has become a growing requirement for end-to-end models. Following these trends, an encoder-decoder recurrent neural network called Recurrent Neural Aligner (RNA) has been freshly proposed and shown its competitiveness on two English ASR tasks. However, it is not clear if RNA can be further improved and applied to other spoken language. In this work, we explore the applicability of RNA in Mandarin Chinese and present four effective extensions: In the encoder, we redesign the temporal down-sampling and introduce a powerful convolutional structure. In the decoder, we utilize a regularizer to smooth the output distribution and conduct joint training with a language model. On two Mandarin Chinese conversational telephone speech recognition (MTS) datasets, our Extended-RNA obtains promising performance. Particularly, it achieves 27.7 character error rate (CER), which is superior to current state-of-the-art result on the popular HKUST task.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/08/2020

Streaming automatic speech recognition with the transformer model

Encoder-decoder based sequence-to-sequence models have demonstrated stat...
research
04/11/2020

End to End Chinese Lexical Fusion Recognition with Sememe Knowledge

In this paper, we present Chinese lexical fusion recognition, a new task...
research
08/02/2021

Decoupling recognition and transcription in Mandarin ASR

Much of the recent literature on automatic speech recognition (ASR) is t...
research
10/27/2022

SAN: a robust end-to-end ASR model architecture

In this paper, we propose a novel Siamese Adversarial Network (SAN) arch...
research
05/27/2019

CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition

Automatic speech recognition (ASR) system is undergoing an exciting path...
research
12/21/2022

4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders

The network architecture of end-to-end (E2E) automatic speech recognitio...
research
03/22/2017

Direct Acoustics-to-Word Models for English Conversational Speech Recognition

Recent work on end-to-end automatic speech recognition (ASR) has shown t...

Please sign up or login with your details

Forgot password? Click here to reset