Research on Modeling Units of Transformer Transducer for Mandarin Speech Recognition

04/26/2020
by   Li Fu, et al.
0

Modeling unit and model architecture are two key factors of Recurrent Neural Network Transducer (RNN-T) in end-to-end speech recognition. To improve the performance of RNN-T for Mandarin speech recognition task, a novel transformer transducer with the combination architecture of self-attention transformer and RNN is proposed. And then the choice of different modeling units for transformer transducer is explored. In addition, we present a new mix-bandwidth training method to obtain a general model that is able to accurately recognize Mandarin speech with different sampling rates simultaneously. All of our experiments are conducted on about 12,000 hours of Mandarin speech with sampling rate in 8kHz and 16kHz. Experimental results show that Mandarin transformer transducer using syllable with tone achieves the best performance. It yields an average of 14.4 reduction when compared with the models using syllable initial/final with tone and Chinese character, respectively. Also, it outperforms the model based on syllable initial/final with tone with an average of 13.5 Error Rate (CER) reduction.

READ FULL TEXT
research
09/28/2019

Self-Attention Transducers for End-to-End Speech Recognition

Recurrent neural network transducers (RNN-T) have been successfully appl...
research
05/10/2018

A comparable study of modeling units for end-to-end Mandarin speech recognition

End-To-End speech recognition have become increasingly popular in mandar...
research
07/29/2022

Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition

For Mandarin end-to-end (E2E) automatic speech recognition (ASR) tasks, ...
research
11/02/2020

Multitask Learning and Joint Optimization for Transformer-RNN-Transducer Speech Recognition

Recently, several types of end-to-end speech recognition methods named t...
research
10/01/2022

A Comparison of Transformer, Convolutional, and Recurrent Neural Networks on Phoneme Recognition

Phoneme recognition is a very important part of speech recognition that ...
research
08/28/2022

Bayesian Neural Network Language Modeling for Speech Recognition

State-of-the-art neural network language models (NNLMs) represented by l...
research
07/17/2023

TST: Time-Sparse Transducer for Automatic Speech Recognition

End-to-end model, especially Recurrent Neural Network Transducer (RNN-T)...

Please sign up or login with your details

Forgot password? Click here to reset