Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition

07/29/2022
by   Peng Shen, et al.
0

For Mandarin end-to-end (E2E) automatic speech recognition (ASR) tasks, compared to character-based modeling units, pronunciation-based modeling units could improve the sharing of modeling units in model training but meet homophone problems. In this study, we propose to use a novel pronunciation-aware unique character encoding for building E2E RNN-T-based Mandarin ASR systems. The proposed encoding is a combination of pronunciation-base syllable and character index (CI). By introducing the CI, the RNN-T model can overcome the homophone problem while utilizing the pronunciation information for extracting modeling units. With the proposed encoding, the model outputs can be converted into the final recognition result through a one-to-one mapping. We conducted experiments on Aishell and MagicData datasets, and the experimental results showed the effectiveness of the proposed method.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/25/2019

Exploring Lexicon-Free Modeling Units for End-to-End Korean and Korean-English Code-Switching Speech Recognition

As the character-based end-to-end automatic speech recognition (ASR) mod...
research
04/26/2020

Research on Modeling Units of Transformer Transducer for Mandarin Speech Recognition

Modeling unit and model architecture are two key factors of Recurrent Ne...
research
11/03/2022

Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system

Exploiting effective target modeling units is very important and has alw...
research
11/17/2020

Cascade RNN-Transducer: Syllable Based Streaming On-device Mandarin Speech Recognition with a Syllable-to-Character Converter

End-to-end models are favored in automatic speech recognition (ASR) beca...
research
06/09/2021

A Comparative Study on Neural Architectures and Training Methods for Japanese Speech Recognition

End-to-end (E2E) modeling is advantageous for automatic speech recogniti...
research
07/07/2021

Advancing CTC-CRF Based End-to-End Speech Recognition with Wordpieces and Conformers

Automatic speech recognition systems have been largely improved in the p...
research
11/04/2022

Multi-blank Transducers for Speech Recognition

This paper proposes a modification to RNN-Transducer (RNN-T) models for ...

Please sign up or login with your details

Forgot password? Click here to reset