A comparable study of modeling units for end-to-end Mandarin speech recognition

05/10/2018
by   Wei Zou, et al.

End-to-end speech recognition has become increasingly popular for Mandarin and has achieved strong performance. Mandarin is a tonal language, which distinguishes it from English and requires special treatment of the acoustic modeling units. Several kinds of modeling units have been used for Mandarin, such as phonemes, syllables, and Chinese characters. In this work, we explore two major end-to-end models for Mandarin speech recognition: the connectionist temporal classification (CTC) model and the attention-based encoder-decoder model. We compare the performance of three modeling units of different scales: context-dependent phonemes (CDP), syllables with tone, and Chinese characters. We find that all types of modeling units achieve similar character error rates (CER) with the CTC model, and that the Chinese character attention model outperforms the syllable attention model. Furthermore, we find that the Chinese character is a reasonable unit for Mandarin speech recognition. On the DidiCallcenter task, the Chinese character attention model achieves a CER of 5.68% and the CTC model a CER of 7.29%; on the DidiReading task, the CERs are 4.89% and 5.79%, respectively. Moreover, the attention model outperforms the CTC model on both datasets.
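For readers unfamiliar with the metric, CER is commonly computed as the character-level edit distance between the reference and the hypothesis, divided by the number of reference characters. The sketch below illustrates that common definition; the abstract does not describe the authors' scoring setup, so the function names (`edit_distance`, `cer`) and the choice to strip spaces before scoring are assumptions made for illustration only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion cost 1)."""
    # prev[j] holds the distance between the first i-1 items of ref and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis is i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by the reference length.

    Assumption: spaces are removed before scoring, since Mandarin text is
    usually written without word delimiters.
    """
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # One substituted character out of six reference characters.
    print(cer("今天天气很好", "今天天汽很好"))
```

Because the denominator is the reference length, a hypothesis with many insertions can yield a CER above 100%.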


research
10/25/2019

Exploring Lexicon-Free Modeling Units for End-to-End Korean and Korean-English Code-Switching Speech Recognition

As the character-based end-to-end automatic speech recognition (ASR) mod...
research
05/24/2022

Multi-Level Modeling Units for End-to-End Mandarin Speech Recognition

The choice of modeling units affects the performance of the acoustic mod...
research
01/06/2020

Character-Aware Attention-Based End-to-End Speech Recognition

Predicting words and subword units (WSUs) as the output has shown to be ...
research
04/26/2020

Research on Modeling Units of Transformer Transducer for Mandarin Speech Recognition

Modeling unit and model architecture are two key factors of Recurrent Ne...
research
05/11/2020

Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition

Although attention based end-to-end models have achieved promising perfo...
research
05/10/2023

Quran Recitation Recognition using End-to-End Deep Learning

The Quran is the holy scripture of Islam, and its recitation is an impor...
research
11/05/2018

Manner of Articulation Detection using Connectionist Temporal Classification to Improve Automatic Speech Recognition Performance

Conventionally, the manner of articulations in speech signal are derived...
