Multi-Level Modeling Units for End-to-End Mandarin Speech Recognition

05/24/2022
by   Yuting Yang, et al.

The choice of modeling units affects the performance of acoustic modeling and plays an important role in automatic speech recognition (ASR). In Mandarin scenarios, Chinese characters represent meaning but are not directly related to pronunciation. Thus, using only the written Chinese characters as modeling units is insufficient to capture speech features. In this paper, we present a novel method involving multi-level modeling units, which integrates multi-level information for Mandarin speech recognition. Specifically, the encoder block uses syllables as modeling units, and the decoder block uses characters as modeling units. During inference, the input feature sequences are converted into syllable sequences by the encoder block and then converted into Chinese characters by the decoder block. This process is carried out by a unified end-to-end model without introducing additional conversion models. By introducing the InterCE auxiliary task, our method achieves competitive results with CER of 4.1 on the AISHELL-1 benchmark without a language model, using the Conformer and the Transformer backbones respectively.
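The two target levels described above can be illustrated with a minimal sketch. It assumes a tiny, hypothetical character-to-syllable lexicon (`TOY_LEXICON`) and hypothetical helper names (`build_targets`, `homophones`); it is not the authors' implementation and does not model the InterCE task or any network. It only shows how one transcript yields a syllable sequence (the encoder-level units) and a character sequence (the decoder-level units), and why characters alone do not determine pronunciation uniquely in reverse.

```python
# Hypothetical toy lexicon: Chinese character -> tonal pinyin syllable.
# A real system would rely on a full pronunciation dictionary.
TOY_LEXICON = {
    "他": "ta1",
    "她": "ta1",
    "是": "shi4",
    "事": "shi4",
    "学": "xue2",
    "生": "sheng1",
}


def build_targets(text, lexicon=TOY_LEXICON):
    """Return (syllable_targets, character_targets) for one transcript.

    Syllables stand in for the encoder-level modeling units and
    characters for the decoder-level modeling units.
    """
    characters = list(text)
    syllables = [lexicon[ch] for ch in characters]
    return syllables, characters


def homophones(syllable, lexicon=TOY_LEXICON):
    """List every character in the lexicon pronounced as `syllable`."""
    return sorted(ch for ch, syl in lexicon.items() if syl == syllable)


syllable_targets, character_targets = build_targets("她是学生")
print(syllable_targets)   # syllable-level sequence
print(character_targets)  # character-level sequence
print(homophones("ta1"))  # characters sharing one pronunciation
```

In this toy lexicon, "他是学生" and "她是学生" map to the same syllable sequence, so the syllable level captures pronunciation while the character level has to resolve the homophones.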


