Advancing CTC-CRF Based End-to-End Speech Recognition with Wordpieces and Conformers

07/07/2021
by Huahuan Zheng, et al.

Automatic speech recognition (ASR) systems have improved substantially over the past few decades, and current systems are mainly either hybrid or end-to-end. The recently proposed CTC-CRF framework inherits the data efficiency of the hybrid approach and the simplicity of the end-to-end approach. In this paper, we further advance CTC-CRF based ASR by exploring modeling units and neural architectures. Specifically, we investigate techniques that enable the recently developed wordpiece modeling units and Conformer neural networks to be successfully applied in CTC-CRFs. Experiments are conducted on two English datasets (Switchboard and Librispeech) and a German dataset from CommonVoice. The results suggest that (i) Conformer significantly improves recognition performance; (ii) wordpiece-based systems perform slightly worse than phone-based systems when the target language has a low degree of grapheme-phoneme correspondence (e.g. English), while the two kinds of systems perform equally well when that degree of correspondence is high (e.g. German).
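To make the notion of a wordpiece modeling unit concrete, the following is a minimal sketch of how a word can be split into subword pieces by greedy longest-match against a vocabulary. The abstract does not say how the wordpiece inventory is learned or applied, so the function `segment` and the vocabulary `TOY_VOCAB` are hypothetical illustrations, not the authors' method.

```python
def segment(word, vocab):
    """Greedy longest-match-first split of `word` into pieces found in `vocab`.

    Illustrative only: real wordpiece inventories are learned from data,
    and the paper's actual segmentation procedure is not given here.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is in the vocabulary.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # Nothing matched: fall back to a single character.
            end = start + 1
        pieces.append(word[start:end])
        start = end
    return pieces


# Hypothetical toy vocabulary, chosen by hand for this example.
TOY_VOCAB = {"speech", "re", "cog", "ni", "tion"}

if __name__ == "__main__":
    print(segment("recognition", TOY_VOCAB))
```

A wordpiece-based system predicts units like these directly from audio, whereas a phone-based system predicts phones and relies on a pronunciation lexicon to map them to words.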


