On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models

07/09/2021
by   Xiaohui Zhang, et al.

Hybrid automatic speech recognition (ASR) models are typically sequentially trained with the CTC or LF-MMI criterion. However, the two criteria have vastly different legacies and are usually implemented in different frameworks. In this paper, by decoupling the concepts of modeling units and label topologies and building proper numerator/denominator graphs accordingly, we establish a generalized framework for hybrid acoustic modeling (AM). In this framework, we show that LF-MMI is a powerful training criterion applicable to both limited-context and full-context models, for wordpiece/mono-char/bi-char/chenone units, with both HMM and CTC topologies. From this framework, we propose three novel training schemes: chenone (ch)-CTC-bMMI, wordpiece (wp)-CTC-bMMI, and wordpiece (wp)-HMM-bMMI, with different advantages in training performance, decoding efficiency and decoding time-stamp accuracy. The advantages of the different training schemes are evaluated comprehensively on Librispeech, and wp-CTC-bMMI and ch-CTC-bMMI are evaluated on two real-world ASR tasks to show their effectiveness. We also show that bi-char (bc) HMM-MMI models can serve as better alignment models than traditional non-neural GMM-HMMs.
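
For orientation, the boosted MMI (bMMI) criterion named in the abstract is commonly written in the form below. This is the standard textbook form from the general bMMI literature, not a formula reproduced from this paper, and the symbols are notation assumed here: X_u is the acoustic feature sequence of utterance u, W_u its reference transcript, M_W the model (graph) for word sequence W, kappa an acoustic scale, b >= 0 the boosting factor, and A(W, W_u) an accuracy measure of hypothesis W against the reference.

```latex
\mathcal{F}_{\mathrm{bMMI}}(\theta)
  = \sum_{u=1}^{U} \log
    \frac{p_\theta(X_u \mid M_{W_u})^{\kappa}\, P(W_u)}
         {\sum_{W} p_\theta(X_u \mid M_{W})^{\kappa}\, P(W)\,
          \exp\!\bigl(-b\, A(W, W_u)\bigr)}
```

Setting b = 0 recovers plain MMI. In the lattice-free setting, the denominator sum is evaluated over a denominator graph with the forward-backward algorithm rather than over per-utterance lattices, which is where the numerator/denominator graphs mentioned in the abstract enter.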

research
12/05/2021

Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI

Recently, End-to-End (E2E) frameworks have achieved remarkable results o...
research
03/29/2022

Integrate Lattice-Free MMI into End-to-End Speech Recognition

In automatic speech recognition (ASR) research, discriminative criteria ...
research
10/06/2021

CTC Variations Through New WFST Topologies

This paper presents novel Weighted Finite-State Transducer (WFST) topolo...
research
05/28/2023

RASR2: The RWTH ASR Toolkit for Generic Sequence-to-sequence Speech Recognition

Modern public ASR tools usually provide rich support for training variou...
research
04/19/2021

Acoustic Data-Driven Subword Modeling for End-to-End Speech Recognition

Subword units are commonly used for end-to-end automatic speech recognit...
research
11/03/2022

Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system

Exploiting effective target modeling units is very important and has alw...
research
04/06/2021

Towards Consistent Hybrid HMM Acoustic Modeling

High-performance hybrid automatic speech recognition (ASR) systems are o...
