Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI

12/05/2021
by   Jinchuan Tian, et al.
0

Recently, End-to-End (E2E) frameworks have achieved remarkable results on various Automatic Speech Recognition (ASR) tasks. However, Lattice-Free Maximum Mutual Information (LF-MMI), as one of the discriminative training criteria that show superior performance in hybrid ASR systems, is rarely adopted in E2E ASR frameworks. In this work, we propose a novel approach to integrate LF-MMI criterion into E2E ASR frameworks in both training and decoding stages. The proposed approach shows its effectiveness on two of the most widely used E2E frameworks including Attention-Based Encoder-Decoders (AEDs) and Neural Transducers (NTs). Experiments suggest that the introduction of the LF-MMI criterion consistently leads to significant performance improvements on various datasets and different E2E ASR frameworks. The best of our models achieves competitive CER of 4.1% / 4.4% on Aishell-1 dev/test set; we also achieve significant error reduction on Aishell-2 and Librispeech datasets over strong baselines.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/29/2022

Integrate Lattice-Free MMI into End-to-End Speech Recognition

In automatic speech recognition (ASR) research, discriminative criteria ...
research
01/06/2022

Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model

Despite the rapid progress of end-to-end (E2E) automatic speech recognit...
research
07/09/2021

On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models

Hybrid automatic speech recognition (ASR) models are typically sequentia...
research
03/30/2017

Simplified End-to-End MMI Training and Voting for ASR

A simplified speech recognition system that uses the maximum mutual info...
research
05/18/2020

An Effective End-to-End Modeling Approach for Mispronunciation Detection

Recently, end-to-end (E2E) automatic speech recognition (ASR) systems ha...
research
12/07/2022

Lattice-Free Sequence Discriminative Training for Phoneme-Based Neural Transducers

Recently, RNN-Transducers have achieved remarkable results on various au...
research
11/03/2022

Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system

Exploiting effective target modeling units is very important and has alw...

Please sign up or login with your details

Forgot password? Click here to reset