Integrate Lattice-Free MMI into End-to-End Speech Recognition

03/29/2022
by   Jinchuan Tian, et al.
0

In automatic speech recognition (ASR) research, discriminative criteria have achieved superior performance in DNN-HMM systems. Given this success, the adoption of discriminative criteria is promising to boost the performance of end-to-end (E2E) ASR systems. With this motivation, previous works have introduced the minimum Bayesian risk (MBR, one of the discriminative criteria) into E2E ASR systems. However, the effectiveness and efficiency of the MBR-based methods are compromised: the MBR criterion is only used in system training, which creates a mismatch between training and decoding; the on-the-fly decoding process in MBR-based methods results in the need for pre-trained models and slow training speeds. To this end, novel algorithms are proposed in this work to integrate another widely used discriminative criterion, lattice-free maximum mutual information (LF-MMI), into E2E ASR systems not only in the training stage but also in the decoding process. The proposed LF-MMI training and decoding methods show their effectiveness on two widely used E2E frameworks: Attention-Based Encoder-Decoders (AEDs) and Neural Transducers (NTs). Compared with MBR-based methods, the proposed LF-MMI method: maintains the consistency between training and decoding; eschews the on-the-fly decoding process; trains from randomly initialized models with superior training efficiency. Experiments suggest that the LF-MMI method outperforms its MBR counterparts and consistently leads to statistically significant performance improvements on various frameworks and datasets from 30 hours to 14.3k hours. The proposed method achieves state-of-the-art (SOTA) results on Aishell-1 (CER 4.10

READ FULL TEXT
research
12/05/2021

Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI

Recently, End-to-End (E2E) frameworks have achieved remarkable results o...
research
01/06/2022

Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model

Despite the rapid progress of end-to-end (E2E) automatic speech recognit...
research
07/09/2021

On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models

Hybrid automatic speech recognition (ASR) models are typically sequentia...
research
10/17/2022

A Treatise On FST Lattice Based MMI Training

Maximum mutual information (MMI) has become one of the two de facto meth...
research
05/20/2020

PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR

We present PyChain, a fully parallelized PyTorch implementation of end-t...
research
01/14/2022

A Study of Transducer based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies

In this study, we present recent developments of models trained with the...
research
12/07/2022

Lattice-Free Sequence Discriminative Training for Phoneme-Based Neural Transducers

Recently, RNN-Transducers have achieved remarkable results on various au...

Please sign up or login with your details

Forgot password? Click here to reset