A Treatise On FST Lattice Based MMI Training

10/17/2022
by Adnan Haider, et al.

Maximum mutual information (MMI) has become one of the two de facto methods for sequence-level training of speech recognition acoustic models. This paper aims to isolate, identify and bring forward the implicit modelling decisions induced by the design and implementation of the standard finite state transducer (FST) lattice based MMI training framework. In particular, the paper investigates the necessity of maintaining a preselected numerator alignment and highlights the importance of determinizing FST denominator lattices on the fly. On-the-fly FST lattice determinization is shown mathematically to guarantee discrimination at the hypothesis level, and its efficacy is shown empirically by training deep CNN models on an 18K-hour Mandarin dataset and a 2.8K-hour English dataset. On assistant and dictation tasks, the approach achieves between 2.3% and 4.6% relative WER reduction (WERR) over the standard FST lattice based approach.
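For orientation, the MMI criterion is commonly written in the form below. This is the textbook formulation rather than an equation quoted from the paper, and the notation is an assumption chosen for illustration: $O_u$ is the observation sequence of utterance $u$, $W_u$ its reference transcript, $\kappa$ an acoustic scaling factor, and $\theta$ the acoustic model parameters.

```latex
\mathcal{F}_{\mathrm{MMI}}(\theta)
  = \sum_{u=1}^{U} \log
    \frac{p_{\theta}(O_u \mid W_u)^{\kappa} \, P(W_u)}
         {\sum_{W} p_{\theta}(O_u \mid W)^{\kappa} \, P(W)}
```

The numerator scores the reference transcript, while the denominator sums over competing hypotheses $W$. In a lattice-based setup, that sum is typically approximated by the hypotheses encoded in a denominator lattice.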


