Towards Consistent Hybrid HMM Acoustic Modeling

04/06/2021
by Tina Raissi, et al.

High-performance hybrid automatic speech recognition (ASR) systems are often trained with clustered triphone outputs, and thus require a complex training pipeline to generate the clustering. The same pipeline is often used to generate an alignment for frame-wise cross-entropy training. In this work, we propose a flat-start factored hybrid model that models the full set of triphone states explicitly, without relying on clustering methods. This greatly simplifies the training of new models. Furthermore, we study the effect of different alignments used for Viterbi training. Our proposed models achieve competitive performance on the Switchboard task compared to systems using clustered triphones and to other flat-start models in the literature.
