HMM vs. CTC for Automatic Speech Recognition: Comparison Based on Full-Sum Training from Scratch

10/18/2022
by   Tina Raissi, et al.
0

In this work, we compare from-scratch sequence-level cross-entropy (full-sum) training of Hidden Markov Model (HMM) and Connectionist Temporal Classification (CTC) topologies for automatic speech recognition (ASR). Besides accuracy, we further analyze their capability for generating high-quality time alignment between the speech signal and the transcription, which can be crucial for many subsequent applications. Moreover, we propose several methods to improve convergence of from-scratch full-sum training by addressing the alignment modeling issue. Systematic comparison is conducted on both Switchboard and LibriSpeech corpora across CTC, posterior HMM with and w/o transition probabilities, and standard hybrid HMM. We also provide a detailed analysis of both Viterbi forced-alignment and Baum-Welch full-sum occupation probabilities.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/15/2023

Competitive and Resource Efficient Factored Hybrid HMM Systems are Simpler Than You Think

Building competitive hybrid hidden Markov model (HMM) systems for automa...
research
06/01/2023

Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts

This paper presents a novel algorithm for building an automatic speech r...
research
12/22/2022

Alignment Entropy Regularization

Existing training criteria in automatic speech recognition(ASR) permit t...
research
09/18/2023

Training dynamic models using early exits for automatic speech recognition on resource-constrained devices

The possibility of dynamically modifying the computational load of neura...
research
02/03/2022

Improving Lyrics Alignment through Joint Pitch Detection

In recent years, the accuracy of automatic lyrics alignment methods has ...
research
08/19/2023

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction

Automatic speech recognition (ASR) based on transducers is widely used. ...
research
11/28/2017

Exploiting Nontrivial Connectivity for Automatic Speech Recognition

Nontrivial connectivity has allowed the training of very deep networks b...

Please sign up or login with your details

Forgot password? Click here to reset