Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts

06/01/2023
by Dongji Gao, et al.

This paper presents a novel algorithm for building an automatic speech recognition (ASR) model from imperfect training data. Imperfectly transcribed speech is prevalent in human-annotated speech corpora and degrades the performance of ASR models. To address this problem, we propose Bypass Temporal Classification (BTC) as an extension of the Connectionist Temporal Classification (CTC) criterion. BTC explicitly encodes the uncertainties associated with transcripts during training. This is accomplished by making the training graph more flexible; the graph is implemented as a weighted finite-state transducer (WFST) composition. The proposed algorithm improves the robustness and accuracy of ASR systems, particularly when working with imprecisely transcribed speech corpora. Our implementation will be open-sourced.
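
For readers unfamiliar with the criterion that BTC extends, the following is a minimal sketch of the standard CTC likelihood, computed with the forward algorithm over a blank-interleaved label sequence. It is not the BTC training graph and not the authors' implementation. The function name `ctc_likelihood`, the toy inputs, and the choice of plain probabilities rather than log-probabilities are assumptions made for illustration only.

```python
def ctc_likelihood(probs, target, blank=0):
    """Standard CTC likelihood P(target | probs) via the forward algorithm.

    probs  : list of T per-frame distributions, each a list over the vocabulary
    target : list of label ids (must not contain the blank id)
    Illustrative sketch only: uses raw probabilities, so it is suitable
    for toy inputs, not for long utterances (a real system works in log space).
    """
    # Interleave blanks around the labels: [blank, y1, blank, y2, ..., blank].
    ext = [blank]
    for label in target:
        ext += [label, blank]
    num_states = len(ext)

    # Initialization: a path may start in the first blank or the first label.
    alpha = [0.0] * num_states
    alpha[0] = probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new_alpha = [0.0] * num_states
        for s in range(num_states):
            total = alpha[s]                      # stay in the same state
            if s >= 1:
                total += alpha[s - 1]             # advance by one state
            # Skip over a blank only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha

    # A valid path ends in the last label or the trailing blank.
    result = alpha[-1]
    if num_states > 1:
        result += alpha[-2]
    return result


# Toy input: 2 frames, vocabulary {0: blank, 1: "a"}, uniform posteriors.
uniform = [[0.5, 0.5], [0.5, 0.5]]
likelihood_a = ctc_likelihood(uniform, [1])
```

In this sketch the transcript is treated as exact: every path must emit precisely the given labels. The abstract describes BTC as relaxing that assumption by making the training graph more flexible; the details of that relaxation are in the full paper and are not reproduced here.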


