Lattice-based lightly-supervised acoustic model training

05/30/2019
by   Joachim Fainberg, et al.
0

In the broadcast domain there is an abundance of related text data and partial transcriptions, such as closed captions and subtitles. This text data can be used for lightly supervised training, in which text matching the audio is selected using an existing speech recognition model. Current approaches to light supervision typically filter the data based on matching error rates between the transcriptions and biased decoding hypotheses. In contrast, semi-supervised training does not require matching text data, instead generating a hypothesis using a background language model. State-of-the-art semi-supervised training uses lattice-based supervision with the lattice-free MMI (LF-MMI) objective function. We propose a technique to combine inaccurate transcriptions with the lattices generated for semi-supervised training, thus preserving uncertainty in the lattice where appropriate. We demonstrate that this combined approach reduces the expected error rates over the lattices, and reduces the word error rate (WER) on a broadcast task.

READ FULL TEXT
research
06/29/2022

Improving Deliberation by Text-Only and Semi-Supervised Training

Text-only and semi-supervised training based on audio-only data has gain...
research
10/29/2021

Combining Unsupervised and Text Augmented Semi-Supervised Learning for Low Resourced Autoregressive Speech Recognition

Recent advances in unsupervised representation learning have demonstrate...
research
04/05/2020

Semi-supervised acoustic and language model training for English-isiZulu code-switched speech recognition

We present an analysis of semi-supervised acoustic and language model tr...
research
11/05/2018

The Marchex 2018 English Conversational Telephone Speech Recognition System

In this paper, we describe recent improvements to the production Marchex...
research
06/14/2021

Using heterogeneity in semi-supervised transcription hypotheses to improve code-switched speech recognition

Modeling code-switched speech is an important problem in automatic speec...
research
04/19/2023

A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale

Unpaired text and audio injection have emerged as dominant methods for i...
research
02/01/2018

Phonetic and Graphemic Systems for Multi-Genre Broadcast Transcription

State-of-the-art English automatic speech recognition systems typically ...

Please sign up or login with your details

Forgot password? Click here to reset