Likelihood-based semi-supervised model selection with applications to speech processing

11/20/2009
by Christopher M. White, et al.

In conventional supervised pattern recognition tasks, model selection is typically accomplished by minimizing the classification error rate on a set of so-called development data, subject to ground-truth labeling by human experts or some other means. In the context of speech processing systems and other large-scale practical applications, however, such labeled development data are typically costly and difficult to obtain. This article proposes an alternative semi-supervised framework for likelihood-based model selection that leverages unlabeled data by using trained classifiers representing each model to automatically generate putative labels. The errors that result from this automatic labeling are shown to be amenable to results from robust statistics, which in turn provide for minimax-optimal censored likelihood ratio tests that recover the nonparametric sign test as a limiting case. This approach is then validated experimentally using a state-of-the-art automatic speech recognition system to select between candidate word pronunciations using unlabeled speech data that only potentially contain instances of the words under test. Results provide supporting evidence for the utility of this approach, and suggest that it may also find use in other applications of machine learning.
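
As a rough illustration of the censored likelihood ratio test mentioned in the abstract, the sketch below clips each per-sample log-likelihood ratio before summing, in the style of Huber's classical robust test. It is not the authors' implementation: the function names, the symmetric clipping level `clip`, the decision threshold, and the numbers in `llrs` are all hypothetical, chosen only to show how censoring limits the influence of a mislabeled sample and how the statistic approaches the sign test as the clipping level shrinks.

```python
import math


def censored_llr_statistic(llrs, clip):
    """Sum per-sample log-likelihood ratios after clipping each to [-clip, clip]."""
    if clip <= 0:
        raise ValueError("clip must be positive")
    return sum(max(-clip, min(clip, r)) for r in llrs)


def sign_statistic(llrs):
    """Number of positive ratios minus number of negative ratios (sign test)."""
    return sum((r > 0) - (r < 0) for r in llrs)


def select_model(llrs, clip, threshold=0.0):
    """Pick model "A" if the censored statistic exceeds the threshold, else "B"."""
    return "A" if censored_llr_statistic(llrs, clip) > threshold else "B"


# Hypothetical log-likelihood ratios log p_A(x) / p_B(x) for five samples.
# The fourth value stands in for a sample with an erroneous putative label.
llrs = [0.4, 0.7, 0.2, -9.0, 0.5]

uncensored_choice = select_model(llrs, clip=float("inf"))  # outlier dominates
censored_choice = select_model(llrs, clip=1.0)             # outlier is capped

# With a small clipping level every ratio saturates, so the rescaled
# statistic matches the sign statistic up to floating-point error.
rescaled = censored_llr_statistic(llrs, clip=0.1) / 0.1
matches_sign_test = math.isclose(rescaled, sign_statistic(llrs))
```

In this toy data the uncensored sum is driven negative by the single outlier, while the censored sum with `clip=1.0` stays positive. In practice the clipping level would be set from an assumed label-error rate rather than picked by hand.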


