O-1: Self-training with Oracle and 1-best Hypothesis

08/14/2023
by Murali Karthick Baskar, et al.

We introduce O-1, a new self-training objective to reduce training bias and unify training and evaluation metrics for speech recognition. O-1 is a faster variant of Expected Minimum Bayes Risk (EMBR) that boosts the oracle hypothesis and can accommodate both supervised and unsupervised data. We demonstrate the effectiveness of our approach in terms of recognition performance on the publicly available SpeechStew datasets and a large-scale in-house dataset. On SpeechStew, the O-1 objective closes the gap between actual and oracle performance by 80% relative, compared to EMBR, which bridges the gap by 43% relative. O-1 achieves 13% to 25% relative improvement over EMBR across the various datasets that constitute SpeechStew, and a 12% relative reduction in the gap to the oracle WER over EMBR training on the in-house dataset. Overall, O-1 yields a 9% relative improvement in WER over EMBR, speaking to the scalability of the proposed objective for large-scale datasets.
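The abstract contrasts O-1 with EMBR only at a high level. As a rough illustration of that contrast, the sketch below compares an EMBR-style expected-risk loss over an N-best list with an O-1-style loss that boosts the oracle (lowest-WER) hypothesis. This is a minimal reading of the idea, not the paper's actual formulation: the function names, the renormalized N-best posterior, and the toy inputs are all assumptions for illustration.

    import numpy as np

    def nbest_posterior(logprobs: np.ndarray) -> np.ndarray:
        # Renormalize model scores over the N-best list (softmax).
        p = np.exp(logprobs - logprobs.max())
        return p / p.sum()

    def embr_loss(logprobs: np.ndarray, wers: np.ndarray) -> float:
        # EMBR-style loss: expected WER under the renormalized N-best
        # posterior; gradients push probability mass away from
        # high-WER hypotheses.
        return float((nbest_posterior(logprobs) * wers).sum())

    def o1_loss(logprobs: np.ndarray, wers: np.ndarray) -> float:
        # O-1-style loss: boost only the oracle hypothesis, i.e. the
        # N-best entry with the lowest WER. For unsupervised data the
        # WERs would be computed against pseudo-labels, not references.
        oracle = int(np.argmin(wers))
        post = nbest_posterior(logprobs)
        return float(-np.log(post[oracle] + 1e-12))

    # Toy 4-best list: model log-scores and per-hypothesis WERs.
    logprobs = np.array([-1.2, -0.7, -2.5, -1.9])
    wers = np.array([0.20, 0.35, 0.10, 0.50])  # hypothesis 2 is the oracle
    print(f"EMBR loss: {embr_loss(logprobs, wers):.3f}")
    print(f"O-1 loss:  {o1_loss(logprobs, wers):.3f}")

Under this reading, O-1 back-propagates through a single oracle term per utterance rather than the full N-best expectation, which is one plausible source of the speedup the abstract claims over EMBR.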
