Continuous Soft Pseudo-Labeling in ASR

11/11/2022
by Tatiana Likhomanenko, et al.

Continuous pseudo-labeling (PL) algorithms such as slimIPL have recently emerged as a powerful strategy for semi-supervised learning in speech recognition. In contrast with earlier strategies that alternated between training a model and generating pseudo-labels (PLs) with it, here PLs are generated in an end-to-end manner as training proceeds, improving both training speed and the accuracy of the final model. PL shares a common theme with teacher-student approaches such as distillation, in that a teacher model generates targets that the student model being trained must mimic. Interestingly, however, PL strategies generally use hard labels, whereas distillation uses the full distribution over labels as the target. Inspired by distillation, we expect that specifying the whole distribution over sequences (soft labels) as the target for unlabeled data, instead of a single best-path pseudo-labeled transcript (hard labels), should improve PL performance and convergence. Surprisingly, we find that soft-label targets can lead to training divergence, with the model collapsing to a degenerate token distribution per frame. We hypothesize that this does not happen with hard labels because the training loss on hard labels imposes a sequence-level consistency that keeps the model from collapsing to the degenerate solution. In this paper, we present several experiments that support this hypothesis and explore several regularization approaches that ameliorate the degenerate collapse when using soft labels. These approaches bring the accuracy of soft labels closer to that of hard labels; while they do not yet outperform hard labels, they provide a useful framework for further improvements.
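
The core contrast described above is between training on a single pseudo-labeled transcript (hard labels, scored with a sequence-level loss such as CTC) and training on the teacher's full per-frame token distribution (soft labels, matched with a distillation-style divergence). The sketch below is a minimal PyTorch illustration, not the authors' implementation: the greedy best-path decoder, tensor shapes, and the frame-wise KL objective are assumptions chosen for clarity.

```python
# Illustrative sketch (assumed setup, not the paper's code): hard-label vs
# soft-label pseudo-labeling losses for a CTC-style ASR model.
import torch
import torch.nn.functional as F


def hard_label_pl_loss(student_logits, teacher_logits, input_lengths, blank=0):
    """Hard labels: greedily decode the teacher's best path into a transcript,
    then train the student on it with a sequence-level CTC loss."""
    # Best-path decoding: argmax per frame, collapse repeats, drop blanks.
    best_path = teacher_logits.argmax(dim=-1)  # (T, B)
    targets, target_lengths = [], []
    for b in range(best_path.shape[1]):
        seq, prev = [], None
        for t in range(int(input_lengths[b])):
            tok = int(best_path[t, b])
            if tok != blank and tok != prev:
                seq.append(tok)
            prev = tok
        targets.extend(seq)
        target_lengths.append(len(seq))
    targets = torch.tensor(targets, dtype=torch.long)
    target_lengths = torch.tensor(target_lengths, dtype=torch.long)
    log_probs = F.log_softmax(student_logits, dim=-1)  # (T, B, V)
    return F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                      blank=blank, zero_infinity=True)


def soft_label_pl_loss(student_logits, teacher_logits):
    """Soft labels: match the teacher's full per-frame token distribution,
    as in distillation; note there is no sequence-level constraint here."""
    teacher_probs = F.softmax(teacher_logits, dim=-1)
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")


# Toy usage: T frames, batch size B, vocabulary of V tokens (CTC blank at index 0).
T, B, V = 50, 2, 32
student_logits = torch.randn(T, B, V, requires_grad=True)
teacher_logits = torch.randn(T, B, V)
lengths = torch.full((B,), T, dtype=torch.long)
print("hard-label PL loss:", hard_label_pl_loss(student_logits, teacher_logits, lengths).item())
print("soft-label PL loss:", soft_label_pl_loss(student_logits, teacher_logits).item())
```

The sequence-level CTC loss in the hard-label case is the kind of constraint the paper argues keeps training from collapsing; the frame-wise divergence in the soft-label case imposes no such sequence-level consistency, which is the hypothesized route to the degenerate per-frame collapse.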

Related research

Humble Teachers Teach Better Students for Semi-Supervised Object Detection (06/19/2021)
We propose a semi-supervised approach for contemporary object detectors ...

Improving Regression Performance with Distributional Losses (06/12/2018)
There is growing evidence that converting targets to soft targets in sup...

Learning Soft Labels via Meta Learning (09/20/2020)
One-hot labels do not represent soft decision boundaries among concepts,...

Semi-Supervised Learning with Data Augmentation for End-to-End ASR (07/27/2020)
In this paper, we apply Semi-Supervised Learning (SSL) along with Data A...

Distilling Double Descent (02/13/2021)
Distillation is the technique of training a "student" model based on exa...

Continuous Pseudo-Labeling from the Start (10/17/2022)
Self-training (ST), or pseudo-labeling has sparked significant interest ...

Estimating Soft Labels for Out-of-Domain Intent Detection (11/10/2022)
Out-of-Domain (OOD) intent detection is important for practical dialog s...
