Pseudo Label Is Better Than Human Label

03/22/2022
by Dongseong Hwang, et al.

State-of-the-art automatic speech recognition (ASR) systems are trained with tens of thousands of hours of labeled speech data. Human transcription is expensive and time-consuming. Factors such as the quality and consistency of the transcription can greatly affect the performance of the ASR models trained with these data. In this paper, we show that we can train a strong teacher model to produce high-quality pseudo labels by utilizing recent self-supervised and semi-supervised learning techniques. Specifically, we use JUST (Joint Unsupervised/Supervised Training) and iterative noisy student teacher training to train a 600-million-parameter bi-directional teacher model. This model achieved a 4.0% word error rate (WER), better than a baseline. We further show that by using this strong teacher model to generate high-quality pseudo labels for training, we can achieve a 13.6% relative WER reduction (from 5.9% to 5.1%) compared to using human labels.
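
WER, the metric behind the figures in the abstract, is the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and a recognizer's hypothesis, divided by the number of reference words. A relative WER reduction is the drop in WER divided by the baseline WER. The following is a minimal textbook sketch of both calculations, not the scoring pipeline used in the paper: it splits on whitespace and applies no text normalization, which real ASR scorers usually do. The function names are illustrative, and the 5.1% input is the value that a 13.6% relative reduction from 5.9% implies (5.9 × (1 − 0.136) ≈ 5.1).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch: whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,             # deletion
                dp[i][j - 1] + 1,             # insertion
                dp[i - 1][j - 1] + sub_cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Human-label student vs. pseudo-label student WERs, in percent.
    print(relative_reduction(5.9, 5.1))
```

Calling `relative_reduction(5.9, 5.1)` gives about 0.136, which matches the 13.6% relative reduction stated in the abstract.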

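The loop behind iterative noisy student teacher training can be illustrated with a deliberately tiny stand-in. This is a toy sketch under loud assumptions, not the paper's setup: a nearest-centroid classifier on 1-D points replaces the 600-million-parameter bi-directional ASR model, JUST pre-training is not modeled, and Gaussian jitter on the inputs is an assumed stand-in for the data augmentation that noisy student ASR training typically uses. All names (`train_centroids`, `predict`, `noisy_student`) are hypothetical. Each round, the current teacher labels the unlabeled data, a student is trained on human labels plus pseudo labels with noise applied, and the student becomes the next teacher.

```python
import random
from statistics import mean


def train_centroids(xs, ys, noise_std=0.0, rng=None):
    """Fit a toy nearest-centroid 'model' (hypothetical stand-in for an ASR model).

    When noise_std > 0, each input is jittered with Gaussian noise before
    fitting; this plays the role of the 'noise' applied to the student.
    """
    rng = rng or random.Random(0)
    groups = {}
    for x, y in zip(xs, ys):
        x_in = x + rng.gauss(0.0, noise_std) if noise_std > 0 else x
        groups.setdefault(y, []).append(x_in)
    return {label: mean(values) for label, values in groups.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def noisy_student(labeled, unlabeled, rounds=3, noise_std=0.5, seed=0):
    """Iterative teacher -> pseudo labels -> noisy student loop (toy sketch)."""
    rng = random.Random(seed)
    xs, ys = zip(*labeled)
    # Initial teacher: trained on human-labeled data only, without noise.
    teacher = train_centroids(xs, ys)
    for _ in range(rounds):
        # The teacher labels clean (un-noised) unlabeled inputs.
        pseudo_labels = [predict(teacher, x) for x in unlabeled]
        all_x = list(xs) + list(unlabeled)
        all_y = list(ys) + pseudo_labels
        # The student trains on human + pseudo labels with noise applied.
        student = train_centroids(all_x, all_y, noise_std=noise_std, rng=rng)
        # The student becomes the teacher for the next round.
        teacher = student
    return teacher


if __name__ == "__main__":
    labeled = [(0.1, 0), (3.9, 1)]
    unlabeled = [-0.3, 0.2, 0.5, 3.5, 4.2, 4.6]
    model = noisy_student(labeled, unlabeled)
    print(model, predict(model, -0.5), predict(model, 5.0))
```

Keeping the human-labeled examples in every student's training set, and applying noise only when training the student while the teacher labels clean inputs, reflects the general noisy student recipe; how the paper balances human and pseudo labels is not specified in this abstract.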