Improved Noisy Student Training for Automatic Speech Recognition

05/19/2020
by Daniel S. Park, et al.

Recently, a semi-supervised learning method known as "noisy student training" has been shown to significantly improve the image classification performance of deep networks. Noisy student training is an iterative self-training method that leverages augmentation to improve network performance. In this work, we adapt and improve noisy student training for automatic speech recognition, employing (adaptive) SpecAugment as the augmentation method. We find effective methods to filter, balance and augment the data generated between self-training iterations. By doing so, we obtain a word error rate (WER) of 4.2% on the clean LibriSpeech test set using only a 100h subset of LibriSpeech as the supervised set and the rest (860h) as the unlabeled set. Furthermore, we achieve a WER of 1.7% on the clean LibriSpeech test set by using the unlab-60k subset of LibriLight as the unlabeled set for LibriSpeech 960h. We are thus able to improve upon the previous state-of-the-art clean/noisy test WERs achieved on LibriSpeech 100h (4.74
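
The iterative self-training loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. All names (`noisy_student`, `train_centroids`, `jitter`) are hypothetical. A toy 1-D nearest-centroid classifier stands in for the speech recognizer, and Gaussian jitter stands in for SpecAugment. A simple confidence threshold is assumed as the filtering rule, since the abstract does not specify one, and the balancing step is omitted.

```python
import random
from typing import Callable, List, Sequence, Tuple

Labeled = Tuple[float, str]                   # (input, label)
Model = Callable[[float], Tuple[str, float]]  # input -> (label, confidence)


def train_centroids(data: Sequence[Labeled]) -> Model:
    """Toy stand-in for network training: one centroid per label."""
    sums, counts = {}, {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: sums[y] / counts[y] for y in sums}

    def model(x: float) -> Tuple[str, float]:
        ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
        best = ranked[0]
        if len(ranked) == 1:
            return best, 1.0
        d0 = abs(x - centroids[best])
        d1 = abs(x - centroids[ranked[1]])
        # Confidence lies in [0.5, 1]; 0.5 means the two labels are tied.
        conf = 1.0 - d0 / (d0 + d1) if d0 + d1 > 0 else 0.5
        return best, conf

    return model


def noisy_student(
    labeled: Sequence[Labeled],
    unlabeled: Sequence[float],
    train: Callable[[Sequence[Labeled]], Model],
    augment: Callable[[float], float],
    threshold: float,
    generations: int,
) -> Model:
    # Generation 0: the teacher sees only the (augmented) labeled set.
    teacher = train([(augment(x), y) for x, y in labeled])
    for _ in range(generations):
        # The teacher labels clean, un-augmented inputs.
        pseudo: List[Labeled] = []
        for x in unlabeled:
            label, conf = teacher(x)
            if conf >= threshold:  # assumed filtering rule
                pseudo.append((x, label))
        # The student trains on augmented labeled plus pseudo-labeled data.
        mixed = list(labeled) + pseudo
        student = train([(augment(x), y) for x, y in mixed])
        teacher = student  # the student becomes the next teacher
    return teacher


rng = random.Random(0)


def jitter(x: float) -> float:
    """Hypothetical augmentation: small Gaussian noise on the input."""
    return x + rng.gauss(0.0, 0.1)


labeled = [(0.0, "a"), (1.0, "a"), (9.0, "b"), (10.0, "b")]
unlabeled = [0.5, 1.5, 5.0, 8.5, 9.5]
model = noisy_student(labeled, unlabeled, train_centroids, jitter,
                      threshold=0.8, generations=2)
print(model(0.2)[0], model(9.8)[0])
```

In this toy run the ambiguous input `5.0` lies roughly midway between the two centroids, so its confidence stays near 0.5 and the threshold drops it from the pseudo-labeled set. That is the role filtering plays between generations.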
