Efficient Active Learning for Automatic Speech Recognition via Augmented Consistency Regularization

06/19/2020
by Jihwan Bang, et al.

The cost of transcribing large speech corpora is a bottleneck that keeps deep neural network-based automatic speech recognition (ASR) models from reaching their full potential. In this paper, we present a new training scheme that minimizes labeling cost by combining semi-supervised learning (SSL) and active learning (AL) so that each reinforces the other. Whereas AL studies focus only on selecting, by some criterion, a minimal number of samples to be labeled and then exploiting those samples, we show that training efficiency can be further improved by also using the unlabeled samples, through a carefully designed unsupervised loss that compensates for the unwanted behavior of the supervised loss. Our unsupervised loss is built on the consistency regularization (CR) approach, and we propose augmentation techniques suited to applying CR in the ASR field. Through qualitative and quantitative experiments on a real-world dataset from deployed end-user voice assistant services, we show that the proposed methods can exploit a large amount of unlabeled speech data to achieve competitive model performance at a sustainable human labeling cost.
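As a rough illustration of the consistency-regularization idea mentioned in the abstract, the sketch below penalizes disagreement between a model's predictions on an input and on an augmented copy of it, then adds that penalty to a supervised loss. It is a generic, minimal sketch and not the paper's method: the KL-divergence form, the additive-noise augmentation, the toy model, and every name used here (`consistency_loss`, `add_noise`, `toy_model`, `total_loss`, `weight`) are assumptions made for illustration, since the abstract does not specify the loss or the augmentations.

```python
import math
import random


def softmax(logits):
    """Turn a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(model, features, augment):
    """Mean per-frame KL between predictions on clean and augmented input."""
    clean = [softmax(frame) for frame in model(features)]
    noisy = [softmax(frame) for frame in model(augment(features))]
    return sum(kl_divergence(p, q) for p, q in zip(clean, noisy)) / len(clean)


def total_loss(supervised_loss, unsupervised_loss, weight=1.0):
    """Supervised loss plus a weighted unsupervised (consistency) term."""
    return supervised_loss + weight * unsupervised_loss


def toy_model(features):
    """Hypothetical stand-in for an ASR model: 3 logits per frame."""
    return [[sum(frame), -sum(frame), 0.0] for frame in features]


def add_noise(features, scale=0.1, seed=0):
    """Hypothetical augmentation: additive Gaussian noise on each feature."""
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, scale) for x in frame] for frame in features]


# Two frames of unlabeled "acoustic features" (made-up numbers).
unlabeled = [[0.2, -0.1, 0.4], [1.0, 0.3, -0.5]]
cr = consistency_loss(toy_model, unlabeled, add_noise)
loss = total_loss(supervised_loss=1.0, unsupervised_loss=cr, weight=0.5)
```

The consistency term needs no transcription, which is what lets unlabeled audio contribute to training; how it is weighted against the supervised loss is a design choice left open here.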


research
08/28/2023

Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition

In recent years, speech-based self-supervised learning (SSL) has made si...
research
12/10/2016

Active Learning for Speech Recognition: the Power of Gradients

In training speech recognition systems, labeling audio clips can be expe...
research
05/19/2020

Iterative Pseudo-Labeling for Speech Recognition

Pseudo-labeling has recently shown promise in end-to-end automatic speec...
research
06/09/2021

Unsupervised Automatic Speech Recognition: A Review

Automatic Speech Recognition (ASR) systems can be trained to achieve rem...
research
04/17/2019

Hard Sample Mining for the Improved Retraining of Automatic Speech Recognition

It is an effective way that improves the performance of the existing Aut...
research
06/07/2022

Collaborative Intelligence Orchestration: Inconsistency-Based Fusion of Semi-Supervised Learning and Active Learning

While annotating decent amounts of data to satisfy sophisticated learnin...
research
03/07/2019

Active and Semi-Supervised Learning in ASR: Benefits on the Acoustic and Language Models

The goal of this paper is to simulate the benefits of jointly applying a...
