BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

09/27/2021
by Yu Zhang, et al.

We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled data. In particular, on an ASR task with 34k hours of labeled data, by fine-tuning an 8 billion parameter pre-trained Conformer model we can match state-of-the-art (SoTA) performance with only 3% of the training data and significantly improve SoTA with the full training set. We also report on the universal benefits gained from using big pre-trained and self-trained models for a large set of downstream tasks that cover a wide range of speech domains and span multiple orders of magnitude of dataset sizes, including obtaining SoTA performance on many public benchmarks. In addition, we utilize the learned representation of pre-trained networks to achieve SoTA results on non-ASR tasks.

research
03/02/2023

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

We introduce the Universal Speech Model (USM), a single large model that...
research
02/12/2023

ASR Bundestag: A Large-Scale political debate dataset in German

We present ASR Bundestag, a dataset for automatic speech recognition in ...
research
11/13/2021

Prediction of Listener Perception of Argumentative Speech in a Crowdsourced Dataset Using (Psycho-)Linguistic and Fluency Features

One of the key communicative competencies is the ability to maintain flu...
research
08/16/2018

Toward domain-invariant speech recognition via large scale training

Current state-of-the-art automatic speech recognition systems are traine...
research
10/31/2022

Active Learning of Non-semantic Speech Tasks with Pretrained Models

Pretraining neural networks with massive unlabeled datasets has become p...
research
06/01/2023

Some voices are too common: Building fair speech recognition systems using the Common Voice dataset

Automatic speech recognition (ASR) systems become increasingly efficient...
research
04/02/2019

Lessons from Building Acoustic Models with a Million Hours of Speech

This is a report of our lessons learned building acoustic models from 1 ...
