Robust Speech Recognition via Large-Scale Weak Supervision

12/06/2022
by Alec Radford, et al.

We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks and are often competitive with prior fully supervised results, while operating in a zero-shot transfer setting that requires no fine-tuning. The models also approach human accuracy and robustness. We are releasing models and inference code to serve as a foundation for further work on robust speech processing.

research
06/15/2022

Exploring Capabilities of Monolingual Audio Transformers using Large Datasets in Automatic Speech Recognition of Czech

In this paper, we present our progress in pretraining Czech monolingual ...

research
06/05/2023

N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition

Whisper, the recently developed multilingual weakly supervised model, is...

research
12/17/2019

Libri-Light: A Benchmark for ASR with Limited or No Supervision

We introduce a new collection of spoken English audio suitable for train...

research
05/18/2023

Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization

We investigate the emergent abilities of the recently proposed web-scale...

research
10/03/2022

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model

Data-driven speech processing models usually perform well with a large a...

research
11/04/2022

A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability

In this paper, we introduce our work of building a Streaming Multilingua...

research
10/12/2022

SQuId: Measuring Speech Naturalness in Many Languages

Much of text-to-speech research relies on human evaluation, which incurs...
