Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

03/02/2023
by   Yu Zhang, et al.

We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. The encoder of the model is pre-trained on a large unlabeled multilingual dataset of 12 million (M) hours spanning over 300 languages, and the model is then fine-tuned on a smaller labeled dataset. We use multilingual pre-training with random-projection quantization and speech-text modality matching to achieve state-of-the-art performance on downstream multilingual ASR and speech-to-text translation tasks. We also demonstrate that, despite using a labeled training set one-seventh the size of that used for the Whisper model, our model exhibits comparable or better performance on both in-domain and out-of-domain speech recognition tasks across many languages.
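To make the idea of random-projection quantization concrete, here is a minimal sketch, not the USM implementation. It assumes the common formulation in which a frozen random matrix projects each speech feature frame, and the index of the nearest entry in a frozen random codebook becomes a discrete label that a pre-training objective could predict. The function names, dimensions, and seed are all hypothetical and chosen only for illustration.

```python
import math
import random


def _l2_normalize(vec):
    """Scale a vector to unit length (an all-zero vector is left unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def make_quantizer(feat_dim, code_dim, codebook_size, seed=0):
    """Build a random projection matrix and a random codebook.

    Both are drawn once and never trained (assumption of this sketch).
    """
    rng = random.Random(seed)
    projection = [[rng.gauss(0.0, 1.0) for _ in range(code_dim)]
                  for _ in range(feat_dim)]
    codebook = [_l2_normalize([rng.gauss(0.0, 1.0) for _ in range(code_dim)])
                for _ in range(codebook_size)]
    return projection, codebook


def quantize(frame, projection, codebook):
    """Map one feature frame to the index of its nearest codebook entry."""
    code_dim = len(projection[0])
    # Project the frame: projected[j] = sum_i frame[i] * projection[i][j].
    projected = [sum(frame[i] * projection[i][j] for i in range(len(frame)))
                 for j in range(code_dim)]
    projected = _l2_normalize(projected)
    # For unit vectors, the nearest neighbour maximises the dot product.
    scores = [sum(p * c for p, c in zip(projected, code)) for code in codebook]
    return max(range(len(codebook)), key=scores.__getitem__)


# Hypothetical usage: label three 8-dimensional frames with a 16-entry codebook.
projection, codebook = make_quantizer(feat_dim=8, code_dim=4, codebook_size=16)
data_rng = random.Random(1)
frames = [[data_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(3)]
labels = [quantize(f, projection, codebook) for f in frames]
```

Because neither the projection nor the codebook is learned in this sketch, the labels depend only on the seed and the input features, which keeps the quantizer independent of the encoder being trained.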

research
09/27/2021

BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

We summarize the results of a host of efforts using giant automatic spee...
research
09/14/2023

USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models

We introduce a multilingual speaker change detection model (USM-SCD) tha...
research
12/19/2022

Mu^2SLAM: Multitask, Multilingual Speech and Language Models

We present Mu^2SLAM, a multilingual sequence-to-sequence model pre-train...
research
06/25/2022

Distilling a Pretrained Language Model to a Multilingual ASR Model

Multilingual speech data often suffer from long-tailed language distribu...
research
06/01/2023

AfriNames: Most ASR models "butcher" African Names

Useful conversational agents must accurately capture named entities to m...
research
05/18/2023

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark

Speech processing Universal PERformance Benchmark (SUPERB) is a leaderbo...
research
06/10/2023

Adversarial Training For Low-Resource Disfluency Correction

Disfluencies commonly occur in conversational speech. Speech with disflu...
