SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network

04/05/2021
by William Chan, et al.

We present SpeechStew, a speech recognition model trained on a combination of various publicly available speech recognition datasets: AMI, Broadcast News, Common Voice, LibriSpeech, Switchboard/Fisher, Tedlium, and Wall Street Journal. SpeechStew simply mixes all of these datasets together, without any special re-weighting or re-balancing of the datasets. SpeechStew achieves SoTA or near-SoTA results across a variety of tasks, without the use of an external language model. Our results include 9.0% WER on AMI-IHM, 4.7% WER on Switchboard, 8.3% WER on CallHome, and 1.3% WER on WSJ, which significantly outperforms prior work with strong external language models. We also demonstrate that SpeechStew learns powerful transfer learning representations. We fine-tune SpeechStew on a noisy low-resource speech dataset, CHiME-6. We achieve 38.9% WER without a language model, which is competitive with the 38.6% WER of a strong HMM baseline that uses a language model.
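The core recipe described above is simply pooling every corpus into one training set with no per-dataset weighting. A minimal sketch of that mixing step is below; the corpus names match the paper, but the utterance contents, counts, and the `mix_all` helper are hypothetical placeholders for illustration:

```python
import random

# Hypothetical toy corpora standing in for the real datasets
# (each entry is an (utterance_id, transcript) pair).
datasets = {
    "AMI":         [("ami_%d" % i, "...") for i in range(3)],
    "LibriSpeech": [("ls_%d" % i, "...") for i in range(4)],
    "WSJ":         [("wsj_%d" % i, "...") for i in range(2)],
}

def mix_all(datasets, seed=0):
    """Flatten every corpus into a single pool -- no re-weighting
    or re-balancing -- then shuffle for training."""
    pool = [ex for corpus in datasets.values() for ex in corpus]
    random.Random(seed).shuffle(pool)
    return pool

mixed = mix_all(datasets)
print(len(mixed))  # every utterance from every corpus, unweighted
```

Because no corpus is up- or down-sampled, each dataset's contribution to a training epoch is simply proportional to its size.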

