The People's Speech: A Large-Scale Diverse English Speech Recognition Dataset for Commercial Usage

11/17/2021
by Daniel Galvez, et al.

The People's Speech is a free-to-download, 30,000-hour and growing, supervised conversational English speech recognition dataset licensed for academic and commercial usage under CC-BY-SA (with a CC-BY subset). The data were collected by searching the Internet for appropriately licensed audio with existing transcriptions. We describe our data collection methodology and release our data collection system under the Apache 2.0 license. We show that a model trained on this dataset achieves a 9.98% word error rate on the LibriSpeech test-clean test set. Finally, we discuss the legal and ethical issues surrounding the creation of a sizable machine learning corpus, and our plans for continued maintenance of the project under MLCommons's sponsorship.
