MediaSpeech: Multilanguage ASR Benchmark and Dataset

03/30/2021
by Rostislav Kolobov et al.

The performance of automated speech recognition (ASR) systems is well known to vary across application domains. At the same time, vendors and research groups typically report ASR quality either on limited, simplistic domains (audiobooks, TED talks) or on proprietary datasets. To fill this gap, we provide NTR MediaSpeech, an open-source 10-hour dataset for evaluating ASR systems in 4 languages: Spanish, French, Turkish and Arabic. The dataset was collected from the official YouTube channels of media outlets in the respective languages and manually transcribed. We estimate that the word error rate (WER) of the dataset's transcripts is under 5%. We have benchmarked many ASR systems available both commercially and freely, and provide the benchmark results. We also open-source baseline QuartzNet models for each language.
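The abstract quotes quality in terms of word error rate (WER). As a minimal sketch (not the authors' evaluation code), WER can be computed as the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. This version assumes plain whitespace tokenization and applies no text normalization (casing, punctuation, numerals), which real benchmarks usually perform first; the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] = edit distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER for a single utterance, assuming whitespace tokenization only."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hypothesis.split()) / len(ref)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER: total edits divided by total reference words."""
    edits = 0
    words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        edits += word_edit_distance(ref, hypothesis.split())
        words += len(ref)
    if words == 0:
        raise ValueError("corpus must contain at least one reference word")
    return edits / words


if __name__ == "__main__":
    # One inserted word against a three-word reference gives 1/3.
    print(word_error_rate("the cat sat", "the cat sat down"))
    print(corpus_wer([("a b c", "a x c"), ("d e", "d e")]))
```

Summing edits over the whole corpus before dividing, rather than averaging per-utterance WERs, keeps short utterances from dominating the score.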

research
12/07/2020

MLS: A Large-Scale Multilingual Dataset for Speech Research

This paper introduces Multilingual LibriSpeech (MLS) dataset, a large mu...
research
05/24/2023

Vistaar: Diverse Benchmarks and Training Sets for Indian Language ASR

Improving ASR systems is necessary to make new LLM-based use-cases acces...
research
12/23/2018

Pansori: ASR Corpus Generation from Open Online Video Contents

This paper introduces Pansori, a program used to create ASR (automatic s...
research
10/18/2022

Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR

Training state-of-the-art Automated Speech Recognition (ASR) models typi...
research
04/22/2021

Earnings-21: A Practical Benchmark for ASR in the Wild

Commonly used speech corpora inadequately challenge academic and commerc...
research
01/14/2020

Improved Robust ASR for Social Robots in Public Spaces

Social robots deployed in public spaces present a challenging task for A...