BembaSpeech: A Speech Recognition Corpus for the Bemba Language

02/09/2021
by Claytone Sikasote, et al.

We present BembaSpeech, a preprocessed, ready-to-use automatic speech recognition (ASR) corpus consisting of over 24 hours of read speech in Bemba, a written but low-resourced language spoken by over 30% of the population in Zambia. To assess its usefulness for training and testing ASR systems for Bemba, we train an end-to-end Bemba ASR system by fine-tuning a pre-trained DeepSpeech English model on the training portion of the BembaSpeech corpus. Our best model achieves a word error rate (WER) of 54.78%. The results show that the corpus can be used for building ASR systems for Bemba. The corpus and models are publicly released at https://github.com/csikasote/BembaSpeech.
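
For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the system output, divided by the number of reference words. The following is a minimal sketch under stated assumptions: whitespace tokenization, no text normalization, and a hypothetical helper name `word_error_rate`. It is not taken from the BembaSpeech repository, the example sentences are English placeholders rather than corpus data, and the evaluation behind the reported figure may tokenize or normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no lowercasing or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder example: one substitution ("sat" -> "sit") and one
# deletion (the second "the") against a six-word reference.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{wer:.2%}")  # prints 33.33%
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.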


