
Snow Mountain: Dataset of Audio Recordings of The Bible in Low Resource Languages

by Kavitha Raju, et al.

Automatic Speech Recognition (ASR) has increasing utility in the modern world. There are many ASR models available for languages with large amounts of training data, such as English. However, low-resource languages are poorly represented. In response, we create and release an open-licensed and formatted dataset of audio recordings of the Bible in low-resource northern Indian languages. We set up multiple experimental splits and train and analyze two competitive ASR models to serve as baselines for future research using this data.




Automatic Speech Recognition of Low-Resource Languages Based on Chukchi

The following paper presents a project focused on the research and creat...

Unsupervised Automatic Speech Recognition: A Review

Automatic Speech Recognition (ASR) systems can be trained to achieve rem...

A Recorded Debating Dataset

This paper describes an audio and textual dataset of debating speeches, ...

A bandit approach to curriculum generation for automatic speech recognition

The Automated Speech Recognition (ASR) task has been a challenging domai...

Automated speech tools for helping communities process restricted-access corpora for language revival efforts

Many archival recordings of speech from endangered languages remain unan...