Investigating the effect of domain selection on automatic speech recognition performance: a case study on Bangladeshi Bangla

10/24/2022
by   Ahnaf Mozib Samin, et al.
0

The performance of data-driven natural language processing systems is contingent upon the quality of corpora. However, principal corpus design criteria are often not identified and examined adequately, particularly in the speech processing discipline. Speech corpora development requires additional attention with regard to clean/noisy, read/spontaneous, multi-talker speech, accents/dialects, etc. Domain selection is also a crucial decision point in speech corpus development. In this study, we demonstrate the significance of domain selection by assessing a state-of-the-art Bangla automatic speech recognition (ASR) model on a novel multi-domain Bangladeshi Bangla ASR evaluation benchmark - BanSpeech, which contains 7.2 hours of speech and 9802 utterances from 19 distinct domains. The ASR model has been trained with deep convolutional neural network (CNN), layer normalization technique, and Connectionist Temporal Classification (CTC) loss criterion on SUBAK.KO, a mostly read speech corpus for the low-resource and morphologically rich language Bangla. Experimental evaluation reveals the ASR model on SUBAK.KO faces difficulty recognizing speech from domains with mostly spontaneous speech and has a high number of out-of-vocabulary (OOV) words. The same ASR model, on the other hand, performs better in read speech domains and contains fewer OOV words. In addition, we report the outcomes of our experiments with layer normalization, input feature extraction, number of convolutional layers, etc., and set a baseline on SUBAK.KO. The BanSpeech will be publicly available to meet the need for a challenging evaluation benchmark for Bangla ASR.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/25/2020

FT Speech: Danish Parliament Speech Corpus

This paper introduces FT Speech, a new speech corpus created from the re...
research
02/18/2021

Fundamental Frequency Feature Normalization and Data Augmentation for Child Speech Recognition

Automatic speech recognition (ASR) systems for young children are needed...
research
03/06/2022

Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset

Automatic speech recognition (ASR) on low resource languages improves t...
research
10/14/2021

CORAA: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese

Automatic Speech recognition (ASR) is a complex and challenging task. In...
research
06/01/2023

Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts

This paper presents a novel algorithm for building an automatic speech r...
research
03/09/2023

Unsupervised Language agnostic WER Standardization

Word error rate (WER) is a standard metric for the evaluation of Automat...
research
09/23/2021

Scenario Aware Speech Recognition: Advancements for Apollo Fearless Steps CHiME-4 Corpora

In this study, we propose to investigate triplet loss for the purpose of...

Please sign up or login with your details

Forgot password? Click here to reset