Towards dialect-inclusive recognition in a low-resource language: are balanced corpora the answer?

07/14/2023
by Liam Lonergan, et al.

Automatic speech recognition (ASR) systems are generally built for the spoken 'standard', and their performance declines for non-standard dialects and varieties. This is a problem for a language like Irish, which has no single spoken standard but rather three major dialects: Ulster (Ul), Connacht (Co) and Munster (Mu). As a diagnostic to quantify the effect of the speaker's dialect on recognition performance, 12 ASR systems were trained, first using baseline dialect-balanced training corpora, and then using modified versions of the baseline corpora in which dialect-specific materials were either subtracted or added. Results indicate that dialect-balanced corpora do not yield similar performance across the dialects: the Ul dialect consistently underperforms, whereas Mu yields the lowest word error rates (WERs). There is a close relationship between the Co and Mu dialects, but it is not symmetrical. These results will guide future corpus collection and system building strategies to optimise for cross-dialect performance equity.
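The per-dialect comparison described above rests on word error rate. As a rough illustration only, the sketch below computes WER separately for each dialect label using the standard word-level edit distance. The function names (`edit_distance`, `wer_by_dialect`) and the placeholder transcripts are assumptions made for this example; they are not taken from the paper, and the numbers they produce say nothing about the paper's results.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_dialect(samples):
    """Pool errors and reference words per dialect, then divide.

    `samples` is an iterable of (dialect, reference, hypothesis) triples.
    This layout is a hypothetical one chosen for the example.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for dialect, ref, hyp in samples:
        ref_tokens = ref.split()
        errors[dialect] += edit_distance(ref_tokens, hyp.split())
        words[dialect] += len(ref_tokens)
    return {d: errors[d] / words[d] for d in words if words[d] > 0}


# Placeholder tokens, not real Irish transcripts or real system output.
samples = [
    ("Ul", "a b c d", "a x c"),
    ("Co", "a b c d", "a b c d"),
    ("Mu", "a b", "a b c"),
]
rates = wer_by_dialect(samples)
```

Pooling errors over all utterances of a dialect before dividing, rather than averaging per-utterance rates, keeps short utterances from dominating the figure. Whether the authors aggregate this way is not stated in the abstract.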


