Towards spoken dialect identification of Irish

07/14/2023
by Liam Lonergan, et al.

The Irish language is rich in its diversity of dialects and accents. This compounds the difficulty of creating a speech recognition system for the low-resource language, as such a system must contend with a high degree of variability while drawing on limited corpora. A recent study investigating dialect bias in Irish ASR found that balanced training corpora gave rise to unequal dialect performance, with performance for the Ulster dialect being consistently worse than for the Connacht or Munster dialects. Motivated by this, the present experiments investigate spoken dialect identification of Irish, with a view to incorporating such a system into the speech recognition pipeline. Two acoustic classification models are tested, XLS-R and ECAPA-TDNN, in conjunction with a text-based classifier using a pretrained Irish-language BERT model. The ECAPA-TDNN, particularly a model pretrained for language identification on the VoxLingua107 dataset, performed best overall, with an accuracy of 73%, further improved to 76% when its outputs were combined with those of the text-based model. The Ulster dialect was most accurately identified, with an accuracy of 94%, whereas the system struggled to separate the Connacht and Munster dialects, suggesting a more nuanced approach may be necessary to robustly distinguish between the dialects of Irish.
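As an illustration of how an acoustic classifier and a text-based classifier might be combined, the sketch below applies a simple weighted average to their per-dialect posterior probabilities. This is only one possible late-fusion scheme and is not claimed to be the authors' method: the function names, the weight of 0.7, and the example probabilities are all hypothetical.

```python
# Minimal late-fusion sketch. The fusion rule (weighted average of posteriors),
# the function names, and the default weight are assumptions for illustration;
# the abstract does not specify how the two models were combined.

DIALECTS = ("Ulster", "Connacht", "Munster")


def fuse_posteriors(acoustic, text, acoustic_weight=0.7):
    """Return a renormalised weighted average of two posterior dicts."""
    if not 0.0 <= acoustic_weight <= 1.0:
        raise ValueError("acoustic_weight must lie in [0, 1]")
    fused = {
        d: acoustic_weight * acoustic[d] + (1.0 - acoustic_weight) * text[d]
        for d in DIALECTS
    }
    total = sum(fused.values())
    if total <= 0.0:
        raise ValueError("posteriors must contain some positive mass")
    # Renormalise so the fused scores sum to one.
    return {d: p / total for d, p in fused.items()}


def predict_dialect(acoustic, text, acoustic_weight=0.7):
    """Pick the dialect with the highest fused posterior."""
    fused = fuse_posteriors(acoustic, text, acoustic_weight)
    return max(fused, key=fused.get)


# Hypothetical scores for one utterance: the acoustic model narrowly favours
# Connacht, while the text model favours Munster.
acoustic_scores = {"Ulster": 0.10, "Connacht": 0.46, "Munster": 0.44}
text_scores = {"Ulster": 0.05, "Connacht": 0.30, "Munster": 0.65}

print(predict_dialect(acoustic_scores, text_scores))
```

With these made-up scores, the text model's evidence is enough to overturn the acoustic model's narrow preference, which is the kind of case where fusion could help with closely related dialects. In practice the weight would be tuned on held-out data.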
