Investigating data partitioning strategies for crosslinguistic low-resource ASR evaluation

08/26/2022
by Zoey Liu, et al.

Many automatic speech recognition (ASR) data sets include a single pre-defined test set consisting of one or more speakers whose speech never appears in the training set. This "hold-speaker(s)-out" data partitioning strategy, however, may not be ideal for data sets in which the number of speakers is very small. This study investigates ten different data split methods for five languages with minimal ASR training resources. We find that (1) model performance varies greatly depending on which speaker is selected for testing; (2) the average word error rate (WER) across all held-out speakers is comparable not only to the average WER over multiple random splits but also to any given individual random split; (3) WER is also generally comparable when the data is split heuristically or adversarially; (4) utterance duration and intensity are comparatively more predictive factors of variability regardless of the data split. These results suggest that the widely used hold-speakers-out approach to ASR data partitioning can yield results that do not reflect model performance on unseen data or speakers. Random splits can yield more reliable and generalizable estimates when facing data sparsity.
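
The two families of partitioning contrasted above can be illustrated with a minimal sketch. It is not the authors' code: the utterance records (dictionaries with a "speaker" key), the function names, the 20% test fraction, and the seeds are all assumptions made for illustration. The heuristic and adversarial splits mentioned in the abstract are not shown. WER is computed here in its standard form, word-level edit distance divided by the number of reference words.

```python
import random
from collections import defaultdict


def hold_speaker_out_splits(utterances):
    """Yield (speaker, train, test), holding out each speaker in turn.

    Assumes each utterance is a dict with a "speaker" key (hypothetical schema).
    """
    by_speaker = defaultdict(list)
    for utt in utterances:
        by_speaker[utt["speaker"]].append(utt)
    for speaker in sorted(by_speaker):
        test = by_speaker[speaker]
        train = [u for u in utterances if u["speaker"] != speaker]
        yield speaker, train, test


def random_split(utterances, test_fraction=0.2, seed=0):
    """Return (train, test) from a seeded shuffle, ignoring speaker identity."""
    rng = random.Random(seed)
    shuffled = list(utterances)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Toy data: three hypothetical speakers, two utterances each.
toy_utterances = [
    {"speaker": s, "text": f"utterance {i} by {s}"}
    for s in ("A", "B", "C")
    for i in range(2)
]
```

Averaging WER over every split produced by `hold_speaker_out_splits` corresponds to the "average WER across all held-out speakers" in finding (2), while calling `random_split` with several different seeds corresponds to the multiple random splits it is compared against.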


