Investigating the Sensitivity of Automatic Speech Recognition Systems to Phonetic Variation in L2 Englishes

05/12/2023
by Emma O'Neill, et al.

Automatic Speech Recognition (ASR) systems perform best on speech that is similar to the speech on which they were trained. As such, underrepresented varieties, including regional dialects, minority speakers, and low-resource languages, see much higher word error rates (WERs) than varieties seen as 'prestigious', 'mainstream', or 'standard'. This can act as a barrier to incorporating ASR technology into the annotation process for large-scale linguistic research, since manually correcting erroneous automated transcripts can be just as time- and resource-consuming as manual transcription. A deeper understanding of the behaviour of an ASR system is thus beneficial from a speech technology standpoint, in terms of improving ASR accuracy, and from an annotation standpoint, where knowing the likely errors made by an ASR system can aid manual correction. This work demonstrates a method of probing an ASR system to discover how it handles phonetic variation across a number of L2 Englishes. Specifically, it examines how particular phonetic realisations that were rare or absent in the system's training data can lead to phoneme-level misrecognitions and contribute to higher WERs. The behaviour of the ASR system is shown to be systematic and consistent across speakers with similar spoken varieties (in this case, the same L1), and its phoneme substitution errors are typically in agreement with human annotators. By identifying problematic productions, specific weaknesses can be addressed by sourcing such realisations for training and fine-tuning, thus making the system more robust to pronunciation variation.
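The two quantities the abstract relies on, word error rate and phoneme substitution errors, can both be derived from an edit-distance alignment between a reference transcript and the ASR output. The following is a minimal sketch of that general idea, not the authors' method: the function names (`align`, `error_rate`, `substitution_counts`), the ARPAbet-style phoneme labels, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def align(ref, hyp):
    """Levenshtein-align two token lists.

    Returns a list of (op, ref_token, hyp_token) tuples, where op is
    one of "ok", "sub", "del", "ins".
    """
    n, m = len(ref), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover the operations.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            op = "ok" if ref[i - 1] == hyp[j - 1] else "sub"
            ops.append((op, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def error_rate(ref, hyp):
    """(substitutions + deletions + insertions) / reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    errors = sum(op != "ok" for op, _, _ in align(ref, hyp))
    return errors / len(ref)


def substitution_counts(pairs):
    """Count (reference, hypothesis) substitution pairs over many utterances."""
    counts = Counter()
    for ref, hyp in pairs:
        for op, r, h in align(ref, hyp):
            if op == "sub":
                counts[(r, h)] += 1
    return counts


# Toy data, invented for illustration only.
wer = error_rate("the thin cat".split(), "the tin cat".split())
phoneme_pairs = [
    (["TH", "IH", "N"], ["T", "IH", "N"]),
    (["TH", "IH", "NG", "K"], ["T", "IH", "NG", "K"]),
]
subs = substitution_counts(phoneme_pairs)
print(round(wer, 3), subs.most_common(1))
```

Applied to word tokens, `error_rate` gives a WER; applied to phoneme tokens, `substitution_counts` gives a confusion tally that could be grouped by speaker L1 to check whether the same substitutions recur, which is the kind of systematic pattern the abstract describes.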

research
05/18/2023

Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation

The performance of automatic speech recognition (ASR) systems has advanc...
research
07/14/2023

Replay to Remember: Continual Layer-Specific Fine-tuning for German Speech Recognition

While Automatic Speech Recognition (ASR) models have shown significant a...
research
01/16/2023

Using Kaldi for Automatic Speech Recognition of Conversational Austrian German

As dialogue systems are becoming more and more interactional and social,...
research
06/09/2021

Unsupervised Automatic Speech Recognition: A Review

Automatic Speech Recognition (ASR) systems can be trained to achieve rem...
research
02/05/2023

MAC: A unified framework boosting low resource automatic speech recognition

We propose a unified framework for low resource automatic speech recogni...
research
02/25/2022

Language technology practitioners as language managers: arbitrating data bias and predictive bias in ASR

Despite the fact that variation is a fundamental characteristic of natur...
research
09/03/2019

Automatic Speech Recognition Services: Deaf and Hard-of-Hearing Usability

Nowadays, speech is becoming a more common, if not standard, interface t...
