Transfer Learning for Robust Low-Resource Children's Speech ASR with Transformers and Source-Filter Warping

06/19/2022
by Jenthe Thienpondt, et al.

Automatic Speech Recognition (ASR) systems are known to exhibit difficulties when transcribing children's speech. This can mainly be attributed to the absence of large children's speech corpora to train robust ASR models and the resulting domain mismatch when decoding children's speech with systems trained on adult data. In this paper, we propose multiple enhancements to alleviate these issues. First, we propose a data augmentation technique based on the source-filter model of speech to close the domain gap between adult and children's speech. This enables us to leverage the data availability of adult speech corpora by making these samples perceptually similar to children's speech. Second, using this augmentation strategy, we apply transfer learning on a Transformer model pre-trained on adult data. This model follows the recently introduced XLS-R architecture, a wav2vec 2.0 model pre-trained on several cross-lingual adult speech corpora to learn general and robust acoustic frame-level representations. Adopting this model for the ASR task using adult data augmented with the proposed source-filter warping strategy and a limited amount of in-domain children's speech significantly outperforms previous state-of-the-art results on the PF-STAR British English Children's Speech corpus with a 4.86
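
To make the source-filter idea in the abstract more concrete, the following is a minimal single-frame sketch, not the authors' implementation. It assumes that the spectral envelope (the "filter") can be separated from the excitation (the "source") by cepstral liftering, and that the envelope is then warped linearly along the frequency axis. The function name `warp_envelope` and the parameters `alpha` and `n_lifter` are hypothetical and chosen only for illustration.

```python
import numpy as np


def warp_envelope(frame, alpha, n_lifter=30):
    """Warp the spectral envelope of one frame by a factor alpha.

    Assumption: envelope/excitation split via cepstral liftering;
    the paper's actual decomposition and warping function may differ.
    """
    n = len(frame)
    spectrum = np.fft.rfft(frame * np.hanning(n))
    log_mag = np.log(np.abs(spectrum) + 1e-8)

    # Real cepstrum of the log-magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag, n=n)

    # Keep only low-quefrency coefficients (and their mirror image):
    # these describe the smooth envelope, i.e. the "filter".
    lifter = np.zeros(n)
    lifter[:n_lifter] = 1.0
    lifter[n - n_lifter + 1:] = 1.0
    log_env = np.fft.rfft(cepstrum * lifter).real

    # What remains is the fine structure, i.e. the "source".
    log_exc = log_mag - log_env

    # Linear warp: new_env(f) = env(f / alpha), so alpha > 1 moves
    # envelope peaks towards higher frequencies.
    bins = np.arange(len(log_env))
    warped_env = np.interp(bins / alpha, bins, log_env)

    # Recombine the warped envelope with the untouched excitation,
    # reusing the original phase.
    new_mag = np.exp(warped_env + log_exc)
    new_spec = new_mag * np.exp(1j * np.angle(spectrum))
    return np.fft.irfft(new_spec, n=n)
```

Under these assumptions, values of `alpha` above 1 shift envelope peaks upward in frequency, the direction one would expect when making adult speech sound more child-like. A full augmentation pipeline would apply this frame by frame with overlap-add, which is omitted here.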

research
09/12/2023

Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults

Recent advancements in Automatic Speech Recognition (ASR) systems, exemp...
research
02/18/2021

Fundamental Frequency Feature Normalization and Data Augmentation for Child Speech Recognition

Automatic speech recognition (ASR) systems for young children are needed...
research
05/08/2018

Transfer Learning from Adult to Children for Speech Recognition: Evaluation, Analysis and Recommendations

Children speech recognition is challenging mainly due to the inherent hi...
research
05/18/2020

The NTNU System at the Interspeech 2020 Non-Native Children's Speech ASR Challenge

This paper describes the NTNU ASR system participating in the Interspeec...
research
02/19/2022

LPC Augment: An LPC-Based ASR Data Augmentation Algorithm for Low and Zero-Resource Children's Dialects

This paper proposes a novel linear prediction coding-based data augment...
research
10/08/2020

Analysis of Disfluency in Children's Speech

Disfluencies are prevalent in spontaneous speech, as shown in many studi...
research
02/24/2022

Towards Better Meta-Initialization with Task Augmentation for Kindergarten-aged Speech Recognition

Children's automatic speech recognition (ASR) is always difficult due to...
