Transfer Learning from Adult to Children for Speech Recognition: Evaluation, Analysis and Recommendations

Children speech recognition is challenging mainly due to the inherent high variability in children's physical and articulatory characteristics and expressions. This variability manifests in both acoustic constructs and linguistic usage due to the rapidly changing developmental stage in children's life. Part of the challenge is due to the lack of large amounts of available children speech data for efficient modeling. This work attempts to address the key challenges using transfer learning from adult's models to children's models in a Deep Neural Network (DNN) framework for children's Automatic Speech Recognition (ASR) task evaluating on multiple children's speech corpora with a large vocabulary. The paper presents a systematic and an extensive analysis of the proposed transfer learning technique considering the key factors affecting children's speech recognition from prior literature. Evaluations are presented on (i) comparisons of earlier GMM-HMM and the newer DNN Models, (ii) effectiveness of standard adaptation techniques versus transfer learning, (iii) various adaptation configurations in tackling the variabilities present in children speech, in terms of (a) acoustic spectral variability, and (b) pronunciation variability and linguistic constraints. Our Analysis spans over (i) number of DNN model parameters (for adaptation), (ii) amount of adaptation data, (iii) ages of children, (iv) age dependent-independent adaptation. Finally, we provide Recommendations on (i) the favorable strategies over various aforementioned - analyzed parameters, and (ii) potential future research directions and relevant challenges/problems persisting in DNN based ASR for children's speech.

READ FULL TEXT

page 5

page 8

page 9

page 10

page 11

page 12

research
02/19/2021

End-to-End Neural Systems for Automatic Children Speech Recognition: An Empirical Study

A key desiderata for inclusive and accessible speech recognition technol...
research
11/09/2020

Data Augmentation For Children's Speech Recognition – The "Ethiopian" System For The SLT 2021 Children Speech Recognition Challenge

This paper presents the "Ethiopian" system for the SLT 2021 Children Spe...
research
09/09/2022

Longitudinal Acoustic Speech Tracking Following Pediatric Traumatic Brain Injury

Recommendations for common outcome measures following pediatric traumati...
research
06/19/2022

Transfer Learning for Robust Low-Resource Children's Speech ASR with Transformers and Source-Filter Warping

Automatic Speech Recognition (ASR) systems are known to exhibit difficul...
research
03/04/2021

End-to-end acoustic modelling for phone recognition of young readers

Automatic recognition systems for child speech are lagging behind those ...
research
02/13/2016

Signer-independent Fingerspelling Recognition with Deep Neural Network Adaptation

We study the problem of recognition of fingerspelled letter sequences in...
research
09/06/2019

Neural Network-Based Modeling of Phonetic Durations

A deep neural network (DNN)-based model has been developed to predict no...

Please sign up or login with your details

Forgot password? Click here to reset