End-to-End Neural Systems for Automatic Children Speech Recognition: An Empirical Study

by Prashanth Gurunath Shivakumar, et al.

A key desideratum for inclusive and accessible speech recognition technology is robust performance on children's speech. Notably, this applies to the rapidly advancing neural-network-based end-to-end speech recognition systems. Children's speech recognition is more challenging than adult speech recognition because of the larger intra- and inter-speaker variability in acoustic and linguistic characteristics. Furthermore, the lack of adequate and appropriate children's speech resources adds to the challenge of designing robust end-to-end neural architectures. This study provides a critical assessment of automatic children's speech recognition through an empirical study of contemporary state-of-the-art end-to-end speech recognition systems. Insights are provided on several aspects: training data requirements, adaptation on children's data, the effect of children's age and utterance length, the choice of architecture and loss function for end-to-end systems, and the role of language models in speech recognition performance.




