The NTNU System at the Interspeech 2020 Non-Native Children's Speech ASR Challenge

by Tien-Hong Lo, et al.

This paper describes the NTNU ASR system that participated in the Interspeech 2020 Non-Native Children's Speech ASR Challenge, supported by the SIG-CHILD group of ISCA. This ASR shared task is made much more challenging by the coexisting diversity of non-native and children's speaking characteristics. In the closed-track evaluation, all participants were restricted to developing their systems solely on the speech and text corpora provided by the organizer. To work around this under-resourced setting, we built our ASR system on top of CNN-TDNNF-based acoustic models while combining several data augmentation strategies, including utterance- and word-level speed perturbation and spectrogram augmentation, alongside a simple yet effective data-cleansing approach. All variants of our ASR system employed an RNN-based language model, trained solely on the text dataset released by the organizer, to rescore the first-pass recognition hypotheses. Our system with the best configuration came out in second place with a word error rate (WER) of 17.59%, while the top-performing and second runner-up systems achieved 15.67% and 18.71%, respectively; the WER of the official baseline system is not given here.
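The systems above are ranked by WER. As a minimal sketch of how that metric is commonly computed, the function below takes the word-level edit distance between a reference transcript and a hypothesis and divides it by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; they are not taken from the challenge's scoring tool, which may normalize text differently and typically aggregates errors over the whole test set rather than per utterance.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting (an assumption).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.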




Data augmentation using prosody and false starts to recognize non-native children's speech

This paper describes AaltoASR's speech recognition system for the INTERS...

Low Resource German ASR with Untranscribed Data Spoken by Non-native Children – INTERSPEECH 2021 Shared Task SPAPL System

This paper describes the SPAPL system for the INTERSPEECH 2021 Challenge...

Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications

Voicebots have provided a new avenue for supporting the development of l...

LPC Augment: An LPC-Based ASR Data Augmentation Algorithm for Low and Zero-Resource Children's Dialects

This paper proposes a novel linear prediction coding-based data aug-ment...

ELITR Non-Native Speech Translation at IWSLT 2020

This paper is an ELITR system submission for the non-native speech trans...

Transfer Learning for Robust Low-Resource Children's Speech ASR with Transformers and Source-Filter Warping

Automatic Speech Recognition (ASR) systems are known to exhibit difficul...

Proficiency assessment of L2 spoken English using wav2vec 2.0

The increasing demand for learning English as a second language has led ...
