Data augmentation using prosody and false starts to recognize non-native children's speech

08/29/2020
by Hemant Kathania, et al.

This paper describes AaltoASR's speech recognition system for the INTERSPEECH 2020 shared task on Automatic Speech Recognition (ASR) for non-native children's speech. The task is to recognize non-native speech from children of various age groups when only a limited amount of speech data is available. Moreover, because the speech is spontaneous, it contains false starts that are transcribed as partial words. As a result, the test transcriptions contain partial words that never appear in the training data. To cope with these two challenges, we investigate a data augmentation-based approach. First, we apply prosody-based data augmentation to supplement the audio data. Second, we simulate false starts by introducing partial-word noise into the language modeling corpora, thereby creating new words. Acoustic models trained on prosody-based augmented data outperform models trained with the baseline recipe or with SpecAugment-based augmentation. The partial-word noise also improves on the baseline language model. Our ASR system, a combination of these schemes, placed third in the evaluation period with a word error rate (WER) of 18.71%. After the evaluation period, we observe that increasing the amount of prosody-based augmented data leads to better performance. Furthermore, removing low-confidence-score words from hypotheses can lead to further gains. Together, these two improvements lower the WER to 17.99%.
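The abstract names two text-side ideas without spelling out their mechanics: simulating false starts by injecting partial words into language-model training text, and discarding low-confidence words from recognition hypotheses. The sketch below illustrates one plausible way to do each. It is not the authors' implementation. The function names, the insertion probability, the minimum word length, the 0.5 confidence threshold and the trailing `-` marker for partial words are all hypothetical choices made for illustration. The challenge data may use a different partial-word convention.

```python
import random
from typing import List, Optional, Sequence, Tuple


def add_partial_word_noise(
    words: Sequence[str],
    prob: float = 0.1,
    min_len: int = 3,
    marker: str = "-",
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Hypothetical sketch: simulate false starts in LM training text.

    Each word with at least `min_len` characters is, with probability
    `prob`, preceded by a random proper prefix of itself plus `marker`,
    e.g. "school" -> "sch- school". The marker convention is an assumption.
    """
    rng = rng or random.Random()
    noisy: List[str] = []
    for word in words:
        if len(word) >= min_len and rng.random() < prob:
            # Cut strictly inside the word so the inserted token is partial.
            cut = rng.randint(1, len(word) - 1)
            noisy.append(word[:cut] + marker)
        noisy.append(word)
    return noisy


def drop_low_confidence(
    hypothesis: Sequence[Tuple[str, float]], threshold: float = 0.5
) -> List[str]:
    """Hypothetical sketch: keep only words whose confidence >= threshold."""
    return [word for word, conf in hypothesis if conf >= threshold]


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "i like to play football with my friends".split()
    # Noisy copies of LM sentences would be added to the LM corpus.
    print(" ".join(add_partial_word_noise(sentence, prob=0.3, rng=rng)))
    # Words scored below the threshold are removed from the hypothesis.
    print(drop_low_confidence([("i", 0.95), ("lik", 0.30), ("football", 0.80)]))
```

Because each noisy sentence keeps the full word after its truncated prefix, the language model learns that partial words tend to be followed by their completions. The confidence scores in the demo are made-up values. In a real system they would come from the decoder, for example from lattice posteriors.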
