An Investigation on Applying Acoustic Feature Conversion to ASR of Adult and Child Speech

05/25/2022
by Wei Liu, et al.

The performance of child speech recognition is generally less satisfactory than that of adult speech recognition because of the limited amount of training data. Significant performance degradation is expected when an automatic speech recognition (ASR) system trained on adult speech is applied directly to child speech, as a result of domain mismatch. The present study focuses on adult-to-child acoustic feature conversion to alleviate this mismatch. Different acoustic feature conversion approaches, both deep neural network based and signal processing based, are investigated and compared under a fair experimental setting, in which converted acoustic features from the same amount of labeled adult speech are used to train the ASR models from scratch. Experimental results reveal that not all of the conversion methods lead to ASR performance gains. Specifically, statistic matching, a classic unsupervised domain adaptation method, is not effective. A disentanglement-based auto-encoder (DAE) conversion framework is found to be useful, and F0 normalization achieves the best performance. The F0 distribution of the converted features is an important indicator of conversion quality, whereas judging conversion quality with a deep adult-child classification model is shown to be inappropriate.
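To make the idea of statistic matching concrete, the following is a minimal sketch, not the authors' implementation. It assumes the simplest form of the technique: shifting and scaling each feature dimension of the adult data so that its mean and standard deviation match those of the child data. The function name `match_statistics`, the 40-dimensional features, and the synthetic random data are all hypothetical and chosen only for illustration; the abstract does not specify how the statistics are matched.

```python
import numpy as np


def match_statistics(source, target, eps=1e-8):
    """Per-dimension mean/variance matching (an assumed, simple variant).

    source: (n_frames_src, n_dims) features to convert, e.g. adult speech.
    target: (n_frames_tgt, n_dims) features of the target domain, e.g. child speech.
    Returns source features rescaled to the target's per-dimension statistics.
    """
    src_mean = source.mean(axis=0)
    src_std = source.std(axis=0)
    tgt_mean = target.mean(axis=0)
    tgt_std = target.std(axis=0)
    # Standardize with source statistics, then re-scale with target statistics.
    return (source - src_mean) / (src_std + eps) * tgt_std + tgt_mean


# Synthetic stand-ins for acoustic features (hypothetical data).
rng = np.random.default_rng(0)
adult = rng.normal(loc=0.0, scale=1.0, size=(500, 40))
child = rng.normal(loc=2.0, scale=0.5, size=(300, 40))

converted = match_statistics(adult, child)
```

Because this transform only aligns first- and second-order statistics per dimension, it leaves the structure within each utterance unchanged, which may be one reason such a method can fail to improve ASR.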

