Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition

01/14/2022
by   Mengzhe Geng, et al.

Automatic recognition of disordered speech remains a highly challenging task to date. Sources of variability commonly found in normal speech, including accent, age or gender, when further compounded with the underlying causes of speech impairment and varying severity levels, create large diversity among speakers. To this end, speaker adaptation techniques play a vital role in current speech recognition systems. Motivated by the spectro-temporal level differences between disordered and normal speech that systematically manifest in articulatory imprecision, decreased volume and clarity, slower speaking rates and increased dysfluencies, novel spectro-temporal subspace basis embedding deep features derived by singular value decomposition (SVD) of the speech spectrum are proposed to facilitate both accurate speech intelligibility assessment and auxiliary feature based speaker adaptation of state-of-the-art hybrid DNN and end-to-end disordered speech recognition systems. Experiments conducted on the UASpeech corpus suggest that the proposed spectro-temporal deep feature adapted systems consistently outperformed baseline i-Vector adaptation by up to 2.63% absolute (8.6% relative) reduction in word error rate (WER) with or without data augmentation. Learning hidden unit contribution (LHUC) based speaker adaptation was further applied. The final speaker adapted system using the proposed spectral basis embedding features gave an overall WER of 25.6% on the UASpeech test set of 16 dysarthric speakers.
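The SVD step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not say how the spectrum is computed, how many basis vectors are kept, or how the bases are turned into deep embedding features. The function name `spectro_temporal_bases`, the parameter `n_components`, and the random toy spectrogram are all assumptions made for illustration. The sketch only shows how an SVD of a frequency-by-time matrix yields a set of spectral basis vectors (left singular vectors) and temporal basis vectors (right singular vectors).

```python
import numpy as np


def spectro_temporal_bases(spectrogram, n_components=2):
    """Split a (n_freq_bins, n_frames) matrix into spectral and temporal bases.

    Hypothetical helper for illustration only; the paper's actual feature
    extraction is not described in the abstract.
    """
    spec = np.asarray(spectrogram, dtype=float)
    # Thin SVD: spec = U @ diag(s) @ Vt, singular values sorted descending.
    U, s, Vt = np.linalg.svd(spec, full_matrices=False)
    spectral_basis = U[:, :n_components]       # (n_freq_bins, n_components)
    temporal_basis = Vt[:n_components, :].T    # (n_frames, n_components)
    return spectral_basis, s[:n_components], temporal_basis


# Toy input: a synthetic stand-in for a 40-bin, 120-frame spectrogram.
rng = np.random.default_rng(0)
toy_spec = np.abs(rng.normal(size=(40, 120)))

spectral, singular_values, temporal = spectro_temporal_bases(toy_spec, n_components=2)
print(spectral.shape, temporal.shape)
```

In a sketch like this, the leading columns of `spectral` summarize how energy is distributed across frequency, while the columns of `temporal` summarize how it evolves over frames. Either could in principle be passed to a downstream model as a fixed-size input once the number of frames is normalized, but that step is left out here because the abstract gives no details on it.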


research
02/21/2022

Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition

Despite the rapid progress of automatic speech recognition (ASR) technol...
research
03/28/2022

On-the-fly Feature Based Speaker Adaptation for Dysarthric and Elderly Speech Recognition

Automatic recognition of dysarthric and elderly speech highly challengin...
research
08/02/2021

Adversarial Data Augmentation for Disordered Speech Recognition

Automatic recognition of disordered speech remains a highly challenging ...
research
01/24/2022

Variational Auto-Encoder Based Variability Encoding for Dysarthric Speech Recognition

Dysarthric speech recognition is a challenging task due to acoustic vari...
research
05/13/2022

Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition

Despite the rapid progress of automatic speech recognition (ASR) technol...
research
11/25/2020

SAR-Net: A End-to-End Deep Speech Accent Recognition Network

This paper proposes a end-to-end deep network to recognize kinds of acce...
research
11/03/2022

Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition

Automatic recognition of disordered speech remains a highly challenging ...
