On-the-fly Feature Based Speaker Adaptation for Dysarthric and Elderly Speech Recognition

03/28/2022
by Mengzhe Geng, et al.

Automatic recognition of dysarthric and elderly speech remains a highly challenging task to date. Speaker-level heterogeneity attributed to accent or gender, commonly found in normal speech, when aggregated with age and speech impairment severity, creates large diversity among speakers. Speaker adaptation techniques play a crucial role in the personalization of ASR systems for such users. Their mobility issues limit the amount of speaker-level data available for model based adaptation. To this end, this paper investigates two novel forms of feature based on-the-fly rapid speaker adaptation approaches. The first is based on speaker-level variance regularized spectral basis embedding (SBEVR) features, while the other uses on-the-fly learning hidden unit contributions (LHUC) transforms conditioned on speaker-level spectral features. Experiments conducted on the UASpeech dysarthric and DementiaBank Pitt elderly speech datasets suggest that the proposed SBEVR feature based adaptation statistically significantly outperforms both the baseline on-the-fly i-Vector adapted hybrid TDNN/DNN systems, by up to 2.48 in word error rate (WER), and offline batch mode model based LHUC adaptation using all speaker-level data, by 0.78
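
For readers unfamiliar with LHUC, the following is a minimal sketch of the general idea: each hidden unit's activation is rescaled by a speaker-dependent amplitude. It assumes the commonly used 2 * sigmoid parametrization, which keeps amplitudes in (0, 2); the excerpt above does not confirm that this exact form is used in the paper. The function name lhuc_scale and the toy values are hypothetical. In the on-the-fly variant described in the abstract, the speaker parameters would presumably be predicted from speaker-level spectral features rather than estimated offline from all of a speaker's data.

```python
import numpy as np


def lhuc_scale(hidden, speaker_params):
    """Rescale hidden activations with speaker-dependent amplitudes.

    hidden:         array of shape (frames, units)
    speaker_params: array of shape (units,), unconstrained real values
    Assumption: amplitude = 2 * sigmoid(param), so each amplitude lies in (0, 2).
    """
    amplitudes = 2.0 / (1.0 + np.exp(-speaker_params))
    return hidden * amplitudes  # broadcast over the frame axis


# Toy activations for two frames and three hidden units (illustrative only).
hidden = np.array([[0.5, -1.0, 2.0],
                   [1.5, 0.0, -0.5]])

# All-zero parameters give amplitude 1, i.e. the unadapted network.
neutral = np.zeros(3)
adapted = lhuc_scale(hidden, neutral)
```

With all-zero parameters the layer output is unchanged, so adaptation starts from the speaker-independent model and only departs from it as the parameters move away from zero.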


research
02/21/2022

Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition

Despite the rapid progress of automatic speech recognition (ASR) technol...
research
05/18/2023

Use of Speech Impairment Severity for Dysarthric Speech Recognition

A key challenge in dysarthric speech recognition is the speaker-level di...
research
01/14/2022

Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition

Automatic recognition of disordered speech remains a highly challenging ...
research
12/14/2020

Bayesian Learning for Deep Neural Network Adaptation

A key task for speech recognition systems is to reduce the mismatch betw...
research
06/26/2023

Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems

Rich sources of variability in natural speech present significant challe...
research
06/23/2022

Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems

Fundamental modelling differences between hybrid and end-to-end (E2E) au...
research
11/02/2020

Speaker anonymisation using the McAdams coefficient

Anonymisation has the goal of manipulating speech signals in order to de...
