Accurate synthesis of Dysarthric Speech for ASR data augmentation

08/16/2023
by   Mohammad Soleymanpour, et al.
0

Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility through slow, uncoordinated control of speech production muscles. Automatic Speech recognition (ASR) systems can help dysarthric talkers communicate more effectively. However, robust dysarthria-specific ASR requires a significant amount of training speech, which is not readily available for dysarthric talkers. This paper presents a new dysarthric speech synthesis method for the purpose of ASR training data augmentation. Differences in prosodic and acoustic characteristics of dysarthric spontaneous speech at varying severity levels are important components for dysarthric speech modeling, synthesis, and augmentation. For dysarthric speech synthesis, a modified neural multi-talker TTS is implemented by adding a dysarthria severity level coefficient and a pause insertion model to synthesize dysarthric speech for varying severity levels. To evaluate the effectiveness for synthesis of training data for ASR, dysarthria-specific speech recognition was used. Results show that a DNN-HMM model trained on additional synthetic dysarthric speech achieves WER improvement of 12.2 addition of the severity level and pause insertion controls decrease WER by 6.5 the TORGO database demonstrate that using dysarthric synthetic speech to increase the amount of dysarthric-patterned speech for training has significant impact on the dysarthric ASR systems. In addition, we have conducted a subjective evaluation to evaluate the dysarthric-ness and similarity of synthesized speech. Our subjective evaluation shows that the perceived dysartrhic-ness of synthesized speech is similar to that of true dysarthric speech, especially for higher levels of dysarthria

READ FULL TEXT

page 5

page 9

page 10

research
01/27/2022

Synthesizing Dysarthric Speech Using Multi-talker TTS for Dysarthric Speech Recognition

Dysarthria is a motor speech disorder often characterized by reduced spe...
research
02/18/2021

Fundamental Frequency Feature Normalization and Data Augmentation for Child Speech Recognition

Automatic speech recognition (ASR) systems for young children are needed...
research
11/04/2022

Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech

Stuttering is a speech disorder where the natural flow of speech is inte...
research
09/16/2021

PDAugment: Data Augmentation by Pitch and Duration Adjustments for Automatic Lyrics Transcription

Automatic lyrics transcription (ALT), which can be regarded as automatic...
research
08/25/2020

Aphasic Speech Recognition using a Mixture of Speech Intelligibility Experts

Robust speech recognition is a key prerequisite for semantic feature ext...
research
06/16/2020

Towards Automated Assessment of Stuttering and Stuttering Therapy

Stuttering is a complex speech disorder that can be identified by repeti...

Please sign up or login with your details

Forgot password? Click here to reset