Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech

11/04/2022
by Xin Zhang, et al.

Stuttering is a speech disorder in which the natural flow of speech is interrupted by blocks, repetitions, or prolongations of syllables, words, and phrases. Most existing automatic speech recognition (ASR) interfaces perform poorly on utterances with stutter, mainly because matched training data is lacking. Synthesizing speech with stutter thus presents an opportunity to improve ASR for this type of speech. We describe Stutter-TTS, an end-to-end neural text-to-speech model capable of synthesizing diverse types of stuttering utterances. We develop a simple yet effective prosody-control strategy whereby additional tokens are introduced into the source text during training to represent specific stuttering characteristics. By choosing the position of the stutter tokens, Stutter-TTS allows word-level control of where stuttering occurs in the synthesized utterance. We are able to synthesize stutter events with high accuracy (F1-scores between 0.63 and 0.84, depending on stutter type). By fine-tuning an ASR model on synthetic stuttered speech, we are able to reduce word error by 5.7% relative on stuttered utterances, with only minor relative degradation for fluent utterances.
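
The token-based control described above can be illustrated with a minimal sketch. It assumes one token per stutter type (block, repetition, prolongation) and places each token immediately before the affected word. The token strings, the placement convention, and the function name `insert_stutter_tokens` are all hypothetical, since the abstract does not specify them; this is not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical token inventory: the abstract names the stutter types
# but not the token strings, so these are illustrative assumptions.
STUTTER_TOKENS: Dict[str, str] = {
    "block": "<blk>",
    "repetition": "<rep>",
    "prolongation": "<pro>",
}


def insert_stutter_tokens(text: str, events: Dict[int, str]) -> str:
    """Return `text` with a stutter token placed before each selected word.

    `events` maps a zero-based word index to a stutter type. Placing the
    token before the word is an assumption made for this sketch.
    """
    words = text.split()
    for index, stutter_type in events.items():
        if not 0 <= index < len(words):
            raise ValueError(f"word index {index} is out of range")
        if stutter_type not in STUTTER_TOKENS:
            raise ValueError(f"unknown stutter type: {stutter_type!r}")

    annotated: List[str] = []
    for index, word in enumerate(words):
        if index in events:
            # The token marks where the stutter event should occur.
            annotated.append(STUTTER_TOKENS[events[index]])
        annotated.append(word)
    return " ".join(annotated)


if __name__ == "__main__":
    print(insert_stutter_tokens("please call my mother",
                                {1: "repetition", 3: "block"}))
```

Under these assumptions, moving a key in `events` moves the stutter to a different word, which is the word-level control the abstract refers to; the annotated string would then be the input to the TTS front end.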

