Improving Dysarthric Speech Intelligibility Using Cycle-consistent Adversarial Training

01/10/2020
by Seung Hee Yang, et al.

Dysarthria is a motor speech impairment affecting millions of people. Dysarthric speech can be far less intelligible than that of non-dysarthric speakers, causing significant communication difficulties. The goal of our work is to develop a model for dysarthric-to-healthy speech conversion using a cycle-consistent GAN. Using 18,700 dysarthric and 8,610 healthy control Korean utterances that were recorded in a previous study on automatic recognition for a voice keyboard, the generator is trained to transform dysarthric speech to healthy speech in the spectral domain, and the result is then converted back to speech. Objective evaluation using automatic speech recognition of the generated utterances on a held-out test set shows that recognition performance improves over the original dysarthric speech after adversarial training: the absolute WER is lowered by 33.4%. This demonstrates that the proposed GAN-based conversion method is useful for improving dysarthric speech intelligibility.
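The abstract reports its result as an absolute reduction in word error rate (WER). As a minimal sketch of how such a figure is typically computed, assuming the standard word-level edit-distance definition of WER (the abstract does not describe the scoring tool actually used), the Python below scores a recognizer output against a reference transcript and takes the difference between two WER values. The function names `word_error_rate` and `absolute_wer_reduction` are hypothetical, and the sample strings are invented for illustration; they are not data from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Assumes whitespace tokenization; real ASR scoring usually also
    normalizes case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def absolute_wer_reduction(wer_before: float, wer_after: float) -> float:
    """Absolute (not relative) reduction: a plain difference of two WERs."""
    return wer_before - wer_after


if __name__ == "__main__":
    # Invented toy transcripts, for illustration only.
    reference = "a b c d"
    before_conversion = "a x c"    # one substitution, one deletion
    after_conversion = "a b c d"   # perfect recognition
    wer_before = word_error_rate(reference, before_conversion)
    wer_after = word_error_rate(reference, after_conversion)
    print(wer_before, wer_after, absolute_wer_reduction(wer_before, wer_after))
```

An absolute reduction is a difference in percentage points, so it should not be read as a relative improvement over the baseline WER.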

research
01/13/2022

The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition

In this paper, we investigate several existing and a new state-of-the-ar...
research
11/02/2020

CVC: Contrastive Learning for Non-parallel Voice Conversion

Cycle consistent generative adversarial network (CycleGAN) and variation...
research
04/20/2019

Self-imitating Feedback Generation Using GAN for Computer-Assisted Pronunciation Training

Self-imitating feedback is an effective and learner-friendly method for ...
research
06/07/2018

Domain Adversarial Training for Accented Speech Recognition

In this paper, we propose a domain adversarial training (DAT) algorithm ...
research
08/10/2021

StarGAN-VC+ASR: StarGAN-based Non-Parallel Voice Conversion Regularized by Automatic Speech Recognition

Preserving the linguistic content of input speech is essential during vo...
research
07/25/2020

Non-parallel Emotion Conversion using a Deep-Generative Hybrid Network and an Adversarial Pair Discriminator

We introduce a novel method for emotion conversion in speech that does n...
research
07/24/2023

Robust Automatic Speech Recognition via WavAugment Guided Phoneme Adversarial Training

Developing a practically-robust automatic speech recognition (ASR) is ch...
