A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion

by   Wen-Chin Huang, et al.

We propose a new paradigm for maintaining speaker identity in dysarthric voice conversion (DVC). The poor quality of dysarthric speech can be greatly improved by statistical VC, but as the normal speech utterances of a dysarthria patient are nearly impossible to collect, previous work failed to recover the individuality of the patient. In light of this, we suggest a novel, two-stage approach for DVC, which is highly flexible in that no normal speech of the patient is required. First, a powerful parallel sequence-to-sequence model converts the input dysarthric speech into a normal speech of a reference speaker as an intermediate product, and a nonparallel, frame-wise VC model realized with a variational autoencoder then converts the speaker identity of the reference speech back to that of the patient while assumed to be capable of preserving the enhanced quality. We investigate several design options. Experimental evaluation results demonstrate the potential of our approach to improving the quality of the dysarthric speech while maintaining the speaker identity.



There are no comments yet.


page 3


Towards Identity Preserving Normal to Dysarthric Voice Conversion

We present a voice conversion framework that converts normal speech into...

Optimizing voice conversion network with cycle consistency loss of speaker identity

We propose a novel training scheme to optimize voice conversion network ...

F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder

Non-parallel many-to-many voice conversion remains an interesting but ch...

Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech

This paper focuses on using voice conversion (VC) to improve the speech ...

Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech

Accent conversion (AC) transforms a non-native speaker's accent into a n...

Taco-VC: A Single Speaker Tacotron based Voice Conversion with Limited Data

This paper introduces Taco-VC, a novel architecture for voice conversion...

CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech

Prosody Transfer (PT) is a technique that aims to use the prosody from a...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.