Bootstrapping non-parallel voice conversion from speaker-adaptive text-to-speech

09/14/2019
by Hieu-Thi Luong, et al.

Voice conversion (VC) and text-to-speech (TTS) are two tasks that share a similar objective: generating speech with a target voice. However, they are usually developed independently under vastly different frameworks. In this paper, we propose a methodology to bootstrap a VC system from a pretrained speaker-adaptive TTS model and to unify the techniques as well as the interpretations of these two tasks. Moreover, by offloading the heavy data demand to the training stage of the TTS model, our VC system can be built using a small amount of target-speaker speech data. This also opens up the possibility of building the system using speech in an unseen foreign language. Our subjective evaluations show that the proposed framework not only achieves competitive performance in the standard intra-language scenario but can also adapt and convert using speech utterances in an unseen language.

research
08/07/2020

DurIAN-SC: Duration Informed Attention Network based Singing Voice Conversion System

Singing voice conversion is converting the timbre in the source singing ...
research
03/31/2022

HiFi-VC: High Quality ASR-Based Voice Conversion

The goal of voice conversion (VC) is to convert input voice to match the...
research
05/22/2020

NAUTILUS: a Versatile Voice Cloning System

We introduce a novel speech synthesis system, called NAUTILUS, that can ...
research
05/30/2023

Voice Conversion With Just Nearest Neighbors

Any-to-any voice conversion aims to transform source speech into a targe...
research
09/30/2019

Semi-supervised voice conversion with amortized variational inference

In this work we introduce a semi-supervised approach to the voice conver...
research
10/14/2021

Toward Degradation-Robust Voice Conversion

Any-to-any voice conversion technologies convert the vocal timbre of an ...
research
09/22/2021

Noisy-to-Noisy Voice Conversion Framework with Denoising Model

In a conventional voice conversion (VC) framework, a VC model is often t...
