Semi-supervised voice conversion with amortized variational inference

09/30/2019
by   Cory Stephenson, et al.
0

In this work we introduce a semi-supervised approach to the voice conversion problem, in which speech from a source speaker is converted into speech of a target speaker. The proposed method makes use of both parallel and non-parallel utterances from the source and target simultaneously during training. This approach can be used to extend existing parallel data voice conversion systems such that they can be trained with semi-supervision. We show that incorporating semi-supervision improves the voice conversion performance compared to fully supervised training when the number of parallel utterances is limited as in many practical applications. Additionally, we find that increasing the number non-parallel utterances used in training continues to improve performance when the amount of parallel training data is held constant.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/11/2019

A Vocoder-free WaveNet Voice Conversion with Non-Parallel Data

In a typical voice conversion system, vocoder is commonly used for speec...
research
08/15/2018

Investigation of Using Disentangled and Interpretable Representations for One-shot Cross-lingual Voice Conversion

We study the problem of cross-lingual voice conversion in non-parallel s...
research
10/06/2020

VoiceGrad: Non-Parallel Any-to-Many Voice Conversion with Annealed Langevin Dynamics

In this paper, we propose a non-parallel any-to-many voice conversion (V...
research
09/14/2023

Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion-Preserving Voice Conversion

Speech anonymisation prevents misuse of spoken data by removing any pers...
research
09/14/2019

Bootstrapping non-parallel voice conversion from speaker-adaptive text-to-speech

Voice conversion (VC) and text-to-speech (TTS) are two tasks that share ...
research
11/20/2018

Improving Sequence-to-Sequence Acoustic Modeling by Adding Text-Supervision

This paper presents methods of making using of text supervision to impro...
research
07/29/2019

StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion

Non-parallel multi-domain voice conversion (VC) is a technique for learn...

Please sign up or login with your details

Forgot password? Click here to reset