VoiceGrad: Non-Parallel Any-to-Many Voice Conversion with Annealed Langevin Dynamics

10/06/2020
by Hirokazu Kameoka, et al.

In this paper, we propose VoiceGrad, a non-parallel any-to-many voice conversion (VC) method. Inspired by WaveGrad, a recently introduced waveform generation method, VoiceGrad is based on the concepts of score matching and Langevin dynamics. It uses weighted denoising score matching to train a score approximator, a fully convolutional network with a U-Net structure designed to predict the gradient of the log density of the speech feature sequences of multiple speakers. It then performs VC by using annealed Langevin dynamics to iteratively update an input feature sequence towards the nearest stationary point of the target distribution, based on the trained score approximator network. Because of this formulation, VoiceGrad enables any-to-many VC, a scenario in which the speaker of the input speech can be arbitrary, and allows for non-parallel training, which requires no parallel utterances or transcriptions.
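
The annealed Langevin dynamics update described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the step-size schedule commonly used with annealed Langevin dynamics (step size proportional to the squared noise level). It also replaces the trained U-Net score approximator with a hypothetical closed-form `toy_score` for a Gaussian target. All names, shapes, and hyperparameters (`toy_score`, `annealed_langevin`, `target_mean`, `eps`, `steps_per_level`) are illustrative assumptions.

```python
import numpy as np


def toy_score(x, sigma, target_mean, data_var=0.01):
    # Assumption: stand-in for a trained score network. This is the exact
    # score (gradient of the log density) of N(target_mean, (data_var + sigma^2) I),
    # i.e. Gaussian "data" perturbed by noise of level sigma.
    return (target_mean - x) / (data_var + sigma ** 2)


def annealed_langevin(x0, score_fn, sigmas, eps=1e-4, steps_per_level=50, rng=None):
    # Run Langevin updates at each noise level, from the largest to the smallest.
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    sigma_min = sigmas[-1]
    for sigma in sigmas:
        # Step size shrinks with the noise level (assumed schedule).
        alpha = eps * (sigma / sigma_min) ** 2
        for _ in range(steps_per_level):
            z = rng.standard_normal(x.shape)
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
    return x


rng = np.random.default_rng(0)
target_mean = np.full((4, 8), 2.0)    # hypothetical target-speaker feature centroid
source = target_mean + 5.0            # hypothetical input feature sequence
sigmas = np.geomspace(1.0, 0.01, 10)  # decreasing noise levels

converted = annealed_langevin(
    source, lambda x, s: toy_score(x, s, target_mean), sigmas, rng=rng
)
print(np.abs(converted - target_mean).max())
```

In this toy setting the input is pulled toward the target distribution as the noise level decreases. In a real system, the score function would be a network conditioned on the noise level and the target speaker, operating on speech feature sequences.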

research
06/02/2021

NVC-Net: End-to-End Adversarial Voice Conversion

Voice conversion has gained increasing popularity in many applications o...
research
09/30/2019

Semi-supervised voice conversion with amortized variational inference

In this work we introduce a semi-supervised approach to the voice conver...
research
11/05/2018

ConvS2S-VC: Fully convolutional sequence-to-sequence voice conversion

This paper proposes a voice conversion method based on fully convolution...
research
10/08/2020

FastVC: Fast Voice Conversion with non-parallel data

This paper introduces FastVC, an end-to-end model for fast Voice Convers...
research
12/22/2017

On Using Backpropagation for Speech Texture Generation and Voice Conversion

Inspired by recent work on neural network image generation which rely on...
research
02/06/2019

Unsupervised Polyglot Text To Speech

We present a TTS neural network that is able to produce speech in multip...
research
08/27/2020

Non-Parallel Voice Conversion with Augmented Classifier Star Generative Adversarial Networks

We have previously proposed a method that allows for non-parallel voice ...
