ACVAE-VC: Non-parallel many-to-many voice conversion with auxiliary classifier variational autoencoder

08/13/2018
by Hirokazu Kameoka, et al.

This paper proposes a non-parallel many-to-many voice conversion (VC) method using a variant of the conditional variational autoencoder (CVAE) called an auxiliary classifier VAE (ACVAE). The proposed method has three key features. First, it uses fully convolutional architectures for the encoder and decoder networks so that they can learn conversion rules that capture time dependencies in the acoustic feature sequences of source and target speech. Second, it applies an information-theoretic regularization during training to ensure that the information in the attribute class label is not lost in the conversion process. With regular CVAEs, the encoder and decoder are free to ignore the attribute class label input. When that happens, the label has little effect on controlling the voice characteristics of input speech at test time. This can be avoided by introducing an auxiliary classifier and training the encoder and decoder so that the classifier correctly predicts the attribute classes of the decoder outputs. Third, it avoids producing buzzy-sounding speech at test time by simply transplanting the spectral details of the input speech into its converted version. Subjective evaluation experiments showed that this simple method worked reasonably well on a non-parallel many-to-many speaker identity conversion task.
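The second feature, training the encoder and decoder so that an auxiliary classifier recovers the target class from the decoder output, can be illustrated with a toy objective. The following is a minimal sketch and not the authors' implementation: the squared-error reconstruction term, the standard-normal prior, the weight `lam`, and all function names are assumptions made for illustration only. The point it shows is that the classifier term penalizes decoder outputs whose class the classifier cannot predict, so the decoder cannot ignore the class label without raising the loss.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def squared_error(x, x_hat):
    """Reconstruction term (assumed here to be a plain squared error)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def classifier_cross_entropy(class_probs, target_class):
    """Negative log-probability that the auxiliary classifier assigns to the
    class label that was fed to the decoder."""
    return -math.log(class_probs[target_class])


def acvae_style_loss(x, x_hat, mu, log_var, class_probs, target_class, lam=1.0):
    """Toy CVAE loss plus an auxiliary-classifier penalty.

    `class_probs` stands for the classifier's output on the decoder output;
    `lam` is a hypothetical weight on the classifier term.
    """
    cvae_part = squared_error(x, x_hat) + kl_to_standard_normal(mu, log_var)
    return cvae_part + lam * classifier_cross_entropy(class_probs, target_class)


# Same reconstruction and latent statistics, different classifier confidence.
x = [0.2, -0.1, 0.4]
ignored_label = acvae_style_loss(x, x, [0.0], [0.0], [0.5, 0.5], target_class=0)
used_label = acvae_style_loss(x, x, [0.0], [0.0], [0.9, 0.1], target_class=0)
```

In this toy setting `used_label` is smaller than `ignored_label`: when the decoder output carries the class information, the classifier is more confident and the penalty shrinks. In a real system the classifier probabilities would come from a trained network applied to the decoder output rather than being supplied by hand.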

research
09/30/2020

Transfer Learning from Speech Synthesis to Voice Conversion with Non-Parallel Training Data

This paper presents a novel framework to build a voice conversion (VC) s...
research
08/07/2020

A New Approach to Accent Recognition and Conversion for Mandarin Chinese

Two new approaches to accent classification and conversion are presented...
research
05/02/2019

Investigation of F0 conditioning and Fully Convolutional Networks in Variational Autoencoder based Voice Conversion

In this work, we investigate the effectiveness of two techniques for imp...
research
11/05/2018

ConvS2S-VC: Fully convolutional sequence-to-sequence voice conversion

This paper proposes a voice conversion method based on fully convolution...
research
10/13/2016

Dictionary Update for NMF-based Voice Conversion Using an Encoder-Decoder Network

In this paper, we propose a dictionary update method for Nonnegative Mat...
research
10/20/2022

DisC-VC: Disentangled and F0-Controllable Neural Voice Conversion

Voice conversion is a task to convert a non-linguistic feature of a give...
research
04/14/2021

FastS2S-VC: Streaming Non-Autoregressive Sequence-to-Sequence Voice Conversion

This paper proposes a non-autoregressive extension of our previously pro...
