Voice Conversion Based on Cross-Domain Features Using Variational Auto Encoders

08/29/2018
by Wen-Chin Huang, et al.

An effective approach to non-parallel voice conversion (VC) is to utilize deep neural networks (DNNs), specifically variational auto encoders (VAEs), to model the latent structure of speech in an unsupervised manner. A previous study has confirmed the effectiveness of VAE using the STRAIGHT spectra for VC. However, VAE using other types of spectral features such as mel-cepstral coefficients (MCCs), which are related to human perception and have been widely used in VC, has not been properly investigated. Instead of using one specific type of spectral feature, it is expected that VAE may benefit from using multiple types of spectral features simultaneously, thereby improving the capability of VAE for VC. To this end, we propose a novel VAE framework (called cross-domain VAE, CDVAE) for VC. Specifically, the proposed framework utilizes both STRAIGHT spectra and MCCs by explicitly regularizing multiple objectives in order to constrain the behavior of the learned encoder and decoder. Experimental results demonstrate that the proposed CDVAE framework outperforms the conventional VAE framework in subjective tests.
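The abstract does not spell out which objectives are regularized, so the following is only a minimal sketch of what a multi-objective loss over two feature domains could look like, not the authors' formulation. It assumes one linear Gaussian encoder and one linear decoder per domain (STRAIGHT spectra as `sp`, MCCs as `mcc`), a KL term per encoder, and a mean-squared reconstruction term for every encoder/decoder pairing. All names (`cdvae_style_loss`, `init_params`, the parameter layout) are hypothetical, and speaker conditioning of the decoder, which a VC system would need, is omitted for brevity.

```python
import numpy as np


def gaussian_kl(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), averaged over the batch."""
    per_frame = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)
    return float(np.mean(per_frame))


def init_params(dims, latent_dim, rng, scale=0.01):
    """Hypothetical parameter layout: one linear encoder/decoder per domain."""
    return {
        "enc_mu": {k: scale * rng.standard_normal((d, latent_dim)) for k, d in dims.items()},
        "enc_lv": {k: scale * rng.standard_normal((d, latent_dim)) for k, d in dims.items()},
        "dec": {k: scale * rng.standard_normal((latent_dim, d)) for k, d in dims.items()},
    }


def cdvae_style_loss(sp, mcc, params, rng):
    """Assumed multi-objective loss: per-domain KL plus within- and
    cross-domain reconstruction errors. Illustrative only."""
    feats = {"sp": sp, "mcc": mcc}
    losses = {}
    latents = {}

    # Encode each domain and sample a latent with the reparameterization trick.
    for name, x in feats.items():
        mu = x @ params["enc_mu"][name]
        log_var = x @ params["enc_lv"][name]
        eps = rng.standard_normal(mu.shape)
        latents[name] = mu + np.exp(0.5 * log_var) * eps
        losses["kl_" + name] = gaussian_kl(mu, log_var)

    # Decode every latent into every domain: sp->sp and mcc->mcc are
    # within-domain terms, sp->mcc and mcc->sp are cross-domain terms.
    for src, z in latents.items():
        for tgt, x in feats.items():
            recon = z @ params["dec"][tgt]
            losses["rec_" + src + "_to_" + tgt] = float(np.mean((recon - x) ** 2))

    # Unweighted sum; a real system would likely weight the terms.
    losses["total"] = sum(losses.values())
    return losses
```

In this sketch the cross-domain terms are what tie the two encoders together: a latent code from either feature type must be decodable into both, which is one plausible reading of "constraining the behavior of the learned encoder and decoder".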

research
05/02/2019

Investigation of F0 conditioning and Fully Convolutional Networks in Variational Autoencoder based Voice Conversion

In this work, we investigate the effectiveness of two techniques for imp...
research
06/11/2023

Mandarin Electrolaryngeal Speech Voice Conversion using Cross-domain Features

Patients who have had their entire larynx removed, including the vocal f...
research
07/24/2019

Non-Parallel Voice Conversion with Cyclic Variational Autoencoder

In this paper, we present a novel technique for a non-parallel voice con...
research
06/14/2019

Modality Conversion of Handwritten Patterns by Cross Variational Autoencoders

This research attempts to construct a network that can convert online an...
research
04/07/2017

DeepCoder: Semi-parametric Variational Autoencoders for Automatic Facial Action Coding

Human face exhibits an inherent hierarchy in its representations (i.e., ...
research
07/28/2021

Unsupervised Learning of Neurosymbolic Encoders

We present a framework for the unsupervised learning of neurosymbolic en...
