Voice Conversion Using Cycle-Consistent Variational Autoencoder

09/15/2019


One of the most critical obstacles in voice conversion is the requirement for parallel training data, which contains utterances with the same linguistic content spoken by different speakers. Collecting such parallel data is a highly expensive process, so many works have attempted to use non-parallel training data for voice conversion. One such successful approach uses cycle-consistent adversarial networks (CycleGAN), which rely on a cycle consistency loss. The major drawback of CycleGAN-based methods, however, is that they can handle only one-to-one voice conversion from a source speaker to a target speaker, which makes them difficult to use in general-purpose cases requiring many-to-many voice conversion among multiple speakers. Another group of approaches, based on the variational autoencoder (VAE), can handle many-to-many voice conversion, but their sound quality is much lower than that of CycleGAN-based methods. In this paper, we propose to use a cycle consistency loss for VAE to improve the sound quality of conventional VAE-based methods for many-to-many voice conversion.
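
To make the idea of a cycle consistency loss for a VAE concrete, the following is a minimal sketch and not the paper's implementation. It assumes a speaker-independent encoder, a decoder conditioned on a one-hot speaker code, toy linear layers in place of trained networks, and an L1 reconstruction distance. All names (`encode`, `decode`, `cycle_vae_loss`, `cycle_weight`) and sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: acoustic feature dim, latent dim, number of speakers.
FEAT_DIM, LATENT_DIM, NUM_SPEAKERS = 8, 4, 3

# Toy linear weights standing in for trained encoder/decoder networks.
W_mu = rng.normal(scale=0.1, size=(FEAT_DIM, LATENT_DIM))
W_logvar = rng.normal(scale=0.1, size=(FEAT_DIM, LATENT_DIM))
W_dec = rng.normal(scale=0.1, size=(LATENT_DIM + NUM_SPEAKERS, FEAT_DIM))


def encode(x):
    """Map frames (n, FEAT_DIM) to the mean and log-variance of q(z|x)."""
    return x @ W_mu, x @ W_logvar


def sample(mu, logvar):
    """Reparameterization trick: z = mu + sigma * eps."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def decode(z, speaker):
    """Reconstruct frames from z, conditioned on a one-hot speaker code."""
    code = np.zeros((z.shape[0], NUM_SPEAKERS))
    code[:, speaker] = 1.0
    return np.concatenate([z, code], axis=1) @ W_dec


def kl_divergence(mu, logvar):
    """KL(q(z|x) || N(0, I)), averaged over frames."""
    return -0.5 * np.mean(np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1))


def cycle_vae_loss(x, src, tgt, cycle_weight=1.0):
    """Standard VAE terms plus a cycle term: src -> tgt -> src should return x."""
    mu, logvar = encode(x)
    z = sample(mu, logvar)
    x_rec = decode(z, src)    # ordinary reconstruction as the source speaker
    x_conv = decode(z, tgt)   # conversion to the target speaker

    # Cycle path: re-encode the converted frames, decode back as the source.
    mu_c, logvar_c = encode(x_conv)
    x_cyc = decode(sample(mu_c, logvar_c), src)

    parts = {
        "rec": np.mean(np.abs(x - x_rec)),
        "kl": kl_divergence(mu, logvar),
        "cycle": np.mean(np.abs(x - x_cyc)),
    }
    total = parts["rec"] + parts["kl"] + cycle_weight * parts["cycle"]
    return total, parts
```

Because the decoder takes the speaker code as an input, any source and target pair can be passed to `cycle_vae_loss`, which is what makes the setup many-to-many. In this sketch, setting `cycle_weight=0` reduces the objective to a plain conditional VAE loss.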
