Enhanced exemplar autoencoder with cycle consistency loss in any-to-one voice conversion

04/08/2022
by Weida Liang, et al.

Recent research showed that an autoencoder trained on the speech of a single speaker, called an exemplar autoencoder (eAE), can be used for any-to-one voice conversion (VC). Compared to large-scale many-to-many models such as AutoVC, the eAE model is simple and fast to train, and may recover more details of the target speaker. To ensure VC quality, the latent code should represent content information and only content information. However, this is hard to achieve with an eAE, because the model is never exposed to speaker variation during training. To tackle the problem, we propose a simple yet effective approach based on a cycle consistency loss. Specifically, we train eAEs of multiple speakers with a shared encoder and, at the same time, encourage the speech reconstructed by any speaker-specific decoder to yield the same latent code as the original speech when it is cycled back and encoded again. Experiments conducted on the AISHELL-3 corpus showed that this new approach consistently improved over the baseline eAE. The source code and examples are available at the project page: http://project.cslt.org/.
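The abstract describes the training objective only in words. The sketch below illustrates one possible reading of it under stated assumptions: the shared encoder and the speaker-specific decoders are hypothetical linear maps over single feature frames, both loss terms use squared Euclidean distance, and `cycle_weight` is a hypothetical weighting hyperparameter. None of these choices are taken from the paper, and the abstract does not say whether the original latent code is treated as a fixed target. The function only evaluates the loss; a real system would use neural networks and an autodiff framework to minimize it.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def eae_cycle_loss(
    encoder: Matrix,
    decoders: Dict[str, Matrix],
    frames: Dict[str, List[Vector]],
    cycle_weight: float = 1.0,
) -> float:
    """Reconstruction loss plus a latent cycle-consistency penalty (sketch).

    encoder:  shared linear encoder, latent_dim x feat_dim (hypothetical).
    decoders: one linear decoder per speaker, feat_dim x latent_dim (hypothetical).
    frames:   training frames per speaker, keyed like `decoders`.
    """
    recon, cycle, count = 0.0, 0.0, 0
    for speaker, xs in frames.items():
        for x in xs:
            z = matvec(encoder, x)  # latent code of the original frame
            # Exemplar-AE term: the speaker's own decoder must rebuild x.
            recon += sq_dist(matvec(decoders[speaker], z), x)
            # Cycle term: decode with every speaker-specific decoder,
            # re-encode, and penalize drift from the original latent code.
            for dec in decoders.values():
                z_cycled = matvec(encoder, matvec(dec, z))
                cycle += sq_dist(z_cycled, z)
            count += 1
    # Average the combined loss over all training frames.
    return (recon + cycle_weight * cycle) / count
```

With identity maps every decoded frame re-encodes to the original code, so the loss is zero; a decoder that scales its output (here by 2) changes the re-encoded code and is penalized by the cycle term. In this reading, any-to-one conversion at inference would encode source speech with the shared encoder and decode it with the target speaker's decoder.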
