
Adversarially Trained Autoencoders for Parallel-Data-Free Voice Conversion

by   Orhan Öçal, et al.

We present a method for converting voices among a set of speakers. Our approach trains multiple autoencoder paths that share a single speaker-independent encoder and use a separate speaker-dependent decoder for each target speaker. The autoencoders are trained with an additional adversarial loss, provided by an auxiliary speaker classifier, that guides the output of the encoder to be speaker independent. Training is unsupervised in the sense that it requires neither the same utterances from all speakers nor time alignment over phonemes. Because the encoder is shared, our method generalizes to converting the voice of out-of-training speakers to speakers in the training dataset. We present subjective tests corroborating the performance of our method.
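The training objective described above can be sketched in a few lines: a shared encoder, per-speaker decoders, and an auxiliary classifier whose cross-entropy is *subtracted* from the reconstruction loss so the encoder is rewarded when the classifier cannot identify the speaker from the latent code. The linear maps, dimensions, and weighting `lam` below are hypothetical placeholders, not the paper's architecture; a minimal sketch only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical; real features would be spectral frames)
feat_dim, latent_dim, n_speakers = 8, 4, 3

# One shared encoder, one decoder per speaker, plus an auxiliary classifier
W_enc = rng.normal(size=(latent_dim, feat_dim)) * 0.1
W_dec = [rng.normal(size=(feat_dim, latent_dim)) * 0.1 for _ in range(n_speakers)]
W_cls = rng.normal(size=(n_speakers, latent_dim)) * 0.1

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def autoencoder_loss(x, spk, lam=0.1):
    """Reconstruction loss minus lam times the classifier's cross-entropy:
    minimizing this encourages reconstructions that are faithful while
    making the latent code uninformative about speaker identity."""
    z = W_enc @ x                       # shared, speaker-independent encoder
    recon = W_dec[spk] @ z              # speaker-dependent decoder
    rec_loss = np.mean((x - recon) ** 2)
    p = softmax(W_cls @ z)              # auxiliary classifier's posterior
    adv_loss = -np.log(p[spk] + 1e-12)  # classifier cross-entropy
    return rec_loss - lam * adv_loss    # adversarial: encoder fights classifier

x = rng.normal(size=feat_dim)
loss = autoencoder_loss(x, spk=1)
print(float(loss))
```

In a full implementation the classifier would be trained in alternation to *minimize* its cross-entropy on the latent codes, giving the adversarial game the abstract describes; conversion then amounts to encoding a source utterance and decoding with the target speaker's decoder.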

