Voice Conversion from Non-parallel Corpora Using Variational Auto-encoder

10/13/2016
by   Chin-Cheng Hsu, et al.

We propose a flexible framework for spectral conversion (SC) that can be trained on unaligned corpora. Many SC frameworks require parallel corpora, phonetic alignments, or explicit frame-wise correspondence, either to learn conversion functions or to synthesize a target spectrum with the aid of alignments. These requirements severely limit the practical applications of SC because parallel corpora are scarce or even unavailable. We propose an SC framework based on a variational auto-encoder that enables us to exploit non-parallel corpora. The framework comprises an encoder that learns speaker-independent phonetic representations and a decoder that learns to reconstruct the designated speaker. It removes the need for parallel corpora or phonetic alignments when training a spectral conversion system. We report objective and subjective evaluations to validate the proposed method and compare it with SC methods that have access to aligned corpora.
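
The encoder/decoder split described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name `ToyConditionalVAE`, the linear layers, the layer sizes, the one-hot speaker code, and the choice to decode from the latent mean at conversion time are all assumptions made for illustration. The weights are random and untrained, and the training objective is omitted, so the sketch only shows the data flow: a frame is encoded without any speaker input, then decoded together with the identity of the designated speaker.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix (stand-in for trained parameters)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product on Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ToyConditionalVAE:
    """Hypothetical linear VAE whose decoder is conditioned on a speaker code."""

    def __init__(self, n_features, n_latent, n_speakers, seed=0):
        self.rng = random.Random(seed)
        self.n_speakers = n_speakers
        # Encoder: spectral frame -> mean and log-variance of the latent code.
        self.w_mu = make_matrix(n_latent, n_features, self.rng)
        self.w_logvar = make_matrix(n_latent, n_features, self.rng)
        # Decoder: [latent code, one-hot speaker] -> spectral frame.
        self.w_dec = make_matrix(n_features, n_latent + n_speakers, self.rng)

    def encode(self, frame):
        # The encoder never sees the speaker identity.
        return matvec(self.w_mu, frame), matvec(self.w_logvar, frame)

    def sample(self, mu, logvar):
        # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1).
        return [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]

    def decode(self, z, speaker_id):
        one_hot = [1.0 if i == speaker_id else 0.0
                   for i in range(self.n_speakers)]
        return matvec(self.w_dec, z + one_hot)

    def convert(self, frame, target_speaker):
        # Assumption: use the latent mean (no sampling) when converting.
        mu, _ = self.encode(frame)
        return self.decode(mu, target_speaker)


model = ToyConditionalVAE(n_features=4, n_latent=2, n_speakers=2)
frame = [0.1, 0.2, 0.3, 0.4]  # made-up spectral frame
as_speaker_0 = model.convert(frame, target_speaker=0)
as_speaker_1 = model.convert(frame, target_speaker=1)
```

Because the speaker enters only at the decoder, the same latent code yields a different output for each speaker code, and no frame-level pairing between speakers appears anywhere in the data flow.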

research
08/10/2020

VAW-GAN for Singing Voice Conversion with Non-parallel Training Data

Singing voice conversion aims to convert singer's voice from source to t...
research
10/13/2016

Dictionary Update for NMF-based Voice Conversion Using an Encoder-Decoder Network

In this paper, we propose a dictionary update method for Nonnegative Mat...
research
11/03/2020

VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech

Emotional voice conversion (EVC) aims to convert the emotion of speech f...
research
06/25/2019

Non-Parallel Sequence-to-Sequence Voice Conversion with Disentangled Linguistic and Speaker Representations

In this paper, a method for non-parallel sequence-to-sequence (seq2seq) ...
research
09/06/2020

Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence Modeling

This paper proposes an any-to-many location-relative, sequence-to-sequen...
research
08/09/2018

Rhythm-Flexible Voice Conversion without Parallel Data Using Cycle-GAN over Phoneme Posteriorgram Sequences

Speaking rate refers to the average number of phonemes within some unit ...
research
10/10/2022

Interpretable AI for relating brain structural and functional connectomes

One of the central problems in neuroscience is understanding how brain s...
