Assem-VC: Realistic Voice Conversion by Assembling Modern Speech Synthesis Techniques

04/02/2021
by Kang-wook Kim, et al.

In this paper, we frame current state-of-the-art voice conversion (VC) systems as two-encoder-one-decoder models. After comparing these models, we combine their best features and propose Assem-VC, a new state-of-the-art any-to-many non-parallel VC system. We also introduce GTA finetuning to VC, which significantly improves the quality and speaker similarity of the outputs. Assem-VC outperforms previous state-of-the-art approaches in both naturalness and speaker similarity on the VCTK dataset. As an objective evaluation, we also explore the degree of speaker disentanglement of features such as phonetic posteriorgrams (PPG). Our investigation indicates that many-to-many VC results are no longer distinguishable from human speech and that similar quality can be achieved with any-to-many models. Audio samples are available at https://mindslab-ai.github.io/assem-vc/


