End-to-End Voice Conversion with Information Perturbation

06/15/2022
by   Qicong Xie, et al.
0

The ideal goal of voice conversion is to convert the source speaker's speech to sound naturally like the target speaker while maintaining the linguistic content and the prosody of the source speech. However, current approaches are insufficient to achieve comprehensive source prosody transfer and target speaker timbre preservation in the converted speech, and the quality of the converted speech is also unsatisfied due to the mismatch between the acoustic model and the vocoder. In this paper, we leverage the recent advances in information perturbation and propose a fully end-to-end approach to conduct high-quality voice conversion. We first adopt information perturbation to remove speaker-related information in the source speech to disentangle speaker timbre and linguistic content and thus the linguistic information is subsequently modeled by a content encoder. To better transfer the prosody of the source speech to the target, we particularly introduce a speaker-related pitch encoder which can maintain the general pitch pattern of the source speaker while flexibly modifying the pitch intensity of the generated speech. Finally, one-shot voice conversion is set up through continuous speaker space modeling. Experimental results indicate that the proposed end-to-end approach significantly outperforms the state-of-the-art models in terms of intelligibility, naturalness, and speaker similarity.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/07/2020

DurIAN-SC: Duration Informed Attention Network based Singing Voice Conversion System

Singing voice conversion is converting the timbre in the source singing ...
research
10/27/2022

FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion

Voice conversion (VC) can be achieved by first extracting source content...
research
05/19/2020

Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech

Accent conversion (AC) transforms a non-native speaker's accent into a n...
research
02/18/2022

VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion

Though significant progress has been made for speaker-dependent Video-to...
research
02/16/2020

Speech-to-Singing Conversion in an Encoder-Decoder Framework

In this paper our goal is to convert a set of spoken lines into sung one...
research
09/18/2023

Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders

We propose a novel framework for electrolaryngeal speech intelligibility...
research
10/31/2020

AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and Adaptive Instance Normalization

Recently, voice conversion (VC) has been widely studied. Many VC systems...

Please sign up or login with your details

Forgot password? Click here to reset