MR-SVS: Singing Voice Synthesis with Multi-Reference Encoder

01/11/2022
by   Shoutong Wang, et al.
0

Multi-speaker singing voice synthesis is to generate the singing voice sung by different speakers. To generalize to new speakers, previous zero-shot singing adaptation methods obtain the timbre of the target speaker with a fixed-size embedding from single reference audio. However, they face several challenges: 1) the fixed-size speaker embedding is not powerful enough to capture full details of the target timbre; 2) single reference audio does not contain sufficient timbre information of the target speaker; 3) the pitch inconsistency between different speakers also leads to a degradation in the generated voice. In this paper, we propose a new model called MR-SVS to tackle these problems. Specifically, we employ both a multi-reference encoder and a fixed-size encoder to encode the timbre of the target speaker from multiple reference audios. The Multi-reference encoder can capture more details and variations of the target timbre. Besides, we propose a well-designed pitch shift method to address the pitch inconsistency problem. Experiments indicate that our method outperforms the baseline method both in naturalness and similarity.

READ FULL TEXT
research
06/12/2018

Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis

We describe a neural network-based system for text-to-speech (TTS) synth...
research
11/17/2020

Learn2Sing: Target Speaker Singing Voice Synthesis by learning from a Singing Teacher

Singing voice synthesis has been paid rising attention with the rapid de...
research
03/15/2023

Blind Estimation of Audio Processing Graph

Musicians and audio engineers sculpt and transform their sounds by conne...
research
05/26/2020

Adversarial Contrastive Predictive Coding for Unsupervised Learning of Disentangled Representations

In this work we tackle disentanglement of speaker and content related va...
research
09/01/2022

Lip-to-Speech Synthesis for Arbitrary Speakers in the Wild

In this work, we address the problem of generating speech from silent li...
research
08/24/2023

WavMark: Watermarking for Audio Generation

Recent breakthroughs in zero-shot voice synthesis have enabled imitating...
research
10/26/2020

Speaker Anonymization with Distribution-Preserving X-Vector Generation for the VoicePrivacy Challenge 2020

In this paper, we present a Distribution-Preserving Voice Anonymization ...

Please sign up or login with your details

Forgot password? Click here to reset