Enhancing the vocal range of single-speaker singing voice synthesis with melody-unsupervised pre-training

09/01/2023
by Shaohuan Zhou, et al.

Single-speaker singing voice synthesis (SVS) usually underperforms at pitch values that lie outside the singer's vocal range or that have few training samples. Building on our previous work, this paper proposes a melody-unsupervised multi-speaker pre-training method, conducted on a multi-singer dataset, that extends the vocal range of a single speaker without degrading timbre similarity. The pre-training method can be applied to a large-scale multi-singer dataset that contains only audio-and-lyrics pairs, with no phonemic timing information or pitch annotation. Specifically, in the pre-training step we design a phoneme predictor that produces frame-level phoneme probability vectors to serve as the phonemic timing information, and a speaker encoder that models the timbre variations of different singers; the pitch information is provided by frame-level f0 values estimated directly from the audio. The pre-trained model parameters are then passed to the fine-tuning step as prior knowledge to extend the single speaker's vocal range. This work also improves the sound quality and rhythm naturalness of the synthesized singing voices. It is the first to introduce a differentiable duration regulator to improve the rhythm naturalness of the synthesized voice, and a bi-directional flow model to improve the sound quality. Experimental results verify that the proposed SVS system outperforms the baseline in both sound quality and naturalness.
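
To make the idea of estimating frame-level f0 directly from audio more concrete, here is a minimal sketch in Python. The abstract does not say which pitch estimator the authors use, so the autocorrelation peak picking below is only an assumed stand-in, and the function name `estimate_f0_frames` and all of its parameters (`frame_len`, `hop`, `fmin`, `fmax`) are hypothetical. A practical system would likely rely on a more robust pitch tracker with a proper voiced/unvoiced decision.

```python
import numpy as np


def estimate_f0_frames(audio, sr, frame_len=1024, hop=256, fmin=80.0, fmax=1000.0):
    """Return one f0 estimate in Hz per frame (0.0 for silent frames).

    Illustrative autocorrelation method, not the estimator used in the paper.
    """
    audio = np.asarray(audio, dtype=np.float64)
    lag_min = int(sr / fmax)  # shortest period considered, in samples
    lag_max = int(sr / fmin)  # longest period considered, in samples
    f0 = []
    for start in range(0, len(audio) - frame_len + 1, hop):
        frame = audio[start:start + frame_len]
        frame = frame - frame.mean()  # remove any DC offset
        # Full autocorrelation; keep the non-negative lags only.
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        if ac[0] <= 1e-8:
            f0.append(0.0)  # no energy in this frame: treat as unvoiced
            continue
        # The strongest peak in the allowed lag range gives the period.
        lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
        f0.append(sr / lag)
    return np.array(f0)
```

Called on a mono waveform, the function returns one value per hop, which is the kind of frame-level pitch sequence that could stand in for a missing pitch annotation. Its resolution is limited to integer lags, so the estimates are only approximate.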


