SUSing: SU-net for Singing Voice Synthesis

05/24/2022
by   Xulong Zhang, et al.
0

Singing voice synthesis is a generative task that involves multi-dimensional control of the singing model, including lyrics, pitch, and duration, and includes the timbre of the singer and singing skills such as vibrato. In this paper, we proposed SU-net for singing voice synthesis named SUSing. Synthesizing singing voice is treated as a translation task between lyrics and music score and spectrum. The lyrics and music score information is encoded into a two-dimensional feature representation through the convolution layer. The two-dimensional feature and its frequency spectrum are mapped to the target spectrum in an autoregressive manner through a SU-net network. Within the SU-net the stripe pooling method is used to replace the alternate global pooling method to learn the vertical frequency relationship in the spectrum and the changes of frequency in the time domain. The experimental results on the public dataset Kiritan show that the proposed method can synthesize more natural singing voices.

READ FULL TEXT

page 3

page 4

research
08/07/2020

Peking Opera Synthesis via Duration Informed Attention Network

Peking Opera has been the most dominant form of Chinese performing art s...
research
03/15/2023

PHONEix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation with Phoneme Distribution Predictor

Singing voice synthesis (SVS), as a specific task for generating the voc...
research
02/20/2021

Singer Identification Using Deep Timbre Feature Learning with KNN-Net

In this paper, we study the issue of automatic singer identification (SI...
research
04/08/2022

Karaoker: Alignment-free singing voice synthesis with speech training data

Existing singing voice synthesis models (SVS) are usually trained on sin...
research
01/20/2020

JVS-MuSiC: Japanese multispeaker singing-voice corpus

Thanks to developments in machine learning techniques, it has become pos...
research
02/16/2022

Singing-Tacotron: Global duration control attention and dynamic filter for End-to-end singing voice synthesis

End-to-end singing voice synthesis (SVS) is attractive due to the avoida...
research
06/29/2021

N-Singer: A Non-Autoregressive Korean Singing Voice Synthesis System for Pronunciation Enhancement

Recently, end-to-end Korean singing voice systems have been designed to ...

Please sign up or login with your details

Forgot password? Click here to reset