A Melody-Unsupervision Model for Singing Voice Synthesis

10/13/2021
by Soonbeom Choi, et al.

Recent studies in singing voice synthesis have achieved high-quality results by leveraging advances in text-to-speech models based on deep neural networks. One of the main issues in training singing voice synthesis models is that they require melody and lyric labels that are temporally aligned with the audio data. Producing this temporal alignment is time-consuming manual work when preparing the training data. To address this issue, we propose a melody-unsupervision model that requires only audio-and-lyrics pairs without temporal alignment at training time, yet generates singing voice audio from a given melody and lyrics input at inference time. The proposed model is composed of a phoneme classifier and a singing voice generator that are jointly trained in an end-to-end manner. The model can be fine-tuned by adjusting the amount of supervision with temporally aligned melody labels. Through experiments in melody-unsupervision and semi-supervision settings, we compare the audio quality of the synthesized singing voice. We also show that the proposed model can be trained with speech audio and text labels and still generate singing voice at inference time.
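The abstract does not say how the phoneme classifier learns from lyrics without frame-level alignment. One standard technique for this kind of alignment-free sequence supervision is connectionist temporal classification (CTC), which sums the probability of every monotonic frame-to-phoneme alignment. The sketch below assumes a CTC-style objective purely for illustration; it is not claimed to be the authors' loss. The blank index 0, the toy probabilities, and the helper names (`ctc_neg_log_likelihood`, `brute_force_nll`) are hypothetical, and the brute-force helper exists only to check the dynamic program on tiny inputs.

```python
import math
from itertools import product

BLANK = 0  # assumption: class 0 is reserved for the CTC blank symbol


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_neg_log_likelihood(log_probs, target):
    """-log p(target | frames), summed over all monotonic alignments.

    log_probs: T frames, each a list of per-class log-probabilities.
    target:    phoneme indices taken from the lyrics (no blanks, no timing).
    """
    # Interleave blanks around the phonemes: [b, p1, b, p2, ..., b]
    ext = [BLANK]
    for p in target:
        ext += [p, BLANK]
    S, T = len(ext), len(log_probs)

    # alpha[s]: log-prob of all partial alignments ending at ext[s] at frame t
    alpha = [-math.inf] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [-math.inf] * S
        for s in range(S):
            candidates = [alpha[s]]              # stay on the same symbol
            if s >= 1:
                candidates.append(alpha[s - 1])  # advance by one position
            # skip over a blank only between two different phonemes
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new[s] = logsumexp(candidates) + log_probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last phoneme or on the trailing blank
    final = [alpha[S - 1]] + ([alpha[S - 2]] if S > 1 else [])
    return -logsumexp(final)


def collapse(path):
    """Merge repeated symbols, then drop blanks (the CTC many-to-one map)."""
    out, prev = [], None
    for c in path:
        if c != prev and c != BLANK:
            out.append(c)
        prev = c
    return out


def brute_force_nll(log_probs, target):
    """Reference check: enumerate every frame-level path (tiny inputs only)."""
    total = 0.0
    for path in product(range(len(log_probs[0])), repeat=len(log_probs)):
        if collapse(path) == list(target):
            total += math.exp(sum(log_probs[t][c] for t, c in enumerate(path)))
    return -math.log(total)


# Toy example: 4 frames, 3 classes (0 = blank, 1 and 2 = hypothetical phonemes)
probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.5, 0.1, 0.4],
    [0.3, 0.1, 0.6],
]
log_probs = [[math.log(p) for p in frame] for frame in probs]
lyrics_phonemes = [1, 2]
print(ctc_neg_log_likelihood(log_probs, lyrics_phonemes))
print(brute_force_nll(log_probs, lyrics_phonemes))
```

Both printed values agree up to floating-point rounding. The dynamic program runs in O(T·S) time, whereas the brute-force check grows exponentially with the number of frames, which is why an alignment-free loss of this kind is practical for unaligned audio-and-lyrics pairs.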
