Adversarially Trained End-to-end Korean Singing Voice Synthesis System

08/06/2019
by Juheon Lee, et al.

In this paper, we propose an end-to-end Korean singing voice synthesis system that generates singing from lyrics and a symbolic melody using three novel approaches: 1) phonetic enhancement masking, 2) local conditioning of text and pitch to the super-resolution network, and 3) conditional adversarial training. The proposed system consists of two main modules: a mel-synthesis network that generates a mel-spectrogram from the given input information, and a super-resolution network that upsamples the generated mel-spectrogram into a linear spectrogram. In the mel-synthesis network, phonetic enhancement masking generates implicit formant masks solely from the input text, which enables more accurate phonetic control of the singing voice. In addition, we show that the two other proposed methods, local conditioning of text and pitch and conditional adversarial training, are crucial for realistic generation of the human singing voice in the super-resolution process. Finally, both quantitative and qualitative evaluations are conducted, confirming the validity of all proposed methods.
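The two-module data flow described above can be pictured with a minimal NumPy sketch. It is not the authors' implementation: the random linear maps stand in for trained networks, the bin counts (80 mel, 513 linear), the embedding size, and all function names are hypothetical, and applying the text-derived mask by element-wise multiplication is an assumption the abstract does not spell out. The sketch also keeps the frame count fixed, so "upsampling" here only expands the frequency axis, and adversarial training is omitted.

```python
import numpy as np

N_MEL, N_LINEAR = 80, 513  # assumed bin counts, not stated in the abstract


def mel_synthesis(text_emb, pitch_emb, w_mel, w_mask):
    """Stage 1 stand-in: returns (masked mel-spectrogram, mask)."""
    # Draft mel-spectrogram from text and pitch features together.
    draft = np.concatenate([text_emb, pitch_emb], axis=1) @ w_mel
    # Mask computed from the text features only, squashed into (0, 1).
    mask = 1.0 / (1.0 + np.exp(-(text_emb @ w_mask)))
    # Assumption: the mask is applied by element-wise multiplication.
    return draft * mask, mask


def super_resolution(mel, text_emb, pitch_emb, w_sr):
    """Stage 2 stand-in: mel plus frame-wise text/pitch conditioning."""
    # Local conditioning: text and pitch are supplied per frame with the mel.
    conditioned = np.concatenate([mel, text_emb, pitch_emb], axis=1)
    return conditioned @ w_sr


def run_pipeline(frames=6, dim=4, seed=0):
    """Run both stages on random inputs and random (untrained) weights."""
    rng = np.random.default_rng(seed)
    text_emb = rng.standard_normal((frames, dim))
    pitch_emb = rng.standard_normal((frames, dim))
    w_mel = rng.standard_normal((2 * dim, N_MEL))
    w_mask = rng.standard_normal((dim, N_MEL))
    w_sr = rng.standard_normal((N_MEL + 2 * dim, N_LINEAR))
    mel, mask = mel_synthesis(text_emb, pitch_emb, w_mel, w_mask)
    linear = super_resolution(mel, text_emb, pitch_emb, w_sr)
    return mel, mask, linear


if __name__ == "__main__":
    mel, mask, linear = run_pipeline()
    print(mel.shape, mask.shape, linear.shape)
```

Because the mask depends on the text features alone, changing the pitch input in this sketch leaves the mask untouched, which mirrors the abstract's statement that the formant masks are generated solely from the input text.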


