N-Singer: A Non-Autoregressive Korean Singing Voice Synthesis System for Pronunciation Enhancement

06/29/2021
by Gyeong-Hoon Lee, et al.

Recently, end-to-end Korean singing voice synthesis systems have been designed to generate realistic singing voices. However, these systems still lack robustness in pronunciation accuracy. In this paper, we propose N-Singer, a non-autoregressive Korean singing voice synthesis system that generates accurately pronounced Korean singing voices in parallel. N-Singer consists of a Transformer-based mel-generator, a convolutional network-based postnet, and voicing-aware discriminators. Its contributions are as follows. First, for accurate pronunciation, N-Singer models linguistic and pitch information separately, without other acoustic features. Second, to generate improved mel-spectrograms, N-Singer combines Transformer-based modules with convolutional network-based modules. Third, during adversarial training, voicing-aware conditional discriminators capture the harmonic features of voiced segments and the noise components of unvoiced segments. Experimental results demonstrate that N-Singer synthesizes natural singing voices in parallel with more accurate pronunciation than the baseline model.

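To make the described architecture concrete, below is a minimal PyTorch sketch of the three components the abstract names: a Transformer-based mel-generator that encodes linguistic and pitch information separately, a convolutional postnet, and voicing-aware conditional discriminators. All layer counts, dimensions, and the frame-level input format are illustrative assumptions, not the authors' implementation; the sketch only shows the overall structure of the system.

```python
# Minimal sketch of an N-Singer-style architecture (illustrative assumptions only).
import torch
import torch.nn as nn


class MelGenerator(nn.Module):
    """Transformer-based mel-generator: linguistic (lyric) and pitch (note) label
    sequences are embedded and encoded separately, then combined to predict a
    coarse mel-spectrogram for all frames in parallel (non-autoregressive)."""

    def __init__(self, n_phonemes=80, n_pitches=128, d_model=256, n_mels=80):
        super().__init__()
        self.phoneme_emb = nn.Embedding(n_phonemes, d_model)
        self.pitch_emb = nn.Embedding(n_pitches, d_model)
        self.linguistic_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=4)
        self.pitch_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=4)
        self.mel_proj = nn.Linear(d_model, n_mels)

    def forward(self, phonemes, pitches):
        # phonemes, pitches: (batch, frames) frame-level label sequences
        ling = self.linguistic_encoder(self.phoneme_emb(phonemes))
        pit = self.pitch_encoder(self.pitch_emb(pitches))
        return self.mel_proj(ling + pit)  # (batch, frames, n_mels)


class Postnet(nn.Module):
    """Convolutional postnet that residually refines the coarse mel-spectrogram."""

    def __init__(self, n_mels=80, channels=256, n_layers=5):
        super().__init__()
        layers, in_ch = [], n_mels
        for i in range(n_layers):
            out_ch = n_mels if i == n_layers - 1 else channels
            layers += [nn.Conv1d(in_ch, out_ch, kernel_size=5, padding=2), nn.Tanh()]
            in_ch = out_ch
        self.net = nn.Sequential(*layers[:-1])  # no activation on the final layer

    def forward(self, mel):
        # mel: (batch, frames, n_mels)
        return mel + self.net(mel.transpose(1, 2)).transpose(1, 2)


class VoicingAwareDiscriminator(nn.Module):
    """One conditional discriminator per voicing class: the voiced-segment
    discriminator attends to harmonic structure, the unvoiced one to noise."""

    def __init__(self, n_mels=80, channels=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mels, channels, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv1d(channels, channels, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv1d(channels, 1, 3, padding=1),
        )

    def forward(self, mel, voicing_mask):
        # mel: (batch, frames, n_mels); voicing_mask: (batch, frames) in {0, 1}
        scores = self.net(mel.transpose(1, 2)).squeeze(1)  # (batch, frames)
        return scores * voicing_mask  # score only the frames of this voicing class


if __name__ == "__main__":
    B, T = 2, 200
    gen, post = MelGenerator(), Postnet()
    d_voiced, d_unvoiced = VoicingAwareDiscriminator(), VoicingAwareDiscriminator()
    phonemes = torch.randint(0, 80, (B, T))
    pitches = torch.randint(0, 128, (B, T))
    voiced = torch.randint(0, 2, (B, T)).float()
    mel = post(gen(phonemes, pitches))
    print(mel.shape, d_voiced(mel, voiced).shape, d_unvoiced(mel, 1 - voiced).shape)
```

In this sketch the two encoders keep linguistic and pitch streams independent until a simple additive fusion, mirroring the paper's idea of modeling them separately for pronunciation accuracy; the real model's fusion and conditioning details may differ.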
Related research:
- Adversarially Trained End-to-end Korean Singing Voice Synthesis System (08/06/2019): In this paper, we propose an end-to-end Korean singing voice synthesis s...
- Sinsy: A Deep Neural Network-Based Singing Voice Synthesis System (08/05/2021): This paper presents Sinsy, a deep neural network (DNN)-based singing voi...
- Streaming non-autoregressive model for any-to-many voice conversion (06/15/2022): Voice conversion models have developed for decades, and current mainstre...
- MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis (06/15/2021): Recent developments in deep learning have significantly improved the qua...
- FastSpeech 2: Fast and High-Quality End-to-End Text to Speech (06/08/2020): Advanced text to speech (TTS) models such as FastSpeech can synthesize s...
- WakeUpNet: A Mobile-Transformer based Framework for End-to-End Streaming Voice Trigger (10/06/2022): End-to-end models have gradually become the main technical stream for vo...
- SUSing: SU-net for Singing Voice Synthesis (05/24/2022): Singing voice synthesis is a generative task that involves multi-dimensi...
