XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System

06/11/2020
by Peiling Lu et al.

This paper presents XiaoiceSing, a high-quality singing voice synthesis system that employs an integrated network for spectrum, F0 and duration modeling. We follow the main architecture of FastSpeech while proposing several singing-specific designs: 1) besides phoneme ID and position encoding, features from the musical score (e.g., note pitch and length) are also added to the input; 2) to attenuate off-key issues, we add a residual connection in F0 prediction; 3) in addition to the duration loss of each phoneme, the durations of all phonemes in a musical note are accumulated to calculate a syllable duration loss for rhythm enhancement. Experimental results show that XiaoiceSing outperforms a baseline system based on convolutional neural networks by 1.44 MOS on sound quality, 1.18 on pronunciation accuracy and 1.38 on naturalness. In two A/B tests, the proposed F0 and duration modeling methods achieve 97.3% preference rate over the baseline respectively, which demonstrates the overwhelming advantages of XiaoiceSing.
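
The residual connection in design 2) can be illustrated with a minimal sketch. It assumes the score's note pitch is given as MIDI numbers already expanded to frame level and that the residual is added in the log-F0 domain; `midi_to_hz`, `f0_with_residual` and the hand-supplied residual values are hypothetical stand-ins for the network's output, not the authors' code. Because the score pitch is added back directly, a zero residual reproduces the note pitch, so the network would only need to model deviations from the score rather than the absolute pitch.

```python
import math
from typing import List


def midi_to_hz(midi_note: float) -> float:
    """Convert a MIDI note number to Hz using A4 (MIDI 69) = 440 Hz."""
    return 440.0 * 2.0 ** ((midi_note - 69.0) / 12.0)


def f0_with_residual(note_midi: List[float], residual: List[float]) -> List[float]:
    """Hypothetical sketch of a residual F0 output (not the paper's code).

    note_midi: score note pitch per frame (MIDI numbers), assumed already
        expanded from note level to frame level.
    residual:  per-frame log-F0 offsets; in a real model these would come
        from the decoder, here they are passed in directly.
    Returns F0 in Hz: exp(log(score pitch) + residual).
    """
    if len(note_midi) != len(residual):
        raise ValueError("note_midi and residual must have the same length")
    return [math.exp(math.log(midi_to_hz(m)) + r) for m, r in zip(note_midi, residual)]


# Zero residual: the output follows the score pitch (about 440 Hz for MIDI 69).
on_key = f0_with_residual([69, 69], [0.0, 0.0])
# A residual of log(2 ** (1 / 12)) raises the pitch by one semitone.
sharp = f0_with_residual([69], [math.log(2 ** (1 / 12))])
```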

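The syllable duration loss in design 3) can be sketched as follows, assuming mean squared error on per-phoneme durations measured in frames and a contiguous note index for each phoneme. `note_sums`, `duration_loss` and `syllable_weight` are hypothetical names, and the exact loss form and weighting used in XiaoiceSing may differ.

```python
from typing import List, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two non-empty, equal-length sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def note_sums(durations: Sequence[float], note_ids: Sequence[int]) -> List[float]:
    """Accumulate phoneme durations into per-note (syllable) durations.

    note_ids gives the note each phoneme belongs to and is assumed to be
    contiguous, e.g. [0, 0, 1, 2, 2].
    """
    if len(durations) != len(note_ids):
        raise ValueError("durations and note_ids must have the same length")
    sums: List[float] = []
    prev = None
    for d, n in zip(durations, note_ids):
        if n != prev:  # a new note starts at this phoneme
            sums.append(0.0)
            prev = n
        sums[-1] += d
    return sums


def duration_loss(pred: Sequence[float], target: Sequence[float],
                  note_ids: Sequence[int], syllable_weight: float = 1.0) -> float:
    """Phoneme-level MSE plus a weighted syllable-level MSE (hypothetical form)."""
    phone_loss = mse(pred, target)
    syllable_loss = mse(note_sums(pred, note_ids), note_sums(target, note_ids))
    return phone_loss + syllable_weight * syllable_loss


notes = [0, 0, 1]  # two phonemes in note 0, one phoneme in note 1
# Errors cancel inside note 0: only the phoneme term contributes (2/3).
cancel = duration_loss([2.0, 3.0, 4.0], [3.0, 2.0, 4.0], notes)
# Note 0 is two frames too long: phoneme term 2/3 plus syllable term 2.
too_long = duration_loss([3.0, 3.0, 4.0], [2.0, 2.0, 4.0], notes)
```

In this sketch, phoneme errors that cancel within a note leave the syllable term at zero, while an error that stretches or shortens a whole note is penalized at both the phoneme and the note level, which is the rhythm-alignment pressure the abstract describes.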