Singing voice synthesis based on frame-level sequence-to-sequence models considering vocal timing deviation

01/05/2023
by Miku Nishihara, et al.

This paper proposes a singing voice synthesis (SVS) system based on frame-level sequence-to-sequence models that accounts for vocal timing deviation. In SVS, the timing of the singing must be synchronized with the temporal structure represented by the score, while allowing for differences between the actual vocal timing and the note start timing. Many SVS systems, including our previous work, handle these deviations by converting phoneme-level score features into frame-level ones on the basis of phoneme boundaries obtained from an external aligner. As a result, the sound quality of such systems depends on the accuracy of the aligner. To alleviate this problem, we introduce an attention mechanism that operates on frame-level features, so that it can absorb alignment errors in the phoneme boundaries. Additionally, for the case where no aligner is available, we evaluate the system with pseudo-phoneme boundaries defined by heuristic rules based on the musical score. The experimental results show the effectiveness of the proposed system.
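The conversion from phoneme-level to frame-level features mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 5 ms frame shift, the function names `pseudo_boundaries` and `expand_to_frames`, and the rule of splitting each note evenly among its phonemes are all assumptions made for illustration, since the abstract does not specify the heuristic rules or the frame rate. The sketch only covers the boundary-based expansion step, not the attention mechanism.

```python
from typing import List, Sequence, Tuple

FRAME_SHIFT_SEC = 0.005  # assumed 5 ms frame shift; not stated in the abstract

Note = Tuple[float, float, List[str]]   # (start_sec, end_sec, phonemes sung on the note)
Segment = Tuple[str, float, float]      # (phoneme, start_sec, end_sec)


def pseudo_boundaries(notes: Sequence[Note]) -> List[Segment]:
    """Derive phoneme boundaries from the score alone.

    Hypothetical rule: each note's duration is split evenly among its
    phonemes. The paper's actual heuristic rules may differ.
    """
    segments: List[Segment] = []
    for start, end, phonemes in notes:
        step = (end - start) / len(phonemes)
        for i, phoneme in enumerate(phonemes):
            segments.append((phoneme, start + i * step, start + (i + 1) * step))
    return segments


def expand_to_frames(segments: Sequence[Segment],
                     frame_shift: float = FRAME_SHIFT_SEC) -> List[str]:
    """Repeat each phoneme-level label once per frame that its segment covers."""
    frames: List[str] = []
    for phoneme, start, end in segments:
        first = round(start / frame_shift)  # index of the first frame in the segment
        last = round(end / frame_shift)     # index one past the last frame
        frames.extend([phoneme] * (last - first))
    return frames


# Two half-second notes: the first carries "k a", the second carries "o".
notes = [(0.0, 0.5, ["k", "a"]), (0.5, 1.0, ["o"])]
segments = pseudo_boundaries(notes)
frames = expand_to_frames(segments)
```

In this toy setup any error in a segment boundary shifts frames from one phoneme to its neighbor, which is the kind of misalignment the abstract says the attention mechanism is meant to absorb.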
