Singing Voice Synthesis Based on a Musical Note Position-Aware Attention Mechanism

12/28/2022
by Yukiya Hono, et al.

This paper proposes a novel sequence-to-sequence (seq2seq) model with a musical note position-aware attention mechanism for singing voice synthesis (SVS). A seq2seq modeling approach that can perform acoustic and temporal modeling simultaneously is attractive. However, because temporal modeling of singing voices is difficult, many recent SVS systems with an encoder-decoder-based model still rely explicitly on duration information generated by additional modules. Although some studies perform simultaneous modeling using seq2seq models with an attention mechanism, their temporal modeling is not sufficiently robust. The proposed attention mechanism is designed to estimate the attention weights by considering the rhythm given by the musical score. Furthermore, several techniques are introduced to improve the modeling performance of the singing voice. Experimental results indicated that the proposed model is effective in terms of both naturalness and robustness of timing.

research
01/05/2023

Singing voice synthesis based on frame-level sequence-to-sequence models considering vocal timing deviation

This paper proposes singing voice synthesis (SVS) based on frame-level s...
research
02/16/2022

Singing-Tacotron: Global duration control attention and dynamic filter for End-to-end singing voice synthesis

End-to-end singing voice synthesis (SVS) is attractive due to the avoida...
research
12/12/2019

Singing Synthesis: with a little help from my attention

We present a novel system for singing synthesis, based on attention. Sta...
research
06/11/2020

XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System

This paper presents XiaoiceSing, a high-quality singing voice synthesis ...
research
04/15/2019

Singing voice synthesis based on convolutional neural networks

The present paper describes a singing voice synthesis based on convoluti...
research
12/27/2019

Synthesising Expressiveness in Peking Opera via Duration Informed Attention Network

This paper presents a method that generates expressive singing voice of ...
research
05/12/2021

Global Structure-Aware Drum Transcription Based on Self-Attention Mechanisms

This paper describes an automatic drum transcription (ADT) method that d...
