Singing Synthesis: with a little help from my attention

12/12/2019
by Orazio Angelini, et al.

We present a novel system for singing synthesis, based on attention. Starting from a musical score with notes and lyrics, we build a phoneme-level multi-stream note embedding. The embedding contains the information encoded in the score regarding pitch, duration and the phonemes to be pronounced on each note. This note representation is used to condition an attention-based sequence-to-sequence architecture that generates mel-spectrograms. Our model demonstrates that attention can be successfully applied to singing synthesis. The system requires considerably less explicit modelling of voice features such as F0 patterns, vibratos, and note and phoneme durations than most models in the literature. However, we observe that completely dispensing with any duration modelling introduces occasional instabilities in the generated spectrograms. Using a combination of speech and singing data, we train an autoregressive WaveNet as a neural vocoder that converts the mel-spectrograms produced by the sequence-to-sequence architecture into audio.
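As a rough illustration of what a phoneme-level, multi-stream view of a score might look like before any embedding is learned, the sketch below expands each note into one entry per phoneme and keeps pitch, duration and phoneme identity as parallel streams. This is not the authors' implementation: the `Note` class, the `note_streams` function, the MIDI pitch encoding and the beat-based durations are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Note:
    """Hypothetical score entry; field encodings are assumptions."""
    pitch: int            # MIDI note number (assumed encoding)
    duration: float       # note length in beats (assumed unit)
    phonemes: List[str]   # phonemes to be pronounced on this note


def note_streams(score: List[Note]) -> Tuple[List[str], List[int], List[float]]:
    """Expand a score into three aligned phoneme-level streams.

    Every phoneme inherits the pitch and duration of the note it is sung on,
    so all three returned lists have the same length.
    """
    phonemes: List[str] = []
    pitches: List[int] = []
    durations: List[float] = []
    for note in score:
        for ph in note.phonemes:
            phonemes.append(ph)
            pitches.append(note.pitch)
            durations.append(note.duration)
    return phonemes, pitches, durations


# Two notes carrying two phonemes each (toy data, not from the paper).
score = [Note(60, 1.0, ["h", "e"]), Note(62, 0.5, ["l", "ow"])]
phonemes, pitches, durations = note_streams(score)
```

In a full system each stream would presumably be mapped to a learned vector and the vectors combined per phoneme before being passed to the sequence-to-sequence model; the abstract does not specify how, so that step is left out here.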


research
11/19/2021

Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis

This paper presents a method for controlling the prosody at the phoneme ...
research
12/28/2022

Singing Voice Synthesis Based on a Musical Note Position-Aware Attention Mechanism

This paper proposes a novel sequence-to-sequence (seq2seq) model with a ...
research
06/11/2020

XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System

This paper presents XiaoiceSing, a high-quality singing voice synthesis ...
research
01/05/2023

Singing voice synthesis based on frame-level sequence-to-sequence models considering vocal timing deviation

This paper proposes singing voice synthesis (SVS) based on frame-level s...
research
12/27/2019

Synthesising Expressiveness in Peking Opera via Duration Informed Attention Network

This paper presents a method that generates expressive singing voice of ...
research
10/22/2019

Sequence-to-sequence Singing Synthesis Using the Feed-forward Transformer

We propose a sequence-to-sequence singing synthesizer, which avoids the ...
research
08/07/2020

Peking Opera Synthesis via Duration Informed Attention Network

Peking Opera has been the most dominant form of Chinese performing art s...
