Sequence-to-sequence Singing Synthesis Using the Feed-forward Transformer

10/22/2019
by Merlijn Blaauw, et al.
We propose a sequence-to-sequence singing synthesizer, which avoids the need for training data with pre-aligned phonetic and acoustic features. Rather than the more common approach of a content-based attention mechanism combined with an autoregressive decoder, we use a different mechanism suitable for feed-forward synthesis. Given that phonetic timings in singing are highly constrained by the musical score, we derive an approximate initial alignment with the help of a simple duration model. Then, using a decoder based on a feed-forward variant of the Transformer model, a series of self-attention and convolutional layers refines the result of the initial alignment to reach the target acoustic features. Advantages of this approach include faster inference and avoiding the exposure bias issues that affect autoregressive models trained by teacher forcing. We evaluate the effectiveness of this model compared to an autoregressive baseline, the importance of self-attention, and the importance of the accuracy of the duration model.
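The initial alignment step can be illustrated with a minimal sketch. It assumes that a duration model has already produced a raw duration for each phoneme in a note, and that the score gives the note length in frames. The proportional rescaling rule and the function names `fit_durations_to_note` and `expand_to_frames` are assumptions made for illustration; the abstract does not specify how predicted durations are fitted to the score.

```python
import numpy as np


def fit_durations_to_note(raw_durations, note_frames):
    """Rescale raw phoneme durations so they sum to the note length in frames.

    Hypothetical helper: proportional rescaling is an assumed fitting rule.
    """
    raw = np.asarray(raw_durations, dtype=float)
    scaled = raw / raw.sum() * note_frames
    # Round the cumulative boundaries instead of each duration,
    # so rounding errors do not accumulate across phonemes.
    bounds = np.rint(np.cumsum(scaled)).astype(int)
    bounds[-1] = note_frames  # force the last boundary onto the note end
    return np.diff(np.concatenate(([0], bounds)))


def expand_to_frames(phoneme_ids, durations):
    """Repeat each phoneme id for its duration to get a frame-level sequence."""
    return np.repeat(np.asarray(phoneme_ids), durations)


# Usage: one note of 8 frames sung with two phonemes (ids 5 and 7).
durations = fit_durations_to_note([1.0, 3.0], note_frames=8)
frame_ids = expand_to_frames([5, 7], durations)
```

In this sketch, `frame_ids` plays the role of the approximate initial alignment: a frame-rate phoneme sequence that a feed-forward decoder could then refine toward the target acoustic features.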
