
Adversarially Trained Multi-Singer Sequence-To-Sequence Singing Synthesizer

06/18/2020
by Jie Wu, et al.
Microsoft

This paper presents a high-quality singing synthesizer that can model a voice from a limited number of recordings. Building on a sequence-to-sequence singing model, we design a multi-singer framework that leverages all existing singing data from different singers. To mitigate the imbalance of musical scores among singers, we incorporate an adversarial singer-classification task that makes the encoder output less singer-dependent. Furthermore, we apply multiple random window discriminators (MRWDs) to the generated acoustic features, turning the network into a GAN. Both objective and subjective evaluations indicate that the proposed synthesizer generates higher-quality singing voices than the baseline (MOS of 4.12 vs. 3.53). In particular, the articulation of high-pitched vowels is significantly improved.
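
The random-window idea behind the MRWDs can be illustrated with a minimal sketch of the cropping step alone. The abstract does not state the window sizes, the number of discriminators, or how windows are drawn, so everything here is an assumption made for illustration: the function name `sample_random_windows`, the window lengths 25, 50 and 100, and the toy input of 200 frames with 80 features each. The discriminator networks themselves are omitted.

```python
import random


def sample_random_windows(features, window_sizes, rng=None):
    """Crop one random contiguous window per size from a frame sequence.

    features: list of frames (each frame is a list of floats).
    window_sizes: window lengths in frames (hypothetical values).
    Returns a list of (start_index, window) pairs, one per window size.
    """
    rng = rng or random.Random()
    n_frames = len(features)
    windows = []
    for size in window_sizes:
        if size > n_frames:
            raise ValueError(f"window size {size} exceeds {n_frames} frames")
        # randint is inclusive on both ends, so the window always fits.
        start = rng.randint(0, n_frames - size)
        windows.append((start, features[start:start + size]))
    return windows


# Toy input: 200 frames of 80-dimensional acoustic features (made-up sizes).
frames = [[float(t)] * 80 for t in range(200)]
crops = sample_random_windows(frames, [25, 50, 100], rng=random.Random(0))
for start, window in crops:
    print(start, len(window))
```

In a setup like this, each crop would presumably be passed to its own discriminator, so that the generated features are judged at several time scales; whether the paper's discriminators are arranged this way is not stated in the abstract.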
