Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer

08/26/2022
by Shrutina Agarwal, et al.

In this paper, we propose a model to perform style transfer from speech to singing voice. In contrast to previous signal processing-based methods, which require high-quality singing templates or phoneme synchronization, we explore a data-driven approach to converting natural speech to singing voice. We develop a novel neural network architecture, called SymNet, which models the alignment of the input speech with the target melody while preserving speaker identity and naturalness. The proposed SymNet model comprises a symmetrical stack of three types of layers: convolutional, transformer, and self-attention layers. The paper also explores novel data augmentation and generative loss annealing methods to facilitate model training. Experiments are performed on the NUS and NHSS datasets, which consist of parallel speech and singing voice data. In these experiments, we show that the proposed SymNet model significantly improves objective reconstruction quality over previously published methods and baseline architectures. Further, a subjective listening test confirms the improved quality of the audio obtained with the proposed approach (an absolute improvement of 0.37 in mean opinion score over the baseline system).
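
To make the "absolute improvement in mean opinion score" concrete, the following is a minimal sketch of how such a figure is typically computed: average the listener ratings for each system, then subtract. It assumes the common 1 to 5 opinion scale, which the abstract does not state. The function names and the ratings are hypothetical, made up for illustration, and are not data from the paper.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average a list of listener ratings (assumed 1-5 opinion scale)."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings are assumed to lie in the range 1..5")
    return mean(ratings)


def absolute_mos_improvement(baseline_ratings, proposed_ratings):
    """Difference of the two MOS values (proposed minus baseline)."""
    return mean_opinion_score(proposed_ratings) - mean_opinion_score(baseline_ratings)


# Made-up ratings for illustration only; NOT results from the paper.
baseline = [3, 3, 4, 2, 3]
proposed = [4, 3, 4, 3, 4]

gain = absolute_mos_improvement(baseline, proposed)
print(f"absolute MOS improvement: {gain:.2f}")
```

An absolute improvement is a plain difference of the two averages on the rating scale, as opposed to a relative (percentage) gain over the baseline score.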

research
02/17/2021

End-to-end lyrics Recognition with Voice to Singing Style Transfer

Automatic transcription of monophonic/polyphonic music is a challenging ...
research
02/19/2018

Voice Impersonation using Generative Adversarial Networks

Voice impersonation is not the same as voice transformation, although th...
research
07/25/2022

Transplantation of Conversational Speaking Style with Interjections in Sequence-to-Sequence Speech Synthesis

Sequence-to-Sequence Text-to-Speech architectures that directly generate...
research
10/05/2021

Voice Aging with Audio-Visual Style Transfer

Face aging techniques have used generative adversarial networks (GANs) a...
research
10/08/2019

MelGAN-VC: Voice Conversion and Audio Style Transfer on arbitrarily long samples using Spectrograms

Traditional voice conversion methods rely on parallel recordings of mult...
research
07/31/2021

Voice Reconstruction from Silent Speech with a Sequence-to-Sequence Model

Silent Speech Decoding (SSD) based on Surface electromyography (sEMG) ha...
research
12/02/2019

Automated speech-based screening of depression using deep convolutional neural networks

Early detection and treatment of depression is essential in promoting re...
