Rapping-Singing Voice Synthesis based on Phoneme-level Prosody Control

11/17/2021
by   Konstantinos Markopoulos, et al.

In this paper, a text-to-rapping/singing system is introduced, which can be adapted to any speaker's voice. It utilizes a Tacotron-based multispeaker acoustic model that is trained only on read speech data and provides prosody control at the phoneme level. Dataset augmentation and additional prosody manipulation based on traditional DSP algorithms are also investigated. The neural TTS model is fine-tuned on an unseen speaker's limited recordings, allowing rapping/singing synthesis in the target speaker's voice. The detailed pipeline of the system is described, which includes the extraction of the target pitch and duration values from an a cappella song and their conversion into the target speaker's valid range of notes before synthesis. An additional stage of prosodic manipulation of the output via WSOLA is also investigated for better matching the target duration values. The synthesized utterances can be mixed with an instrumental accompaniment track to produce a complete song. The proposed system is evaluated via subjective listening tests and compared against an available alternative system that also aims to produce a synthetic singing voice from training data consisting only of read speech. Results show that the proposed approach can produce high quality rapping/singing voice with increased naturalness.
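
The abstract does not say how the extracted pitch values are converted into the target speaker's valid range of notes. As a rough illustration only, the sketch below assumes one plausible approach: map the F0 values to MIDI note numbers, then pick the single semitone transposition that places the most notes inside the speaker's range. The function names, the `max_shift` limit, and the tie-breaking rule are all hypothetical and are not taken from the paper.

```python
import math


def hz_to_midi(f0_hz):
    """Convert a frequency in Hz to a (fractional) MIDI note number, A4 = 440 Hz = 69."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)


def midi_to_hz(note):
    """Convert a (fractional) MIDI note number back to a frequency in Hz."""
    return 440.0 * 2.0 ** ((note - 69.0) / 12.0)


def best_transposition(notes, low, high, max_shift=24):
    """Return the semitone shift that puts the most notes inside [low, high].

    Candidate shifts are tried in order of increasing magnitude, so among
    equally good shifts the smallest one wins (assumed tie-breaking rule).
    """
    best_shift, best_count = 0, -1
    for shift in sorted(range(-max_shift, max_shift + 1), key=abs):
        count = sum(low <= n + shift <= high for n in notes)
        if count > best_count:
            best_shift, best_count = shift, count
    return best_shift


def fit_to_range(f0_track_hz, low_hz, high_hz):
    """Transpose a list of voiced F0 values (Hz) toward a speaker's range (Hz)."""
    notes = [hz_to_midi(f) for f in f0_track_hz]
    shift = best_transposition(notes, hz_to_midi(low_hz), hz_to_midi(high_hz))
    return shift, [midi_to_hz(n + shift) for n in notes]


if __name__ == "__main__":
    # Hypothetical melody fragment and a hypothetical speaker range of 110-220 Hz.
    shift, transposed = fit_to_range([440.0, 494.0, 523.0], 110.0, 220.0)
    print(shift, [round(f, 1) for f in transposed])
```

A single global shift keeps the melodic intervals intact, which is why this sketch uses it; notes that still fall outside the range after the shift would need separate handling, which is not shown here.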


research
03/05/2018

Linear networks based speaker adaptation for speech synthesis

Speaker adaptation methods aim to create fair quality synthesis speech v...
research
11/17/2020

Learn2Sing: Target Speaker Singing Voice Synthesis by learning from a Singing Teacher

Singing voice synthesis has been paid rising attention with the rapid de...
research
01/24/2023

Multilingual Multiaccented Multispeaker TTS with RADTTS

We work to create a multilingual speech synthesis system which can gener...
research
09/01/2023

Enhancing the vocal range of single-speaker singing voice synthesis with melody-unsupervised pre-training

The single-speaker singing voice synthesis (SVS) usually underperforms a...
research
09/22/2017

Techniques and Challenges in Speech Synthesis

The aim of this project was to develop and implement an English language...
research
11/19/2021

Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control

This paper presents a method for phoneme-level prosody control of F0 and...
research
03/30/2022

Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher

Building a high-quality singing corpus for a person who is not good at s...
