End-to-end lyrics Recognition with Voice to Singing Style Transfer

02/17/2021
by Sakya Basak, et al.

Automatic transcription of monophonic/polyphonic music is a challenging task due to the lack of large amounts of transcribed data. In this paper, we propose a data augmentation method that converts natural speech to singing voice using a vocoder-based speech synthesizer. This approach, called voice to singing (V2S), performs the voice style conversion by modulating the F0 contour of the natural speech with that of a singing voice. The V2S-model-based style transfer can generate good-quality singing voice, thereby enabling the conversion of large corpora of natural speech to singing voice that is useful for building an end-to-end (E2E) lyrics transcription system. In our experiments on monophonic singing voice data, the V2S style transfer provides a significant gain (relative improvements of 21 […]). We also discuss additional components like transfer learning and lyrics-based language modeling to improve the performance of the lyrics transcription system.
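The core V2S idea, replacing the F0 contour of natural speech with that of a singing voice before resynthesis, can be sketched on frame-level F0 arrays. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `resample_contour` and `transfer_f0` are hypothetical, the singing contour is assumed to be stretched linearly to the speech length, and 0.0 is assumed to mark unvoiced frames. In a full pipeline, a vocoder (for example a WORLD-style analysis/synthesis vocoder, which is an assumption here; the abstract only says "vocoder based") would extract F0, spectral envelope and aperiodicity from the speech and then resynthesize with the modified F0 while keeping the speech's spectral envelope, so the lyrics content is preserved.

```python
from typing import List


def resample_contour(contour: List[float], n_frames: int) -> List[float]:
    """Linearly stretch an F0 contour to n_frames frames (hypothetical helper).

    0.0 marks an unvoiced frame; we do not interpolate across a
    voiced/unvoiced boundary and take the nearest frame instead.
    """
    if not contour:
        raise ValueError("contour must be non-empty")
    if n_frames <= 0:
        return []
    if len(contour) == 1 or n_frames == 1:
        return [contour[0]] * n_frames
    scale = (len(contour) - 1) / (n_frames - 1)
    out = []
    for i in range(n_frames):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        a, b = contour[lo], contour[hi]
        if a == 0.0 or b == 0.0:
            # Voiced/unvoiced boundary: pick the nearer frame.
            out.append(a if frac < 0.5 else b)
        else:
            out.append(a + (b - a) * frac)
    return out


def transfer_f0(speech_f0: List[float], singing_f0: List[float]) -> List[float]:
    """Replace voiced speech F0 with a time-stretched singing F0 (hypothetical)."""
    target = resample_contour(singing_f0, len(speech_f0))
    result = []
    for s, t in zip(speech_f0, target):
        if s == 0.0:
            result.append(0.0)  # keep unvoiced speech frames unvoiced
        elif t == 0.0:
            result.append(s)    # singing unvoiced here: fall back to speech F0
        else:
            result.append(t)    # voiced in both: impose the singing pitch
    return result


if __name__ == "__main__":
    speech = [0.0, 120.0, 125.0, 130.0, 0.0]
    singing = [220.0, 240.0, 260.0]
    print(transfer_f0(speech, singing))  # [0.0, 230.0, 240.0, 250.0, 0.0]
```

Keeping the speech voicing decisions means consonants and pauses stay where the speaker produced them, while the pitch trajectory follows the melody; a real system might instead align the contours with the song's timing, which this sketch does not attempt.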

research
08/21/2023

PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion

Voice conversion as the style transfer task applied to speech, refers to...
research
07/25/2022

Transplantation of Conversational Speaking Style with Interjections in Sequence-to-Sequence Speech Synthesis

Sequence-to-Sequence Text-to-Speech architectures that directly generate...
research
08/26/2022

Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer

In this paper, we propose a model to perform style transfer of speech to...
research
08/15/2022

Differentiable WORLD Synthesizer-based Neural Vocoder With Application To End-To-End Audio Style Transfer

In this paper, we propose a differentiable WORLD synthesizer and demonst...
research
08/11/2020

Transfer Learning for Improving Singing-voice Detection in Polyphonic Instrumental Music

Detecting singing-voice in polyphonic instrumental music is critical to ...
research
09/15/2023

Speech-dependent Modeling of Own Voice Transfer Characteristics for In-ear Microphones in Hearables

Many hearables contain an in-ear microphone, which may be used to captur...
research
04/19/2023

Affective social anthropomorphic intelligent system

Human conversational styles are measured by the sense of humor, personal...
