Controllable and Interpretable Singing Voice Decomposition via Assem-VC

10/25/2021
by Kang-wook Kim, et al.
We propose a singing decomposition system that encodes time-aligned linguistic content, pitch, and source speaker identity via Assem-VC. Given the decomposed speaker-independent information and the target speaker's embedding, the system can synthesize the singing voice of the target speaker. As a result, we produced a perfectly synced duet of the user's singing voice and the target singer's converted singing voice.
