Preserving background sound in noise-robust voice conversion via multi-task learning

11/06/2022
by Jixun Yao, et al.

Background sound is an informative form of art that helps provide a more immersive experience in real-world voice conversion (VC) scenarios. However, prior VC research focuses mainly on clean voices and pays little attention to VC with background sound. The critical problems in preserving background sound in VC are the inevitable speech distortion introduced by the neural separation model and the cascade mismatch between the source separation model and the VC model. In this paper, we propose an end-to-end framework via multi-task learning that sequentially cascades a source separation (SS) module, a bottleneck feature extraction module, and a VC module. Specifically, the source separation task explicitly considers critical phase information and confines the distortion caused by the imperfect separation process. The source separation task, the typical VC task, and the unified task share a uniform reconstruction loss constrained by joint training to reduce the mismatch between the SS and VC modules. Experimental results demonstrate that our proposed framework significantly outperforms the baseline systems while achieving comparable quality and speaker similarity to the VC models trained with clean data.
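
As a rough illustration of the joint objective described in the abstract, the sketch below applies one shared reconstruction loss to the three tasks and sums the results. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the loss form, the task weights, or the feature type, so the L1 distance, the equal default weights, and the names `l1_reconstruction_loss` and `multi_task_loss` are all hypothetical.

```python
import numpy as np


def l1_reconstruction_loss(predicted, target):
    """Mean absolute error between two spectrogram-like arrays.

    Assumption: L1 is used here only for illustration; the abstract
    says "uniform reconstruction loss" without naming its form.
    """
    return float(np.mean(np.abs(predicted - target)))


def multi_task_loss(ss_out, ss_target,
                    vc_out, vc_target,
                    unified_out, unified_target,
                    weights=(1.0, 1.0, 1.0)):
    """Combine the three task losses with the same reconstruction loss.

    ss_*      : source separation (SS) task output and reference
    vc_*      : typical VC task output and reference
    unified_* : end-to-end (SS followed by VC) output and reference
    weights   : hypothetical per-task weights (not given in the abstract)
    """
    losses = (
        l1_reconstruction_loss(ss_out, ss_target),
        l1_reconstruction_loss(vc_out, vc_target),
        l1_reconstruction_loss(unified_out, unified_target),
    )
    # Weighted sum of the per-task losses; training on this single scalar
    # is what couples the SS and VC modules during joint training.
    total = sum(w * l for w, l in zip(weights, losses))
    return total, losses
```

In an actual training loop the arrays would be replaced by framework tensors so that gradients flow through all cascaded modules; the point here is only that the same loss function is reused for every task.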


research
07/20/2022

Spatial Aware Multi-Task Learning Based Speech Separation

During the Covid, online meetings have become an indispensable part of o...
research
11/13/2021

Direct Noisy Speech Modeling for Noisy-to-Noisy Voice Conversion

Beyond the conventional voice conversion (VC) where the speaker informat...
research
10/26/2018

Spectrogram-channels u-net: a source separation model viewing each channel as the spectrogram of each source

Nowadays, the task of sound source separation is an interesting task for...
research
11/06/2019

The sound of my voice: speaker representation loss for target voice separation

Research on content and style representations has been widely studied in...
research
01/25/2021

A Two-stage Framework for Compound Figure Separation

Scientific literature contains large volumes of complex, unstructured fi...
research
09/22/2021

Noisy-to-Noisy Voice Conversion Framework with Denoising Model

In a conventional voice conversion (VC) framework, a VC model is often t...
research
08/06/2021

RadioMic: Sound Sensing via mmWave Signals

Voice interfaces has become an integral part of our lives, with the prol...
