Noisy-to-Noisy Voice Conversion Framework with Denoising Model

09/22/2021
by Chao Xie, et al.

In a conventional voice conversion (VC) framework, a VC model is often trained on a clean dataset of speech that has been carefully recorded and selected to minimize background interference. However, collecting such a high-quality dataset is expensive and time-consuming, and leveraging crowd-sourced speech data for training is more economical. Moreover, in some real-world VC scenarios, such as VC in video and VC-based data augmentation for speech recognition systems, the background sounds themselves are informative and need to be preserved. In this paper, to explore VC with the flexibility of handling background sounds, we propose a noisy-to-noisy (N2N) VC framework composed of a denoising module and a VC module. With the proposed framework, we can convert the speaker's identity while preserving the background sounds. Both objective and subjective evaluations are conducted, and the results demonstrate the effectiveness of the proposed framework.
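The abstract does not say how the denoising module and the VC module are wired together. One plausible reading, offered here only as an assumption, is that the denoiser estimates the speech, the residual is treated as the background, the speech estimate is converted, and the background is added back. The sketch below illustrates that data flow with toy stand-ins; `split_speech_and_background`, `noisy_to_noisy_vc`, `toy_denoise`, and `toy_convert` are hypothetical names, not the authors' models or API.

```python
from typing import Callable, List, Tuple

Signal = List[float]  # a mono waveform as plain floats (illustrative only)


def split_speech_and_background(
    noisy: Signal, denoise: Callable[[Signal], Signal]
) -> Tuple[Signal, Signal]:
    """Estimate speech with the denoiser; treat the residual as background.

    Assumption: the background is recovered by simple subtraction.
    """
    speech = denoise(noisy)
    if len(speech) != len(noisy):
        raise ValueError("denoiser must preserve the signal length")
    background = [n - s for n, s in zip(noisy, speech)]
    return speech, background


def noisy_to_noisy_vc(
    noisy: Signal,
    denoise: Callable[[Signal], Signal],
    convert: Callable[[Signal], Signal],
) -> Signal:
    """Convert the speaker in the speech estimate, then add the background back."""
    speech, background = split_speech_and_background(noisy, denoise)
    converted = convert(speech)
    if len(converted) != len(background):
        raise ValueError("converter must preserve the signal length")
    return [c + b for c, b in zip(converted, background)]


# Toy stand-ins so the sketch runs; real modules would be trained networks.
def toy_denoise(x: Signal) -> Signal:
    return [0.5 * v for v in x]  # pretend half of each sample is speech


def toy_convert(x: Signal) -> Signal:
    return [2.0 * v for v in x]  # placeholder for a speaker conversion model


result = noisy_to_noisy_vc([1.0, 2.0, -4.0], toy_denoise, toy_convert)
```

Passing the two modules in as callables keeps the sketch independent of any particular denoiser or VC model; only the wiring between them is shown.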
