Direct Noisy Speech Modeling for Noisy-to-Noisy Voice Conversion

11/13/2021
by Chao Xie, et al.

Conventional voice conversion (VC) converts speaker information without altering the linguistic content. In some real-world scenarios, however, such as VC in movies, videos, or music, the voice is entangled with background sounds that are informative and need to be retained. We previously developed a noisy-to-noisy (N2N) VC framework that converts the speaker's identity while preserving the background sounds. That framework, which consists of a denoising module and a VC module, handles the background sounds well, but its VC module is sensitive to the distortion caused by the denoising module. To address this distortion issue, in this paper we propose an improved VC module that directly models the noisy speech waveform while controlling the background sounds. Experimental results demonstrate that the improved framework significantly outperforms the previous one and achieves an acceptable score in terms of naturalness, while reaching similarity performance comparable to the upper bound of our framework.
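To make the data flow of the earlier two-module design concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `noisy_to_noisy_vc`, the callables `denoise` and `convert`, the subtraction used to estimate the background, and the additive remix are all hypothetical choices that the abstract does not specify. It does not show the improved module proposed in this paper, which models the noisy waveform directly.

```python
from typing import Callable, List

Waveform = List[float]


def noisy_to_noisy_vc(
    noisy: Waveform,
    denoise: Callable[[Waveform], Waveform],
    convert: Callable[[Waveform], Waveform],
) -> Waveform:
    """Hypothetical two-module N2N pipeline: denoise, convert, remix.

    Assumptions (not taken from the abstract): the background is
    estimated as the residual noisy - denoised, and it is added back
    sample by sample after conversion.
    """
    est_speech = denoise(noisy)
    if len(est_speech) != len(noisy):
        raise ValueError("denoise must preserve the number of samples")

    # Residual left after removing the estimated speech.
    est_background = [x - s for x, s in zip(noisy, est_speech)]

    # Any distortion the denoiser introduced in est_speech is passed
    # straight into the VC module at this step.
    converted = convert(est_speech)
    if len(converted) != len(noisy):
        raise ValueError("convert must preserve the number of samples")

    # Superimpose the estimated background on the converted speech.
    return [c + n for c, n in zip(converted, est_background)]


# Toy stand-ins so the sketch runs; real modules would be neural networks.
def toy_denoise(x: Waveform) -> Waveform:
    return [0.5 * v for v in x]


def toy_convert(s: Waveform) -> Waveform:
    return [-v for v in s]


if __name__ == "__main__":
    print(noisy_to_noisy_vc([1.0, -2.0, 4.0], toy_denoise, toy_convert))
```

In this sketch the VC module only ever sees the denoiser's output, which is where the sensitivity to denoising distortion described in the abstract would enter.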

research
09/22/2021

Noisy-to-Noisy Voice Conversion Framework with Denoising Model

In a conventional voice conversion (VC) framework, a VC model is often t...
research
06/30/2022

An Evaluation of Three-Stage Voice Conversion Framework for Noisy and Reverberant Conditions

This paper presents a new voice conversion (VC) framework capable of dea...
research
11/06/2022

Preserving background sound in noise-robust voice conversion via multi-task learning

Background sound is an informative form of art that is helpful in provid...
research
05/10/2021

MASS: Multi-task Anthropomorphic Speech Synthesis Framework

Text-to-Speech (TTS) synthesis plays an important role in human-computer...
research
12/17/2020

DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling

While neural-based text to speech (TTS) models can synthesize natural an...
research
06/16/2021

Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant Environments

Voice Conversion (VC) is a technique that aims to transform the non-ling...
research
06/23/2022

Speaker-Independent Microphone Identification in Noisy Conditions

This work proposes a method for source device identification from speech...