Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers

07/02/2022
by Liumeng Xue, et al.

Building a voice conversion system for noisy target speakers, such as users who provide noisy samples or data found on the Internet, is challenging because training on contaminated speech degrades conversion performance. In this paper, we build on our recently proposed Glow-WaveGAN and propose a noise-independent speech representation learning approach for high-quality voice conversion for noisy target speakers. Specifically, we learn a latent feature space in which the target distribution modeled by the conversion model is exactly the distribution modeled by the waveform generator. On this premise, we further make the latent features noise-invariant. To do so, we introduce a noise-controllable WaveGAN, whose encoder learns the noise-independent acoustic representation directly from the waveform and whose decoder performs noise control in the hidden space through a FiLM module. For the conversion model, importantly, we use a flow-based model to learn the distribution of noise-independent but speaker-related latent features from phoneme posteriorgrams. Experimental results demonstrate that the proposed model achieves high speech quality and speaker similarity in voice conversion for noisy target speakers.
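The abstract mentions noise control through a FiLM module in the decoder but does not describe its layout. As a rough illustration only, the sketch below shows generic feature-wise linear modulation (FiLM): a conditioning vector is projected to a per-channel scale and shift that are applied to a hidden feature map. The function name `film`, the tensor shapes, the linear projections, and the idea of a vector-valued noise condition are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def film(hidden, condition, w_gamma, b_gamma, w_beta, b_beta):
    """Generic FiLM: per-channel scale (gamma) and shift (beta) from a condition.

    hidden:    (batch, channels, frames) hidden feature map (hypothetical layout)
    condition: (batch, cond_dim) conditioning vector, e.g. a noise embedding
    """
    gamma = condition @ w_gamma + b_gamma  # (batch, channels)
    beta = condition @ w_beta + b_beta     # (batch, channels)
    # Broadcast the per-channel parameters over the frame axis.
    return gamma[:, :, None] * hidden + beta[:, :, None]

# Toy sizes chosen for illustration only.
rng = np.random.default_rng(0)
batch, channels, frames, cond_dim = 2, 4, 10, 3

hidden = rng.standard_normal((batch, channels, frames))
noise_condition = rng.standard_normal((batch, cond_dim))

w_gamma = rng.standard_normal((cond_dim, channels))
b_gamma = np.ones(channels)   # bias of 1 so a zero projection leaves features unscaled
w_beta = rng.standard_normal((cond_dim, channels))
b_beta = np.zeros(channels)   # bias of 0 so a zero projection adds no shift

out = film(hidden, noise_condition, w_gamma, b_gamma, w_beta, b_beta)
```

With zero projection weights and the biases above, the modulation reduces to the identity, which is a convenient sanity check when wiring such a layer into a decoder.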


research
01/26/2022

Noise-robust voice conversion with domain adversarial training

Voice conversion has made great progress in the past few years under the...
research
10/24/2019

Towards Fine-Grained Prosody Control for Voice Conversion

In a typical voice conversion system, prior works utilize various acoust...
research
03/31/2022

WavThruVec: Latent speech representation as intermediate features for neural speech synthesis

Recent advances in neural text-to-speech research have been dominated by...
research
06/27/2022

Speak Like a Professional: Increasing Speech Intelligibility by Mimicking Professional Announcer Voice with Voice Conversion

In most of practical scenarios, the announcement system must deliver spe...
research
09/15/2023

Controllable Residual Speaker Representation for Voice Conversion

Recently, there have been significant advancements in voice conversion, ...
research
10/10/2021

Towards High-fidelity Singing Voice Conversion with Acoustic Reference and Contrastive Predictive Coding

Recently, phonetic posteriorgrams (PPGs) based methods have been quite p...
research
03/28/2019

Adversarial Approximate Inference for Speech to Electroglottograph Conversion

Speech produced by human vocal apparatus conveys substantial non-semanti...
