Noise-robust voice conversion with domain adversarial training

01/26/2022
by Hongqiang Du, et al.

Voice conversion has made great progress in the past few years in speech quality and speaker similarity when tested on studio-quality speech. In real applications, however, the test speech of the source or target speaker can be corrupted by various environmental noises, which seriously degrades both speech quality and speaker similarity. In this paper, we propose a novel encoder-decoder based noise-robust voice conversion framework. It consists of a speaker encoder, a content encoder, a decoder, and two domain adversarial neural networks. Specifically, we integrate speaker and content representation disentanglement with domain adversarial training. Domain adversarial training maps the speaker representations that the speaker encoder extracts from clean and noisy speech into the same space. It does the same for the content representations extracted by the content encoder. As a result, the learned speaker and content representations are noise-invariant, and the decoder can take both of them as input to predict the clean converted spectrum. Experimental results show that the proposed method can synthesize clean converted speech in noisy test scenarios. In these scenarios, the source and target speech may be corrupted by noise types that were either seen or unseen during training. Both speech quality and speaker similarity are also improved.
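The abstract does not specify the network architectures or losses, nor how the adversarial objective is optimized. The pure-Python toy below is a minimal sketch. It assumes the common gradient reversal layer formulation of domain adversarial training (Ganin and Lempitsky) and mirrors the layout described above with scalar stand-ins:

- a speaker encoder and a content encoder;
- a decoder that predicts a clean target from both representations;
- one clean-vs-noisy domain classifier per representation.

All names (`Params`, `gradients`, `grad_reverse`, `sgd_step`) and the scalar parameterization are hypothetical and are not the authors' implementation. Real systems use neural networks over spectrogram frames, typically with an utterance-level speaker embedding.

```python
# Hypothetical toy sketch of gradient-reversal domain adversarial training
# with separate speaker and content branches; not the paper's implementation.
import math
from dataclasses import dataclass


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def grad_reverse(grad: float, lam: float) -> float:
    """Backward pass of a gradient reversal layer: the forward pass is the
    identity, the backward pass multiplies the incoming gradient by -lam."""
    return -lam * grad


@dataclass
class Params:
    # Hypothetical scalar stand-ins for each network in the framework.
    ws: float = 0.5   # speaker encoder
    wc: float = 0.5   # content encoder
    us: float = 0.5   # decoder weight on the speaker representation
    uc: float = 0.5   # decoder weight on the content representation
    vs: float = 0.1   # speaker-branch domain classifier weight
    bs: float = 0.0   # speaker-branch domain classifier bias
    vc: float = 0.1   # content-branch domain classifier weight
    bc: float = 0.0   # content-branch domain classifier bias


def gradients(p: Params, x: float, y_clean: float, domain: int, lam: float) -> dict:
    """Gradients of all parameters for one example.

    x: input feature (clean or noisy), y_clean: clean target,
    domain: 0 for clean input, 1 for noisy input, lam: reversal strength.
    """
    # Forward pass.
    s = p.ws * x                      # speaker representation
    c = p.wc * x                      # content representation
    y_hat = p.us * s + p.uc * c       # decoder predicts the clean target
    ps = sigmoid(p.vs * s + p.bs)     # P(noisy | speaker representation)
    pc = sigmoid(p.vc * c + p.bc)     # P(noisy | content representation)

    # d/dy_hat of the reconstruction loss (y_hat - y_clean)^2.
    e2 = 2.0 * (y_hat - y_clean)
    # Binary cross-entropy gradient with respect to each classifier logit.
    gs, gc = ps - domain, pc - domain

    # Gradient reaching each representation: the reconstruction path plus
    # the domain-classifier path passed through the gradient reversal layer.
    ds = e2 * p.us + grad_reverse(gs * p.vs, lam)
    dc = e2 * p.uc + grad_reverse(gc * p.vc, lam)

    return {
        "ws": ds * x, "wc": dc * x,   # encoders: reversed domain gradient
        "us": e2 * s, "uc": e2 * c,   # decoder: reconstruction only
        # Classifiers are trained normally to tell clean from noisy input.
        "vs": gs * s, "bs": gs,
        "vc": gc * c, "bc": gc,
    }


def sgd_step(p: Params, grads: dict, lr: float) -> Params:
    """Plain gradient descent on every parameter."""
    return Params(**{k: getattr(p, k) - lr * g for k, g in grads.items()})
```

In this sketch, each domain classifier descends its own cross-entropy loss. The reversal makes each encoder ascend that loss, scaled by `lam`, while it descends the reconstruction loss. The intent is that the classifiers eventually cannot separate clean from noisy inputs, so the representations carry little noise information. One update looks like `p = sgd_step(p, gradients(p, x=1.2, y_clean=1.0, domain=1, lam=1.0), lr=0.01)`.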
