Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning

08/05/2020
by Jing-Xuan Zhang, et al.

This paper presents an adversarial learning method for recognition-synthesis based non-parallel voice conversion. A recognizer transforms acoustic features into linguistic representations, while a synthesizer recovers output features from the recognizer outputs together with the speaker identity. Because the speaker characteristics are separated from the linguistic representations, voice conversion can be achieved by replacing the speaker identity with that of the target speaker. In our proposed method, a speaker adversarial loss is adopted to obtain speaker-independent linguistic representations from the recognizer. Furthermore, discriminators are introduced and a generative adversarial network (GAN) loss is used to prevent the predicted features from being over-smoothed. The model parameters are trained by pre-training on a multi-speaker dataset and then fine-tuning on the source-target speaker pair. Our method achieved higher similarity than the baseline model that obtained the best performance in Voice Conversion Challenge 2018.
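
The speaker adversarial idea in the abstract can be illustrated with a small sketch. This is only one common formulation, assumed here for illustration and not confirmed as the paper's exact loss: an auxiliary speaker classifier reads the linguistic representation and is trained with ordinary cross-entropy, while the recognizer is trained to push the classifier's posterior toward a uniform distribution. The function names and the use of raw logit lists are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def speaker_classifier_loss(logits, speaker_id):
    """Cross-entropy minimized by the (hypothetical) auxiliary speaker
    classifier, which tries to identify the speaker from the
    linguistic representation."""
    return -math.log(softmax(logits)[speaker_id])


def speaker_adversarial_loss(logits):
    """Assumed adversarial objective for the recognizer: cross-entropy
    between a uniform target and the classifier posterior. It reaches
    its minimum, log(num_speakers), when the classifier cannot tell
    the speakers apart."""
    probs = softmax(logits)
    return -sum(math.log(p) for p in probs) / len(probs)


if __name__ == "__main__":
    uninformative = [0.0, 0.0, 0.0, 0.0]  # classifier has no speaker cue
    leaky = [5.0, 0.0, 0.0, 0.0]          # representation reveals speaker 0
    print(speaker_adversarial_loss(uninformative))
    print(speaker_adversarial_loss(leaky))
```

With four speakers, the adversarial loss for the uninformative logits equals log 4, and it grows as the classifier becomes more confident, so minimizing it over the recognizer parameters discourages speaker information in the linguistic representation.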

research
04/09/2018

Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations

Recently, cycle-consistent adversarial network (Cycle-GAN) has been succ...
research
06/25/2019

Non-Parallel Sequence-to-Sequence Voice Conversion with Disentangled Linguistic and Speaker Representations

In this paper, a method for non-parallel sequence-to-sequence (seq2seq) ...
research
10/29/2018

Audiovisual speaker conversion: jointly and simultaneously transforming facial expression and acoustic characteristics

An audiovisual speaker conversion method is presented for simultaneously...
research
07/26/2021

Beyond Voice Identity Conversion: Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations

Voice conversion (VC) consists of digitally altering the voice of an ind...
research
10/31/2022

VoicePrivacy 2022 System Description: Speaker Anonymization with Feature-matched F0 Trajectories

We introduce a novel method to improve the performance of the VoicePriva...
research
02/22/2021

Investigating Deep Neural Structures and their Interpretability in the Domain of Voice Conversion

Generative Adversarial Networks (GANs) are machine learning networks bas...
research
03/26/2019

WGANSing: A Multi-Voice Singing Voice Synthesizer Based on the Wasserstein-GAN

We present a deep neural network based singing voice synthesizer, inspir...
