An Adaptive Learning based Generative Adversarial Network for One-To-One Voice Conversion

04/25/2021
by Sandipan Dhar, et al.

Voice Conversion (VC) has emerged in recent years as a significant research area within speech synthesis, driven by applications such as voice-assistant technology, automated movie dubbing, and speech-to-singing conversion. VC converts the vocal style of one speaker into that of another while keeping the linguistic content unchanged. The task is performed through a three-stage pipeline consisting of speech analysis, speech feature mapping, and speech reconstruction. Generative Adversarial Network (GAN) models are now widely used for mapping speech features from a source speaker to a target speaker. In this paper, we propose an adaptive learning-based GAN model, called ALGAN-VC, for efficient one-to-one VC. The ALGAN-VC framework combines several approaches to improve speech quality and voice similarity between source and target speakers. The generator network incorporates a Dense Residual Network (DRN)-like architecture for efficient learning of the source-to-target speech feature conversion. We also integrate an adaptive learning mechanism into the computation of the loss function, and we use a boosted learning rate approach to enhance the learning capability of the model. The model is trained on the forward and inverse mappings simultaneously for one-to-one VC. The proposed model is tested on the Voice Conversion Challenge (VCC) 2016, 2018, and 2020 datasets as well as on our self-prepared speech dataset, which was recorded in Indian regional languages and in English. Subjective and objective evaluations of the generated speech samples indicate that the proposed model performs the voice conversion task well, achieving high speaker similarity and adequate speech quality.
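As a rough illustration of training on forward and inverse mappings together, the sketch below computes a cycle-consistency term of the kind used in CycleGAN-style voice conversion. The abstract does not state that ALGAN-VC uses this exact loss, so it is an assumption. The function names and the fixed affine maps standing in for the two generators are hypothetical.

```python
# Hypothetical toy example: speech features are plain lists of floats, and the
# two "generators" are fixed affine maps standing in for trained networks.

def forward_map(x):
    """Stand-in for the source-to-target generator (assumed, not from the paper)."""
    return [2.0 * v + 1.0 for v in x]


def inverse_map(y):
    """Stand-in for the target-to-source generator (assumed, not from the paper)."""
    return [(v - 1.0) / 2.0 for v in y]


def l1_distance(a, b):
    """Mean absolute difference between two equal-length feature vectors."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(x_src, y_tgt):
    """Penalize features that do not survive a round trip in either direction."""
    src_cycle = inverse_map(forward_map(x_src))  # source -> target -> source
    tgt_cycle = forward_map(inverse_map(y_tgt))  # target -> source -> target
    return l1_distance(x_src, src_cycle) + l1_distance(y_tgt, tgt_cycle)


if __name__ == "__main__":
    source_features = [0.5, -1.0, 2.0]
    target_features = [3.0, 0.0, -5.0]
    print(cycle_consistency_loss(source_features, target_features))
```

Because the two toy maps are exact inverses of each other, the round trip reproduces its input and the loss is zero. With trained generators the term would be positive and would be minimized alongside the adversarial loss.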

research
07/21/2021

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

We present an unsupervised non-parallel many-to-many voice conversion (V...
research
05/15/2020

Unsupervised Cross-Domain Speech-to-Speech Conversion with Time-Frequency Consistency

In recent years generative adversarial network (GAN) based models have b...
research
11/02/2021

Attention-Guided Generative Adversarial Network for Whisper to Normal Speech Conversion

Whispered speech is a special way of pronunciation without using vocal c...
research
05/28/2020

Speech-to-Singing Conversion based on Boundary Equilibrium GAN

This paper investigates the use of generative adversarial network (GAN)-...
research
04/04/2017

Voice Conversion from Unaligned Corpora using Variational Autoencoding Wasserstein Generative Adversarial Networks

Building a voice conversion (VC) system from non-parallel speech corpora...
research
09/28/2021

Diffusion-Based Voice Conversion with Fast Maximum Likelihood Sampling Scheme

Voice conversion is a common speech synthesis task which can be solved i...
research
09/08/2021

Time Alignment using Lip Images for Frame-based Electrolaryngeal Voice Conversion

Voice conversion (VC) is an effective approach to electrolaryngeal (EL) ...
