CinC-GAN for Effective F0 Prediction for Whisper-to-Normal Speech Conversion

08/18/2020
by Maitreya Patel, et al.

Generative Adversarial Network (GAN)-based methods have recently shown remarkable performance in voice conversion and in WHiSPer-to-normal SPeeCH (WHSP2SPCH) conversion. One of the key challenges in WHSP2SPCH conversion is the prediction of the fundamental frequency (F0). Recently, the Cycle-Consistent Generative Adversarial Network (CycleGAN) was proposed as a state-of-the-art method for WHSP2SPCH conversion. The CycleGAN-based method uses two separate models, one for Mel Cepstral Coefficient (MCC) mapping and another for F0 prediction, where the F0 prediction depends heavily on the pre-trained MCC mapping model. This dependence introduces additional non-linear noise into the predicted F0. To suppress this noise, we propose the Cycle-in-Cycle GAN (CinC-GAN), which is specifically designed to make F0 prediction more effective without losing accuracy in MCC mapping. We evaluate the proposed method in a non-parallel setting and analyze it on speaker-specific and gender-specific tasks. Objective and subjective tests show that CinC-GAN significantly outperforms CycleGAN. In addition, we compare CycleGAN and CinC-GAN on unseen speakers, and the results show the clear superiority of CinC-GAN.
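For readers unfamiliar with the cycle-consistency idea that CycleGAN relies on in the non-parallel setting, the following is a minimal sketch of an L1 cycle-consistency loss. It is not the authors' implementation: the function names (`cycle_consistency_loss`, `toy_w2s`, `toy_s2w`, `bad_s2w`), the toy scaling "generators", and the two-dimensional frames are all hypothetical stand-ins for trained networks and real MCC features.

```python
from typing import Callable, List

Frame = List[float]  # one feature vector; a toy stand-in for a frame of MCCs
Generator = Callable[[Frame], Frame]


def l1_distance(a: Frame, b: Frame) -> float:
    """Mean absolute difference between two equal-length frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    whisper_frames: List[Frame],
    speech_frames: List[Frame],
    g_w2s: Generator,
    g_s2w: Generator,
) -> float:
    """L1 cycle-consistency loss over two unpaired sets of frames.

    Each whispered frame is mapped to the speech domain and back, and each
    speech frame is mapped to the whisper domain and back. The loss measures
    how far each round trip lands from its starting point, so no paired
    (parallel) whisper/speech data is needed.
    """
    forward = [l1_distance(g_s2w(g_w2s(x)), x) for x in whisper_frames]
    backward = [l1_distance(g_w2s(g_s2w(y)), y) for y in speech_frames]
    return sum(forward) / len(forward) + sum(backward) / len(backward)


# Hypothetical toy generators: simple scalings instead of neural networks.
def toy_w2s(frame: Frame) -> Frame:
    return [2.0 * v for v in frame]


def toy_s2w(frame: Frame) -> Frame:
    return [0.5 * v for v in frame]  # exact inverse of toy_w2s


def bad_s2w(frame: Frame) -> Frame:
    return [0.25 * v for v in frame]  # not an inverse of toy_w2s


whisper = [[1.0, 2.0], [3.0, 4.0]]
speech = [[2.0, 4.0], [6.0, 8.0]]

loss_inverse = cycle_consistency_loss(whisper, speech, toy_w2s, toy_s2w)
loss_mismatch = cycle_consistency_loss(whisper, speech, toy_w2s, bad_s2w)
print(loss_inverse, loss_mismatch)
```

When the two mappings invert each other, the round trip returns every frame unchanged and the loss is zero; a mismatched pair is penalized. In an actual CycleGAN this term is combined with adversarial losses, which the sketch leaves out.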


