Low-Latency Real-Time Non-Parallel Voice Conversion based on Cyclic Variational Autoencoder and Multiband WaveRNN with Data-Driven Linear Prediction

05/20/2021
by   Patrick Lumban Tobing, et al.
0

This paper presents a low-latency real-time (LLRT) non-parallel voice conversion (VC) framework based on cyclic variational autoencoder (CycleVAE) and multiband WaveRNN with data-driven linear prediction (MWDLP). CycleVAE is a robust non-parallel multispeaker spectral model, which utilizes a speaker-independent latent space and a speaker-dependent code to generate reconstructed/converted spectral features given the spectral features of an input speaker. On the other hand, MWDLP is an efficient and a high-quality neural vocoder that can handle multispeaker data and generate speech waveform for LLRT applications with CPU. To accommodate LLRT constraint with CPU, we propose a novel CycleVAE framework that utilizes mel-spectrogram as spectral features and is built with a sparse network architecture. Further, to improve the modeling performance, we also propose a novel fine-tuning procedure that refines the frame-rate CycleVAE network by utilizing the waveform loss from the MWDLP network. The experimental results demonstrate that the proposed framework achieves high-performance VC, while allowing for LLRT usage with a single-core of 2.1–2.7 GHz CPU on a real-time factor of 0.87–0.95, including input/output, feature extraction, on a frame shift of 10 ms, a window length of 27.5 ms, and 2 lookup frames.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/24/2019

Non-Parallel Voice Conversion with Cyclic Variational Autoencoder

In this paper, we present a novel technique for a non-parallel voice con...
research
10/09/2020

Baseline System of Voice Conversion Challenge 2020 with Cyclic Variational Autoencoder and Parallel WaveGAN

In this paper, we present a description of the baseline system of Voice ...
research
11/12/2021

AC-VC: Non-parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion

This paper presents AC-VC (Almost Causal Voice Conversion), a phonetic p...
research
06/29/2016

Optimising The Input Window Alignment in CD-DNN Based Phoneme Recognition for Low Latency Processing

We present a systematic analysis on the performance of a phonetic recogn...
research
05/02/2019

Investigation of F0 conditioning and Fully Convolutional Networks in Variational Autoencoder based Voice Conversion

In this work, we investigate the effectiveness of two techniques for imp...

Please sign up or login with your details

Forgot password? Click here to reset