Automatic Speech Disentanglement for Voice Conversion using Rank Module and Speech Augmentation

06/21/2023
by Zhonghua Liu, et al.

Voice Conversion (VC) converts the voice of a source utterance to that of a target speaker while preserving the source's content. Speech can be decomposed into four main components: content, timbre, rhythm, and pitch. However, most related work considers only content and timbre, which results in less natural speech. Some recent works can disentangle speech into several components, but they require laborious bottleneck tuning or various hand-crafted features, each assumed to contain disentangled speech information. In this paper, we propose a VC model that automatically disentangles speech into four components using only two augmentation functions, without requiring multiple hand-crafted features or laborious bottleneck tuning. The proposed model is straightforward yet efficient, and the empirical results demonstrate that it outperforms the baseline in both disentanglement effectiveness and speech naturalness.

research
08/21/2023

PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion

Voice conversion as the style transfer task applied to speech, refers to...
research
02/21/2022

AVQVC: One-shot Voice Conversion by Vector Quantization with applying contrastive learning

Voice Conversion (VC) refers to changing the timbre of a speech while ret...
research
04/23/2020

Unsupervised Speech Decomposition via Triple Information Bottleneck

Speech information can be roughly decomposed into four components: langu...
research
11/03/2021

A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion

The goal of voice conversion is to transform source speech into a target...
research
10/25/2022

MetaSpeech: Speech Effects Switch Along with Environment for Metaverse

Metaverse expands the physical world to a new dimension, and the physica...
research
08/19/2023

Effects of Convolutional Autoencoder Bottleneck Width on StarGAN-based Singing Technique Conversion

Singing technique conversion (STC) refers to the task of converting from...
research
10/24/2019

Towards Fine-Grained Prosody Control for Voice Conversion

In a typical voice conversion system, prior works utilize various acoust...
