Towards Fine-Grained Prosody Control for Voice Conversion

10/24/2019
by Zheng Lian, et al.

In a typical voice conversion system, prior work uses various acoustic features of the source speech (e.g., pitch, voiced/unvoiced flag, aperiodicity) to control the prosody of the generated waveform. However, prosody depends on many factors, such as intonation, stress, and rhythm, and it is challenging to describe it fully through acoustic features alone. To address this problem, we propose prosody embeddings to model prosody. These embeddings are learned from the source speech in an unsupervised manner. We conduct experiments on our Mandarin corpus recorded by professional speakers. Experimental results demonstrate that the proposed method enables fine-grained control of prosody. In challenging situations (such as when the source speech is singing), the proposed method also achieves promising results.

research
07/02/2022

Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers

Building a voice conversion system for noisy target speakers, such as us...
research
09/25/2018

WaveCycleGAN: Synthetic-to-natural speech waveform conversion using cycle-consistent adversarial networks

We propose a learning-based filter that allows us to directly modify a s...
research
10/16/2018

Sequence-to-Sequence Acoustic Modeling for Voice Conversion

In this paper, a neural network named Sequence-to-sequence ConvErsion N...
research
02/11/2019

A Vocoder-free WaveNet Voice Conversion with Non-Parallel Data

In a typical voice conversion system, vocoder is commonly used for speec...
research
06/27/2022

Speak Like a Professional: Increasing Speech Intelligibility by Mimicking Professional Announcer Voice with Voice Conversion

In most of practical scenarios, the announcement system must deliver spe...
research
06/21/2023

Automatic Speech Disentanglement for Voice Conversion using Rank Module and Speech Augmentation

Voice Conversion (VC) converts the voice of a source speech to that of a...
research
07/27/2020

Evaluating the reliability of acoustic speech embeddings

Speech embeddings are fixed-size acoustic representations of variable-le...
