Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities

04/10/2017
by   Hiroyuki Miyoshi, et al.
0

Voice conversion (VC) using sequence-to-sequence learning of context posterior probabilities is proposed. Conventional VC using shared context posterior probabilities predicts target speech parameters from the context posterior probabilities estimated from the source speech parameters. Although conventional VC can be built from non-parallel data, it is difficult to convert speaker individuality such as phonetic property and speaking rate contained in the posterior probabilities because the source posterior probabilities are directly used for predicting target speech parameters. In this work, we assume that the training data partly include parallel speech data and propose sequence-to-sequence learning between the source and target posterior probabilities. The conversion models perform non-linear and variable-length transformation from the source probability sequence to the target one. Further, we propose a joint training algorithm for the modules. In contrast to conventional VC, which separately trains the speech recognition that estimates posterior probabilities and the speech synthesis that predicts target speech parameters, our proposed method jointly trains these modules along with the proposed probability conversion modules. Experimental results demonstrate that our approach outperforms the conventional VC.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/29/2019

Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet

We investigated the training of a shared model for both text-to-speech (...
research
10/06/2020

The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS

This paper presents the sequence-to-sequence (seq2seq) baseline system f...
research
11/05/2018

ConvS2S-VC: Fully convolutional sequence-to-sequence voice conversion

This paper proposes a voice conversion method based on fully convolution...
research
10/16/2018

Sequence-to-Sequence Acoustic Modeling for Voice Conversion

In this paper, a neural network named Sequence-to- sequence ConvErsion N...
research
10/19/2022

Two-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion

Sequence-to-sequence (seq2seq) voice conversion (VC) models have greater...
research
11/09/2018

AttS2S-VC: Sequence-to-Sequence Voice Conversion with Attention and Context Preservation Mechanisms

This paper describes a method based on a sequence-to-sequence learning (...
research
09/05/2023

Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion

Foreign accent conversion (FAC) is a special application of voice conver...

Please sign up or login with your details

Forgot password? Click here to reset