Taco-VC: A Single Speaker Tacotron based Voice Conversion with Limited Data

04/06/2019
by Roee Levy Leshem, et al.

This paper introduces Taco-VC, a novel architecture for voice conversion (VC) based on the Tacotron synthesizer, a sequence-to-sequence model with attention. Most current prosody-preserving VC systems produce converted speech that suffers from poor similarity to the target speaker and degraded quality. To address these problems, we first recover initial prosody-preserving speech using a Tacotron synthesizer conditioned on Phonetic Posteriorgrams (PPGs). Then, we enhance the quality of the converted speech using a novel speech-enhancement network that combines phoneme recognition and Tacotron networks. The final converted speech is generated by a WaveNet vocoder conditioned on mel spectrograms. Building on the advantages of a single-speaker Tacotron and WaveNet, we show how to adapt them to other speakers with limited training data. We evaluate our solution on the VCC 2018 SPOKE task. Using public mid-size datasets, our method outperforms the baseline and achieves competitive results.
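The three-stage data flow described in the abstract (PPGs, then a PPG-conditioned Tacotron, then an enhancement network, then a mel-conditioned vocoder) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes PPGs are computed as a per-frame softmax over a phoneme recognizer's logits (a common definition), and the stage callables `ppg_tacotron`, `enhancer`, and `vocoder`, as well as the function names `phonetic_posteriorgram` and `taco_vc_pipeline`, are hypothetical placeholders standing in for the trained networks.

```python
import math
from typing import Callable, List

Frame = List[float]


def softmax(logits: Frame) -> Frame:
    """Numerically stable softmax over one frame of phoneme logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def phonetic_posteriorgram(frame_logits: List[Frame]) -> List[Frame]:
    """Assumed PPG definition: one probability distribution over
    phoneme classes per frame (rows sum to 1)."""
    return [softmax(frame) for frame in frame_logits]


def taco_vc_pipeline(
    source_logits: List[Frame],
    ppg_tacotron: Callable[[List[Frame]], List[Frame]],  # hypothetical stage 1
    enhancer: Callable[[List[Frame]], List[Frame]],      # hypothetical stage 2
    vocoder: Callable[[List[Frame]], List[float]],       # hypothetical stage 3
) -> List[float]:
    # Largely speaker-independent content representation of the source.
    ppg = phonetic_posteriorgram(source_logits)
    # Stage 1: PPG-conditioned synthesizer yields initial target-voice mels.
    initial_mels = ppg_tacotron(ppg)
    # Stage 2: enhancement network refines the mel frames.
    enhanced_mels = enhancer(initial_mels)
    # Stage 3: mel-conditioned vocoder produces waveform samples.
    return vocoder(enhanced_mels)


if __name__ == "__main__":
    # Toy stand-ins: 2 frames x 3 phoneme classes, identity "networks",
    # and a "vocoder" that just flattens frames into a sample list.
    logits = [[2.0, 0.5, -1.0], [0.0, 1.5, 0.3]]
    waveform = taco_vc_pipeline(
        logits,
        ppg_tacotron=lambda ppg: ppg,
        enhancer=lambda mels: mels,
        vocoder=lambda mels: [v for frame in mels for v in frame],
    )
    print(len(waveform))
```

Passing the stages in as callables keeps the sketch agnostic about how each network is built or trained, which mirrors the abstract's point that a single-speaker Tacotron and WaveNet can be adapted to new speakers independently of the rest of the pipeline.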

Related research
08/07/2020

DurIAN-SC: Duration Informed Attention Network based Singing Voice Conversion System

Singing voice conversion is converting the timbre in the source singing ...
10/19/2022

Two-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion

Sequence-to-sequence (seq2seq) voice conversion (VC) models have greater...
04/12/2022

Enhancement of Pitch Controllability using Timbre-Preserving Pitch Augmentation in FastPitch

The recently developed pitch-controllable text-to-speech (TTS) model, i....
10/15/2021

Towards Identity Preserving Normal to Dysarthric Voice Conversion

We present a voice conversion framework that converts normal speech into...
10/07/2021

Sequence-To-Sequence Voice Conversion using F0 and Time Conditioning and Adversarial Learning

This paper presents a sequence-to-sequence voice conversion (S2S-VC) alg...
04/30/2018

Collapsed speech segment detection and suppression for WaveNet vocoder

In this paper, we propose a technique to alleviate quality degradation c...