Non-autoregressive sequence-to-sequence voice conversion

04/14/2021
by   Tomoki Hayashi, et al.
0

This paper proposes a novel voice conversion (VC) method based on non-autoregressive sequence-to-sequence (NAR-S2S) models. Inspired by the great success of NAR-S2S models such as FastSpeech in text-to-speech (TTS), we extend the FastSpeech2 model for the VC problem. We introduce the convolution-augmented Transformer (Conformer) instead of the Transformer, making it possible to capture both local and global context information from the input sequence. Furthermore, we extend variance predictors to variance converters to explicitly convert the source speaker's prosody components such as pitch and energy into the target speaker. The experimental evaluation with the Japanese speaker dataset, which consists of male and female speakers of 1,000 utterances, demonstrates that the proposed model enables us to perform more stable, faster, and better conversion than autoregressive S2S (AR-S2S) models such as Tacotron2 and Transformer.

READ FULL TEXT

page 3

page 4

research
10/06/2020

The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS

This paper presents the sequence-to-sequence (seq2seq) baseline system f...
research
05/18/2020

Many-to-Many Voice Transformer Network

This paper proposes a voice conversion (VC) method based on a sequence-t...
research
09/14/2023

AAS-VC: On the Generalization Ability of Automatic Alignment Search based Non-autoregressive Sequence-to-sequence Voice Conversion

Non-autoregressive (non-AR) sequence-to-seqeunce (seq2seq) models for vo...
research
04/14/2021

FastS2S-VC: Streaming Non-Autoregressive Sequence-to-Sequence Voice Conversion

This paper proposes a non-autoregressive extension of our previously pro...
research
12/14/2019

Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining

We introduce a novel sequence-to-sequence (seq2seq) voice conversion (VC...
research
09/05/2023

Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion

Foreign accent conversion (FAC) is a special application of voice conver...

Please sign up or login with your details

Forgot password? Click here to reset