The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS

10/06/2020
by   Wen-Chin Huang, et al.

This paper presents the sequence-to-sequence (seq2seq) baseline system for the Voice Conversion Challenge (VCC) 2020. We consider a naive approach to voice conversion (VC): first transcribe the input speech with an automatic speech recognition (ASR) model, then use the transcriptions to generate the voice of the target with a text-to-speech (TTS) model. We revisit this method under a seq2seq framework by utilizing ESPnet, an open-source end-to-end speech processing toolkit, and the many well-configured pretrained models provided by the community. Official evaluation results show that our system comes out top among the participating systems in terms of conversion similarity, demonstrating the promising ability of seq2seq models to convert speaker identity. The implementation is open-sourced at: https://github.com/espnet/espnet/tree/master/egs/vcc20.
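
The cascade described in the abstract can be sketched as a simple composition of two models. This is only an illustrative outline, not the ESPnet implementation: the `CascadeVC` class, the `Waveform` alias, and the `toy_asr` / `toy_tts` stand-ins are all hypothetical names introduced here. In a real system they would be replaced by a pretrained ASR model and a TTS model trained on the target speaker.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type alias: a waveform as a plain list of samples.
Waveform = List[float]


@dataclass
class CascadeVC:
    """Voice conversion by cascading ASR and TTS (illustrative sketch)."""

    asr: Callable[[Waveform], str]   # source speech -> transcription
    tts: Callable[[str], Waveform]   # transcription -> target-voice speech

    def convert(self, source_speech: Waveform) -> Waveform:
        # Step 1: transcribe the input speech.
        text = self.asr(source_speech)
        # Step 2: synthesize the transcription in the target voice.
        return self.tts(text)


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_asr(wav: Waveform) -> str:
    return "hello world"


def toy_tts(text: str) -> Waveform:
    # Emit one silent sample per character, just to produce an output.
    return [0.0] * len(text)


vc = CascadeVC(asr=toy_asr, tts=toy_tts)
converted = vc.convert([0.1, -0.2, 0.3])
```

Because the only information passed between the two stages is text, the source speaker's identity is discarded at the ASR step and the output voice is determined entirely by the TTS model.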

research
09/03/2020

Voice Conversion by Cascading Automatic Speech Recognition and Text-to-Speech Synthesis with Prosody Transfer

With the development of automatic speech recognition (ASR) and text-to-s...
research
10/29/2020

The IQIYI System for Voice Conversion Challenge 2020

This paper presents the IQIYI voice conversion system (T24) for Voice Co...
research
05/29/2023

CommonAccent: Exploring Large Acoustic Pretrained Models for Accent Classification Based on Common Voice

Despite the recent advancements in Automatic Speech Recognition (ASR), t...
research
07/15/2019

Hierarchical Sequence to Sequence Voice Conversion with Limited Data

We present a voice conversion solution using recurrent sequence to seque...
research
04/14/2021

Non-autoregressive sequence-to-sequence voice conversion

This paper proposes a novel voice conversion (VC) method based on non-au...
research
04/10/2017

Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities

Voice conversion (VC) using sequence-to-sequence learning of context pos...
research
12/14/2019

Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining

We introduce a novel sequence-to-sequence (seq2seq) voice conversion (VC...
