The Academia Sinica Systems of Voice Conversion for VCC2020

by Yu-Huai Peng, et al.

This paper describes the Academia Sinica systems for the two tasks of the Voice Conversion Challenge 2020, namely voice conversion within the same language (Task 1) and cross-lingual voice conversion (Task 2). For both tasks, we followed the cascaded ASR+TTS structure, using phonetic tokens as the TTS input instead of text or characters. For Task 1, we used the International Phonetic Alphabet (IPA) as the input of the TTS model. For Task 2, we used unsupervised phonetic symbols extracted by a vector-quantized variational autoencoder (VQ-VAE). Listening-test results showed that our systems performed well in the VCC2020 challenge.
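The unsupervised phonetic symbols mentioned for Task 2 come from the VQ-VAE's quantization step, where each continuous encoder frame is replaced by the index of its nearest codebook vector. The following is a minimal sketch of that lookup, assuming illustrative codebook and frame sizes (the function name, dimensions, and random inputs are placeholders, not the authors' actual configuration):

```python
import numpy as np

def quantize(frames, codebook):
    """Map each D-dim frame to the index of its nearest codebook entry
    (squared Euclidean distance), yielding one discrete token per frame."""
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 64))   # hypothetical: 256 codes, 64-dim latents
frames = rng.normal(size=(100, 64))     # hypothetical: 100 encoder output frames
tokens = quantize(frames, codebook)     # discrete "phonetic" token sequence
```

The resulting token sequence plays the same role as IPA symbols do in Task 1: a speaker-independent intermediate representation that the TTS model consumes.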


