Revisiting IPA-based Cross-lingual Text-to-speech

10/14/2021
by Haitong Zhang, et al.

The International Phonetic Alphabet (IPA) has been widely used in cross-lingual text-to-speech (TTS) to achieve cross-lingual voice cloning (CL VC). However, IPA itself has been understudied in cross-lingual TTS. In this paper, we report empirical findings from building a cross-lingual TTS model that takes IPA as input. Experiments show that the way the IPA and suprasegmental sequences are processed has a negligible impact on CL VC performance. Furthermore, we find that an IPA-based TTS system trained on a dataset with only one speaker per language fails at CL VC, since the language-unique IPA and tone/stress symbols could leak speaker information. In addition, we experiment with different combinations of speakers in the training dataset to further investigate how the number of speakers affects CL VC performance.
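
The leakage argument can be illustrated with a small check: when each language has exactly one speaker, any symbol that occurs in only one language also occurs with only one speaker, so it may act as a proxy for speaker identity. The sketch below is not the authors' code. The function name `language_unique_symbols` and the toy inventories are hypothetical, chosen only for illustration, and are far smaller than real phone sets.

```python
def language_unique_symbols(inventories):
    """Map each language to the symbols that no other language uses."""
    unique = {}
    for lang, symbols in inventories.items():
        # Union of the symbol sets of every other language.
        others = set().union(
            *(s for other, s in inventories.items() if other != lang)
        )
        unique[lang] = set(symbols) - others
    return unique


# Toy, hand-picked inventories (assumption: illustrative only, not from the paper).
# "ˈ" stands in for a stress mark and "˥" for a tone mark.
inventories = {
    "en": {"p", "t", "k", "s", "θ", "ˈ"},
    "zh": {"p", "t", "k", "s", "ɕ", "˥"},
}

unique = language_unique_symbols(inventories)
for lang, symbols in sorted(unique.items()):
    print(lang, sorted(symbols))
```

With one speaker per language, every symbol reported here is seen with a single voice during training, which is the kind of cue the abstract describes as leaking speaker information.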

research
10/14/2021

Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data

Recently, sequence-to-sequence (seq-to-seq) models have been successfull...
research
05/21/2020

Cross-lingual Multispeaker Text-to-Speech under Limited-Data Scenario

Modeling voices for multiple speakers and multiple languages in one text...
research
10/14/2021

Exploring Timbre Disentanglement in Non-Autoregressive Cross-Lingual Text-to-Speech

In this paper, we present a FastPitch-based non-autoregressive cross-lin...
research
11/06/2022

An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space

With the recent developments in cross-lingual Text-to-Speech (TTS) syste...
research
02/22/2022

Improving Cross-lingual Speech Synthesis with Triplet Training Scheme

Recent advances in cross-lingual text-to-speech (TTS) made it possible t...
research
08/17/2021

Combining speakers of multiple languages to improve quality of neural voices

In this work, we explore multiple architectures and training procedures ...
research
09/02/2023

DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech – A Study between English and Mandarin

While the performance of cross-lingual TTS based on monolingual corpora ...
