CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis

02/28/2023
by   Ji-Hoon Kim, et al.
0

While recent text-to-speech (TTS) systems have made remarkable strides toward human-level quality, the performance of cross-lingual TTS lags behind that of intra-lingual TTS. This gap is mainly rooted from the speaker-language entanglement problem in cross-lingual TTS. In this paper, we propose CrossSpeech which improves the quality of cross-lingual speech by effectively disentangling speaker and language information in the level of acoustic feature space. Specifically, CrossSpeech decomposes the speech generation pipeline into the speaker-independent generator (SIG) and speaker-dependent generator (SDG). The SIG produces the speaker-independent acoustic representation which is not biased to specific speaker distributions. On the other hand, the SDG models speaker-dependent speech variation that characterizes speaker attributes. By handling each information separately, CrossSpeech can obtain disentangled speaker and language representations. From the experiments, we verify that CrossSpeech achieves significant improvements in cross-lingual TTS, especially in terms of speaker similarity to the target speaker.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/25/2023

DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech

Although high-fidelity speech can be obtained for intralingual speech sy...
research
04/25/2023

Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge

In this paper, we describe the systems developed by the SJTU X-LANCE tea...
research
06/27/2023

GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech

Cross-lingual timbre and style generalizable text-to-speech (TTS) aims t...
research
03/31/2022

Data-augmented cross-lingual synthesis in a teacher-student framework

Cross-lingual synthesis can be defined as the task of letting a speaker ...
research
10/11/2022

Cross-Lingual Speaker Identification Using Distant Supervision

Speaker identification, determining which character said each utterance ...
research
06/24/2022

SANE-TTS: Stable And Natural End-to-End Multilingual Text-to-Speech

In this paper, we present SANE-TTS, a stable and natural end-to-end mult...
research
07/15/2020

Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization

In this paper we describe the top-scoring IDLab submission for the text-...

Please sign up or login with your details

Forgot password? Click here to reset