DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech

06/25/2023
by Sen Liu, et al.

Although high-fidelity speech can be obtained for intralingual speech synthesis, cross-lingual text-to-speech (CTTS) is still far from satisfactory, as it is difficult to accurately retain the speaker's timbre (i.e., speaker similarity) and eliminate the accent from the speaker's first language (i.e., nativeness). In this paper, we demonstrate that vector-quantized (VQ) acoustic features contain less speaker information than mel-spectrograms. Based on this finding, we propose a novel dual speaker embedding TTS (DSE-TTS) framework for CTTS with authentic speaking style. Here, one embedding is fed to the acoustic model to learn the linguistic speaking style, while the other is integrated into the vocoder to mimic the target speaker's timbre. Experiments show that by combining both embeddings, DSE-TTS significantly outperforms the state-of-the-art SANE-TTS in cross-lingual synthesis, especially in terms of nativeness.
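The routing of the two embeddings described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, dimensions, and random linear maps below are all hypothetical stand-ins for trained networks, and the sketch assumes the acoustic model emits one VQ code per frame that the vocoder then converts to a waveform. It shows only the data flow: the style embedding influences the VQ codes, while the timbre embedding influences only the vocoder output.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes are illustrative assumptions, not values from the paper.
TEXT_DIM = 8    # per-frame text feature size
EMB_DIM = 8     # speaker-embedding size
CODEBOOK = 16   # number of VQ codes
HOP = 4         # waveform samples produced per frame

# Random matrices standing in for trained model parameters.
W_text = rng.normal(size=(TEXT_DIM, CODEBOOK))
W_style = rng.normal(size=(EMB_DIM, CODEBOOK))
code_table = rng.normal(size=(CODEBOOK, HOP))
W_timbre = rng.normal(size=(EMB_DIM, HOP))


def acoustic_model(text_feats, style_emb):
    """Predict one VQ code index per frame, conditioned on the style embedding."""
    logits = text_feats @ W_text + style_emb @ W_style  # (frames, CODEBOOK)
    return logits.argmax(axis=1)


def vocoder(vq_codes, timbre_emb):
    """Map VQ codes to waveform samples, conditioned on the timbre embedding."""
    frames = code_table[vq_codes] + timbre_emb @ W_timbre  # (frames, HOP)
    return frames.reshape(-1)


def synthesize(text_feats, style_emb, timbre_emb):
    """Run the two-stage pipeline with a separate embedding per stage."""
    codes = acoustic_model(text_feats, style_emb)
    return codes, vocoder(codes, timbre_emb)


# Hypothetical inputs: 5 frames of text features and two speaker embeddings.
text_feats = rng.normal(size=(5, TEXT_DIM))
style_emb = rng.normal(size=EMB_DIM)    # carries the speaking style
timbre_emb = rng.normal(size=EMB_DIM)   # carries the target speaker's timbre

codes, wav = synthesize(text_feats, style_emb, timbre_emb)
print(codes.shape, wav.shape)
```

In this sketch, swapping `timbre_emb` changes the waveform but leaves the VQ codes untouched, which mirrors the separation of speaking style and timbre that the abstract describes.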


research
02/28/2023

CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis

While recent text-to-speech (TTS) systems have made remarkable strides t...
research
06/27/2023

GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech

Cross-lingual timbre and style generalizable text-to-speech (TTS) aims t...
research
06/24/2022

SANE-TTS: Stable And Natural End-to-End Multilingual Text-to-Speech

In this paper, we present SANE-TTS, a stable and natural end-to-end mult...
research
03/31/2022

Data-augmented cross-lingual synthesis in a teacher-student framework

Cross-lingual synthesis can be defined as the task of letting a speaker ...
research
06/21/2021

UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control

We propose a novel high-fidelity expressive speech synthesis model, UniT...
research
10/18/2021

Tackling the Score Shift in Cross-Lingual Speaker Verification by Exploiting Language Information

This paper contains a post-challenge performance analysis on cross-lingu...
research
10/11/2022

Cross-Lingual Speaker Identification Using Distant Supervision

Speaker identification, determining which character said each utterance ...
