Exploring Timbre Disentanglement in Non-Autoregressive Cross-Lingual Text-to-Speech

10/14/2021
by Haoyue Zhan, et al.

In this paper, we present a FastPitch-based non-autoregressive cross-lingual Text-to-Speech (TTS) model built with a language-independent input representation and monolingual force aligners. We propose a phoneme length regulator that solves the length mismatch between language-independent phonemes and monolingual alignment results. Our experiments show that (1) an increasing number of training speakers encourages the non-autoregressive cross-lingual TTS model to disentangle speaker and language representations, and (2) the variance adaptors of the FastPitch model help disentangle speaker identity from the learned representations in cross-lingual TTS. Subjective evaluation shows that our proposed model achieves decent speaker consistency and similarity. We further improve the naturalness of Mandarin-dominated mixed-lingual utterances by exploiting the controllability of our proposed model.
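The length mismatch the abstract mentions arises because one language-independent phoneme can span several language-dependent units produced by a monolingual force aligner. A minimal sketch of the idea (not the authors' actual implementation; the function name and the grouping input are hypothetical) pools per-unit frame durations up to the language-independent phoneme level:

```python
# Hypothetical sketch of a phoneme length regulator: a language-independent
# phoneme may correspond to several monolingual aligner units, so per-unit
# frame durations are summed to obtain one duration per phoneme.

def regulate_phoneme_lengths(unit_durations, units_per_phoneme):
    """Pool aligner-unit frame counts into per-phoneme durations.

    unit_durations: frame counts from a monolingual force aligner,
                    one per language-dependent unit.
    units_per_phoneme: how many aligner units each language-independent
                       phoneme spans (an assumed grouping, for illustration).
    """
    durations = []
    idx = 0
    for n in units_per_phoneme:
        durations.append(sum(unit_durations[idx:idx + n]))
        idx += n
    return durations

# Five aligner units grouped into three language-independent phonemes:
print(regulate_phoneme_lengths([3, 2, 4, 1, 5], [2, 1, 2]))  # [5, 4, 6]
```

With durations resolved at the language-independent phoneme level, a FastPitch-style length regulator can then upsample the phoneme hidden states to frame length as usual.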
