Building a mixed-lingual neural TTS system with only monolingual data

04/12/2019
by Liumeng Xue, et al.

When deploying a Chinese neural text-to-speech (TTS) synthesis system, one challenge is synthesizing Chinese utterances that contain embedded English words or phrases. This paper examines the problem within the encoder-decoder framework when only monolingual data from the target speaker is available. Specifically, we consider two aspects: speaker consistency within an utterance and naturalness. We start from an Average Voice Model built from multi-speaker monolingual data, i.e., Mandarin and English data. Building on that model, we investigate speaker embedding for speaker consistency within an utterance and phoneme embedding for naturalness and intelligibility, and we study the choice of data for model training. We report our findings and discuss the challenges of building a mixed-lingual TTS system with only monolingual data.


