Learning pronunciation from a foreign language in speech synthesis networks

11/23/2018
by   Younggun Lee, et al.
0

Although there are more than 65,000 languages in the world, the pronunciations of many phonemes sound similar across the languages. When people learn a foreign language, their pronunciation often reflect their native language's characteristics. That motivates us to investigate how the speech synthesis network learns the pronunciation when multi-lingual dataset is given. In this study, we train the speech synthesis network bilingually in English and Korean, and analyze how the network learns the relations of phoneme pronunciation between the languages. Our experimental result shows that the learned phoneme embedding vectors are located closer if their pronunciations are similar across the languages. Based on the result, we also show that it is possible to train networks that synthesize English speaker's Korean speech and vice versa. In another experiment, we train the network with limited amount of English dataset and large Korean dataset, and analyze the required amount of dataset to train a resource-poor language with the help of resource-rich languages.

READ FULL TEXT
research
07/09/2019

Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning

We present a multispeaker, multilingual text-to-speech (TTS) synthesis m...
research
05/21/2020

Cross-lingual Multispeaker Text-to-Speech under Limited-Data Scenario

Modeling voices for multiple speakers and multiple languages in one text...
research
10/10/2022

YFACC: A Yorùbá speech-image dataset for cross-lingual keyword localisation through visual grounding

Visually grounded speech (VGS) models are trained on images paired with ...
research
07/12/2020

NISP: A Multi-lingual Multi-accent Dataset for Speaker Profiling

Many commercial and forensic applications of speech demand the extractio...
research
08/18/2016

DNN-based Speech Synthesis for Indian Languages from ASCII text

Text-to-Speech synthesis in Indian languages has a seen lot of progress ...
research
04/03/2018

Contrastive Learning of Emoji-based Representations for Resource-Poor Languages

The introduction of emojis (or emoticons) in social media platforms has ...
research
06/16/2023

CML-TTS A Multilingual Dataset for Speech Synthesis in Low-Resource Languages

In this paper, we present CML-TTS, a recursive acronym for CML-Multi-Lin...

Please sign up or login with your details

Forgot password? Click here to reset