Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding

06/27/2022
by Wei-Ping Huang, et al.

This paper studies a transferable phoneme embedding framework for the cross-lingual text-to-speech (TTS) problem in the few-shot setting. Transfer learning is a common approach to few-shot learning, since training from scratch on so little data is bound to overfit. However, we find that naive transfer learning fails to adapt to unseen languages in extremely few-shot settings, where less than 8 minutes of data is available. We address this problem with a framework that consists of a phoneme-based TTS model and a codebook module that projects phonemes from different languages into a learned latent space. Furthermore, by using phoneme-level averages of features from self-supervised learning (SSL) models, we effectively improve the quality of the synthesized speech. Experiments show that with our framework, 4 utterances (about 30 seconds of data) are enough to synthesize intelligible speech when adapting to an unseen language.
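The abstract gives no implementation details, so the following is only a rough pure-Python sketch of the two ideas it names. The first function averages frame-level SSL features over the frames aligned to each phoneme. It assumes the alignment is given as a hypothetical `durations` list (frames per phoneme), for example from a forced aligner. The second function is a hypothetical soft codebook lookup: a softmax over scaled dot-product scores followed by a weighted sum of codebook entries. This attention-style form is an assumption chosen for illustration. Neither function, nor any of the names used, is the authors' code.

```python
import math
from typing import List, Sequence

Vector = List[float]


def phoneme_level_average(frames: Sequence[Vector], durations: Sequence[int]) -> List[Vector]:
    """Average frame-level features over the frames aligned to each phoneme.

    Assumption: durations[i] is the number of consecutive frames aligned to
    phoneme i (e.g. from a forced aligner), and the durations sum to len(frames).
    """
    if sum(durations) != len(frames):
        raise ValueError("durations must sum to the number of frames")
    averaged: List[Vector] = []
    start = 0
    for d in durations:
        if d <= 0:
            raise ValueError("each phoneme must cover at least one frame")
        segment = frames[start:start + d]
        dim = len(segment[0])
        # Mean of each feature dimension over this phoneme's frames.
        averaged.append([sum(f[k] for f in segment) / d for k in range(dim)])
        start += d
    return averaged


def codebook_project(query: Vector, codebook: Sequence[Vector]) -> Vector:
    """Hypothetical soft codebook lookup (an illustrative assumption).

    Scores each codebook entry by a scaled dot product with the query,
    normalizes the scores with a softmax, and returns the weighted sum of
    the entries, i.e. a point in the span of the shared codebook.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * c for q, c in zip(query, entry)) for entry in codebook]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(codebook[0])
    return [sum(w * entry[k] for w, entry in zip(weights, codebook)) for k in range(dim)]
```

For example, `phoneme_level_average([[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]], [2, 1])` gives one averaged vector per phoneme, `[[2.0, 1.0], [5.0, 4.0]]`. In this sketch, every language's phonemes are expressed through the same shared codebook. That reflects the idea of a common latent space, but the paper's actual module may differ.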

research
09/23/2021

Simple and Effective Zero-shot Cross-lingual Phoneme Recognition

Recent progress in self-training, self-supervised pretraining and unsupe...
research
10/10/2021

Injecting Text and Cross-lingual Supervision in Few-shot Learning from Self-Supervised Models

Self-supervised model pre-training has recently garnered significant int...
research
07/19/2022

On the cross-lingual transferability of multilingual prototypical models across NLU tasks

Supervised deep learning-based approaches have been applied to task-orie...
research
09/30/2020

Cross-lingual Spoken Language Understanding with Regularized Representation Alignment

Despite the promising results of current cross-lingual models for spoken...
research
11/17/2021

Cross-lingual Low Resource Speaker Adaptation Using Phonological Features

The idea of using phonological features instead of phonemes as input to ...
research
04/16/2020

Cross-lingual Contextualized Topic Models with Zero-shot Learning

Many data sets in a domain (reviews, forums, news, etc.) exist in parall...
research
09/14/2021

Improving Zero-shot Cross-lingual Transfer between Closely Related Languages by injecting Character-level Noise

Cross-lingual transfer between a high-resource language and its dialects...