Low-Resource Multilingual and Zero-Shot Multispeaker TTS

10/21/2022
by Florian Lux, et al.
While neural methods for text-to-speech (TTS) have shown great advances in modeling multiple speakers, even in zero-shot settings, the amount of data those approaches require is generally not available for the vast majority of the world's more than 6,000 spoken languages. In this work, we bring together the tasks of zero-shot voice cloning and multilingual low-resource TTS. Using the language-agnostic meta-learning (LAML) procedure and modifications to a TTS encoder, we show that a system can learn to speak a new language using just 5 minutes of training data while retaining the ability to infer the voice of even unseen speakers in the newly learned language. We demonstrate the success of our proposed approach in terms of intelligibility, naturalness, and similarity to the target speaker, using objective metrics as well as human studies, and we release our code and trained models as open source.
