MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low Resource Setting

05/19/2023
by Neil Shah, et al.

We present MParrotTTS, a unified multilingual, multi-speaker text-to-speech (TTS) synthesis model that can produce high-quality speech. Benefiting from a modularized training paradigm that exploits self-supervised speech representations, MParrotTTS adapts to a new language with minimal supervised data and generalizes to languages not seen during training of the self-supervised backbone. Moreover, without training on any bilingual or parallel examples, MParrotTTS can transfer voices across languages while preserving speaker-specific characteristics, e.g., synthesizing fluent Hindi speech using a French speaker's voice and accent. We present extensive results on six languages in terms of speech naturalness and speaker similarity in parallel and cross-lingual synthesis. The proposed model outperforms state-of-the-art multilingual TTS models and baselines while using only a small fraction of the supervised training data. Speech samples from our model can be found at https://paper2438.github.io/tts/
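The abstract attributes cross-lingual voice transfer to a modular design in which speech is built from self-supervised representations. The toy sketch below illustrates why such a split can work without parallel data. It rests on stated assumptions: it models the representation as discrete unit IDs (as in HuBERT-style self-supervised models) and injects the speaker only in the second, text-free stage. All names (`text_to_units`, `units_to_speech`, `Speaker`, `UNIT_VOCAB_SIZE`) are hypothetical stand-ins. This is not the authors' implementation, and the "models" are trivial arithmetic.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical toy stand-ins; none of these names come from the paper.
UNIT_VOCAB_SIZE = 100  # assumed size of a discrete self-supervised unit codebook


@dataclass
class Speaker:
    name: str
    embedding: List[float]  # assumed fixed-size speaker vector


def text_to_units(text: str, lang: str) -> List[int]:
    """Stage 1 (toy): map text to discrete unit IDs.

    In a modular design this is the stage that needs transcribed data per
    language. It never sees the speaker. Here we simply hash characters.
    """
    offset = sum(ord(c) for c in lang)  # toy per-language variation
    return [(ord(ch) + offset) % UNIT_VOCAB_SIZE for ch in text if not ch.isspace()]


def units_to_speech(units: List[int], speaker: Speaker, frames_per_unit: int = 2) -> List[float]:
    """Stage 2 (toy): render units as 'frames' conditioned on a speaker.

    This stage is speaker-aware but never sees text, so it could be trained
    on untranscribed speech. Each unit becomes frames_per_unit values.
    """
    bias = sum(speaker.embedding) / len(speaker.embedding)
    frames: List[float] = []
    for u in units:
        frames.extend([u / UNIT_VOCAB_SIZE + bias] * frames_per_unit)
    return frames


def synthesize(text: str, lang: str, speaker: Speaker) -> List[float]:
    # Language-specific frontend followed by a speaker-conditioned decoder.
    return units_to_speech(text_to_units(text, lang), speaker)


# Cross-lingual use: Hindi text rendered with a (hypothetical) French speaker.
french_speaker = Speaker("fr_speaker", [0.1, 0.3])
hindi_speaker = Speaker("hi_speaker", [0.5, 0.7])
audio_fr_voice = synthesize("namaste", "hi", french_speaker)
audio_hi_voice = synthesize("namaste", "hi", hindi_speaker)
```

Because the unit sequence depends only on the text and language, while voice enters only in the decoder, any language frontend can be paired with any speaker. Under these assumptions, this is how a French voice could speak Hindi without bilingual training examples.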
