Multilingual Byte2Speech Text-To-Speech Models Are Few-shot Spoken Language Learners

by Mutian He, et al.

We present a multilingual end-to-end Text-To-Speech framework that maps byte inputs to spectrograms, thus allowing arbitrary input scripts. Besides strong results on 40+ languages, the framework demonstrates the capability to adapt to various new languages under extreme low-resource and even few-shot scenarios of merely 40 seconds of transcribed recordings, without the need for a lexicon, extra corpus, auxiliary models, or particular linguistic expertise, while retaining satisfactory intelligibility and naturalness matching rich-resource models. Exhaustive comparative studies are performed to reveal the potential of the framework for low-resource applications and the impact of various factors contributing to adaptation. Furthermore, we propose a novel method to extract language-specific sub-networks for a better understanding of the mechanism of multilingual models.
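To illustrate why byte inputs allow arbitrary scripts, the following minimal sketch encodes text as UTF-8 bytes and turns each byte into an integer token ID. It is not the authors' implementation: the function names `text_to_byte_ids` and `byte_ids_to_text`, and the choice to shift IDs by one so that 0 stays free for padding, are assumptions made for illustration only. The abstract does not specify the vocabulary layout or any special tokens.

```python
from typing import List


def text_to_byte_ids(text: str, offset: int = 1) -> List[int]:
    """Encode text as UTF-8 and map each byte to an integer ID.

    The offset (hypothetical) reserves ID 0 for padding, so the
    vocabulary holds at most 256 + offset entries for any script.
    """
    return [b + offset for b in text.encode("utf-8")]


def byte_ids_to_text(ids: List[int], offset: int = 1) -> str:
    """Invert text_to_byte_ids by undoing the offset and decoding UTF-8."""
    return bytes(i - offset for i in ids).decode("utf-8")


# The same fixed vocabulary covers Latin, Greek, and Chinese input alike.
samples = {"en": "hello", "el": "γειά", "zh": "你好"}
for lang, sentence in samples.items():
    ids = text_to_byte_ids(sentence)
    print(lang, "characters:", len(sentence), "byte tokens:", len(ids))
```

Because every script reduces to the same 256 byte values, a new language needs no new symbols in the input layer. The trade-off is that non-Latin scripts yield longer input sequences, since one character may span several bytes.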


CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus

Spoken language translation has recently witnessed a resurgence in popul...

Deep Learning Models for Multilingual Hate Speech Detection

Hate speech detection is a challenging problem with most of the datasets...

Adapt-and-Adjust: Overcoming the Long-Tail Problem of Multilingual Speech Recognition

One crucial challenge of real-world multilingual speech recognition is t...

Efficient neural speech synthesis for low-resource languages through multilingual modeling

Recent advances in neural TTS have led to models that can produce high-q...

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

End-to-end TTS suffers from high data requirements as it is difficult fo...

Low-Resource Multilingual and Zero-Shot Multispeaker TTS

While neural methods for text-to-speech (TTS) have shown great advances ...

Universal and Independent: Multilingual Probing Framework for Exhaustive Model Interpretation and Evaluation

Linguistic analysis of language models is one of the ways to explain and...

Code Repositories