Towards Building Text-To-Speech Systems for the Next Billion Users

11/17/2022
by Gokul Karthik Kumar, et al.

Deep-learning-based text-to-speech (TTS) systems have been evolving rapidly with advances in model architectures, training methodologies, and generalization across speakers and languages. However, these advances have not been thoroughly investigated for Indian language speech synthesis. Such investigation is computationally expensive given the number and diversity of Indian languages, their relatively low resource availability, and the diverse set of advances in neural TTS that remain untested. In this paper, we evaluate the choice of acoustic models, vocoders, supplementary loss functions, training schedules, and speaker and language diversity for Dravidian and Indo-Aryan languages. Based on this evaluation, we find that monolingual models with FastPitch and HiFi-GAN V1, trained jointly on male and female speakers, perform the best. With this setup, we train and evaluate TTS models for 13 languages and find that our models significantly improve upon existing models in all languages, as measured by mean opinion scores. We open-source all models on the Bhashini platform.

