Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning

12/07/2022
by   Ankur Debnath, et al.
0

End-to-end text-to-speech (TTS) systems have been developed for European languages like English and Spanish with state-of-the-art speech quality, prosody, and naturalness. However, development of end-to-end TTS for Indian languages is lagging behind in terms of quality. The challenges involved in such a task are: 1) scarcity of quality training data; 2) low efficiency during training and inference; 3) slow convergence in the case of large vocabulary size. In our work reported in this paper, we have investigated the use of fine-tuning the English-pretrained Tacotron2 model with limited Sanskrit data to synthesize natural sounding speech in Sanskrit in low resource settings. Our experiments show encouraging results, achieving an overall MOS of 3.38 from 37 evaluators with good Sanskrit spoken knowledge. This is really a very good result, considering the fact that the speech data we have used is of duration 2.5 hours only.

READ FULL TEXT

page 2

page 4

research
05/04/2022

ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks

This paper describes the ON-TRAC Consortium translation systems develope...
research
10/19/2021

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

End-to-end TTS suffers from high data requirements as it is difficult fo...
research
10/20/2022

Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages

Hate speech is a global phenomenon, but most hate speech datasets so far...
research
09/11/2020

RECOApy: Data recording, pre-processing and phonetic transcription for end-to-end speech-based applications

Deep learning enables the development of efficient end-to-end speech pro...
research
05/05/2020

End-to-end Whispered Speech Recognition with Frequency-weighted Approaches and Layer-wise Transfer Learning

Whispering is an important mode of human speech, but no end-to-end recog...
research
10/22/2020

The NTU-AISG Text-to-speech System for Blizzard Challenge 2020

We report our NTU-AISG Text-to-speech (TTS) entry systems for the Blizza...
research
12/08/2015

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

We show that an end-to-end deep learning approach can be used to recogni...

Please sign up or login with your details

Forgot password? Click here to reset