IMaSC – ICFOSS Malayalam Speech Corpus

11/23/2022
by Deepa P Gopinath, et al.

Modern text-to-speech (TTS) systems use deep learning to synthesize speech that increasingly approaches human quality, but they require a database of high-quality audio-text sentence pairs for training. Malayalam, the official language of the Indian state of Kerala and spoken by more than 35 million people, is a low-resource language in terms of available corpora for TTS systems. In this paper, we present IMaSC, a Malayalam text and speech corpus containing approximately 50 hours of recorded speech. With 8 speakers and a total of 34,473 text-audio pairs, IMaSC is larger than every other publicly available alternative. We evaluated the database by using it to train a TTS model for each speaker, based on a modern deep learning architecture. Through subjective evaluation, we show that our models perform significantly better in terms of naturalness than previous studies and publicly available models, with an average mean opinion score of 4.50, indicating that the synthesized speech is close to human quality.
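
To give a feel for the corpus size quoted above, the following minimal sketch derives a few rough averages from the three figures in the abstract (about 50 hours, 8 speakers, 34,473 pairs). The function name `corpus_averages` is hypothetical, the duration is only approximate, and the per-speaker values are plain averages: the abstract does not say how the recordings are split across speakers.

```python
# Figures quoted in the abstract. The duration is approximate
# ("approximately 50 hours"), so every derived value is a rough estimate.
TOTAL_HOURS = 50
NUM_SPEAKERS = 8
NUM_PAIRS = 34_473


def corpus_averages(total_hours, num_speakers, num_pairs):
    """Return rough per-utterance and per-speaker averages for a speech corpus."""
    total_seconds = total_hours * 3600
    return {
        # Mean length of one recorded utterance, in seconds.
        "seconds_per_utterance": total_seconds / num_pairs,
        # Plain averages; the actual split across speakers is not given.
        "hours_per_speaker": total_hours / num_speakers,
        "pairs_per_speaker": num_pairs / num_speakers,
    }


stats = corpus_averages(TOTAL_HOURS, NUM_SPEAKERS, NUM_PAIRS)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, an average utterance lasts a little over 5 seconds, and each speaker contributes on the order of 6 hours of audio.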
