Byakto Speech: Real-time long speech synthesis with convolutional neural network: Transfer learning from English to Bangla

05/31/2021
by Zabir Al Nazi, et al.

Speech synthesis is a challenging task to automate with deep learning, and because Bangla is a low-resource language, there have been very few attempts at Bangla speech synthesis. Most existing systems cannot handle anything beyond simple Bangla script or very short sentences. This work attempts to solve these problems by introducing Byakta, the first open-source, deep learning-based bilingual (Bangla and English) text-to-speech synthesis system. We also propose an automated scoring metric, based on a speech recognition model, for evaluating the performance of a TTS model, and we introduce a test benchmark dataset for evaluating the speech quality of Bangla speech synthesis models. The TTS is available at https://github.com/zabir-nabil/bangla-tts
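
The abstract does not spell out how the speech recognition-based score is computed. One common way to build such a metric, shown here only as an illustrative sketch and not as the authors' method, is to transcribe the synthesized audio with a speech recognizer and measure the character error rate (CER) against the input text. The function names below are hypothetical, and the recognizer's transcript is assumed to be available already as a string.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i characters of ref
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(input_text: str, asr_transcript: str) -> float:
    """CER of a recognizer's transcript against the text given to the TTS.

    Assumption: lower CER means more intelligible synthesized speech.
    Python iterates over code points, so Bangla conjuncts and vowel signs
    are counted as separate characters rather than as grapheme clusters.
    """
    if not input_text:
        raise ValueError("input_text must be non-empty")
    return edit_distance(input_text, asr_transcript) / len(input_text)
```

A score of 0.0 means the recognizer recovered the input text exactly. Because the score also reflects the recognizer's own errors, it is best read as a relative measure across TTS models evaluated with the same recognizer.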


research
09/14/2023

VoxtLM: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks

We propose a decoder-only language model, VoxtLM, that can perform four ...
research
08/20/2020

Laughter Synthesis: Combining Seq2seq modeling with Transfer Learning

Despite the growing interest for expressive speech synthesis, synthesis ...
research
10/06/2020

Neural Speech Synthesis for Estonian

This technical report describes the results of a collaboration between t...
research
09/09/2019

Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs

Text-to-speech systems are typically evaluated on single sentences. When...
research
04/20/2021

Review of end-to-end speech synthesis technology based on deep learning

As an indispensable part of modern human-computer interaction system, sp...
research
03/15/2012

Artimate: an articulatory animation framework for audiovisual speech synthesis

We present a modular framework for articulatory animation synthesis usin...
research
03/11/2019

Deep Text-to-Speech System with Seq2Seq Model

Recent trends in neural network based text-to-speech/speech synthesis pi...
