A Transfer Learning End-to-End Arabic Text-To-Speech (TTS) Deep Architecture

07/22/2020
by Fady Fahmy, et al.

Speech synthesis is the artificial production of human speech. A typical text-to-speech (TTS) system converts text in a given language into a waveform. Many mature English TTS systems produce natural, human-like speech. In contrast, other languages, including Arabic, have received little attention until recently. Existing Arabic speech synthesis solutions are slow and of low quality, and the naturalness of their output is inferior to that of English synthesizers. They also lack key speech factors such as intonation, stress, and rhythm. Various approaches have been proposed to address these issues, including concatenative methods such as unit selection, and parametric methods. However, these require a lot of laborious work and domain expertise. Another reason for the poor performance of Arabic speech synthesizers is the lack of speech corpora, whereas English has many publicly available corpora and audiobooks. This work describes how to generate high-quality, natural, human-like Arabic speech using an end-to-end deep neural network architecture. It uses only ⟨text, audio⟩ pairs, with a relatively small amount of recorded audio totaling 2.41 hours. It illustrates how to use English character embeddings even though the input consists of diacritized Arabic characters, and how to preprocess the audio samples to achieve the best results.
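One way to picture reusing an English character embedding for diacritized Arabic input is to romanize the text first, so that every Arabic letter and diacritic becomes a single ASCII symbol. The abstract does not say which scheme the authors use; the sketch below assumes Buckwalter transliteration, and the `SYMBOL_TO_ID` table is a hypothetical stand-in for whatever symbol set a pretrained English model actually ships with.

```python
import string

# Assumption: Buckwalter transliteration, covering U+0621..U+063A (hamza to ghain)
# and U+0641..U+0652 (feh to sukun, including the short-vowel diacritics).
_ARABIC = "".join(chr(c) for c in range(0x0621, 0x063B)) + \
          "".join(chr(c) for c in range(0x0641, 0x0653))
_LATIN = "'|>&<}AbptvjHxd*rzs$SDTZEg" + "fqklmnhwYyFNKaui~o"
assert len(_ARABIC) == len(_LATIN)

_TABLE = str.maketrans(_ARABIC, _LATIN)

# Hypothetical symbol table standing in for a pretrained English model's
# character vocabulary; a real model may lack symbols such as '$' or '*'.
SYMBOL_TO_ID = {ch: i for i, ch in enumerate(string.printable)}


def transliterate(text: str) -> str:
    """Map diacritized Arabic text to ASCII, one symbol per character."""
    return text.translate(_TABLE)


def to_ids(text: str) -> list[int]:
    """Convert Arabic text to ids that index the (assumed) English embedding."""
    return [SYMBOL_TO_ID[ch] for ch in transliterate(text)]


# "kataba" (he wrote), fully diacritized: kaf, fatha, teh, fatha, beh, fatha
word = "\u0643\u064E\u062A\u064E\u0628\u064E"
romanized = transliterate(word)
ids = to_ids(word)
```

Because the mapping is one-to-one, the diacritics that carry Arabic vowel information survive as ordinary characters, and the embedding rows learned for English letters can serve as a starting point rather than being trained from scratch.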


