Data Processing for Optimizing Naturalness of Vietnamese Text-to-speech System

04/20/2020
by Viet Lam Phung, et al.

Abstract

End-to-end text-to-speech (TTS) systems have proven highly successful when a large amount of high-quality training data is available, recorded in an anechoic room with a high-quality microphone. An alternative is to use available sources of found data, such as radio broadcast news. We aim to optimize the naturalness of a TTS system trained on found data using a novel data processing method. The method consists of 1) utterance selection and 2) prosodic punctuation insertion, which together prepare training data that optimizes the naturalness of TTS systems. We show that with this data processing method, an end-to-end TTS system achieved a mean opinion score (MOS) of 4.1, compared to 4.3 for natural speech, and that punctuation insertion contributed most to this result. To facilitate the research and development of TTS systems, we have released the processed data of one speaker at https://forms.gle/6Hk5YkqgDxAaC2BU6.
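
The abstract does not describe how prosodic punctuation insertion is carried out. The sketch below illustrates one plausible reading, offered as an assumption and not as the authors' method: take word-level timings from a forced aligner, and insert a comma after any word that is followed by a pause at least as long as a threshold. The AlignedWord type, the insert_prosodic_commas function, and the 0.2 s threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold (seconds): pauses at least this long count as prosodic breaks.
PAUSE_THRESHOLD_S = 0.2

# Punctuation marks that already signal a break; no comma is added after these.
PUNCT = {",", ".", "?", "!", ";", ":"}


@dataclass
class AlignedWord:
    """A transcript word with start/end times in seconds (assumed forced-aligner output)."""
    text: str
    start: float
    end: float


def insert_prosodic_commas(words: List[AlignedWord],
                           threshold: float = PAUSE_THRESHOLD_S) -> str:
    """Return the transcript with a comma after each word followed by a long pause.

    Words that already end in punctuation are left unchanged.
    """
    out = []
    for i, w in enumerate(words):
        token = w.text
        if i < len(words) - 1:
            # Silence between the end of this word and the start of the next one.
            gap = words[i + 1].start - w.end
            if gap >= threshold and token and token[-1] not in PUNCT:
                token += ","
        out.append(token)
    return " ".join(out)


if __name__ == "__main__":
    utt = [
        AlignedWord("xin", 0.0, 0.3),
        AlignedWord("chào", 0.3, 0.6),
        AlignedWord("các", 1.0, 1.2),  # 0.4 s pause before this word
        AlignedWord("bạn", 1.2, 1.5),
    ]
    print(insert_prosodic_commas(utt))  # xin chào, các bạn
```

The idea behind this kind of step is that a text-only TTS model cannot predict pauses that the speaker made but the transcript omits. Marking those pauses with punctuation gives the model a textual cue for them.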

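A mean opinion score is the arithmetic mean of listener ratings, conventionally given on a 1 to 5 scale. The abstract reports only the MOS values (4.1 for the TTS system, 4.3 for natural speech) and does not state how confidence intervals were computed. The helper below is a hypothetical sketch that returns the MOS together with the half-width of a normal-approximation 95% confidence interval. The function name mos_with_ci and the choice of interval are assumptions.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def mos_with_ci(ratings: Sequence[int], z: float = 1.96) -> Tuple[float, float]:
    """Return (MOS, CI half-width) for ratings on a 1-5 scale.

    Uses a normal approximation: half-width = z * sample_std / sqrt(n).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate variance")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    m = float(mean(ratings))
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return m, half_width


if __name__ == "__main__":
    score, hw = mos_with_ci([4, 4, 5, 3, 4])
    print(f"MOS = {score:.2f} +/- {hw:.2f}")
```

Reporting an interval alongside the mean matters when two systems are close, as with 4.1 versus 4.3. Whether such a gap is meaningful depends on the number of listeners and on the spread of their ratings.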