End-To-End Speech Synthesis Applied to Brazilian Portuguese

05/11/2020
by Edresson Casanova, et al.

Voice synthesis systems are popular in a variety of applications, such as personal assistants, GPS applications, screen readers and accessibility tools. Voice provides a natural way for human-computer interaction. However, not all languages are at the same level in terms of the resources and systems available for voice synthesis. This work presents publicly available resources for Brazilian Portuguese in the form of a dataset and deep learning models for end-to-end voice synthesis. The dataset contains 10.5 hours of speech from a single speaker. We investigated three different architectures to perform end-to-end speech synthesis: Tacotron 1, DCTTS and Mozilla TTS. We also analysed how model performance varies with the choice of vocoder (RTISI-LA, WaveRNN and Universal WaveRNN), the use of phonetic transcriptions, transfer learning (from English) and denoising. In the proposed scenario, a model based on Mozilla TTS with the RTISI-LA vocoder presented the best performance, achieving a MOS of 4.03. We also verified that transfer learning, phonetic transcriptions and denoising are useful for training models on the presented dataset. The obtained results are comparable to those of related works covering English, despite using a smaller dataset.
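The headline result is reported as a MOS (mean opinion score). A MOS is the average of listener ratings on a 1 to 5 scale. The following is a minimal sketch of how such a score and an approximate 95% confidence interval might be computed. Several parts are assumptions and are not taken from the paper's evaluation protocol: the function name `mos_with_ci` is hypothetical, the normal-approximation interval is an illustrative choice, and the ratings are invented.

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate confidence interval).

    ratings: listener scores on a 1-5 rating scale.
    The interval uses a normal approximation (z=1.96 for ~95%),
    which is an assumption of this sketch, not the paper's method.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie in [1, 5]")
    score = mean(ratings)  # the MOS itself: plain average of all ratings
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))  # z * standard error
    return score, half_width


# Hypothetical ratings for illustration only, not data from the paper.
ratings = [4, 5, 4, 3, 4, 5, 4, 4]
score, ci = mos_with_ci(ratings)
print(f"MOS = {score:.2f} +/- {ci:.2f}")
```

In practice, evaluations usually pool ratings from many listeners and sentences. The interval width then shrinks with the square root of the number of ratings.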
