ArmanTTS single-speaker Persian dataset

04/07/2023
by Mohammd Hasan Shamgholi, et al.

Text-to-speech (TTS) is a complex task that can be addressed by modeling it with deep learning methods. Training such models requires a suitable dataset. Because little work of this kind exists for the Persian language, this paper introduces ArmanTTS, a single-speaker dataset. We compare its characteristics with those of several widely used datasets to show that ArmanTTS meets the standards needed to train a Persian text-to-speech model. We also combine Tacotron 2 and HiFi-GAN into a model that takes phonemes as input and produces the corresponding speech. In evaluation, real speech obtained a mean opinion score (MOS) of 4.0, the vocoder's predictions obtained 3.87, and synthetic speech generated by the full TTS model obtained 2.98.
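The MOS figures above are averages of listener ratings. As a minimal sketch of how such a score might be computed, the Python snippet below averages a list of ratings and attaches a normal-approximation 95% confidence interval. The abstract does not describe the rating protocol, so the 1-5 scale, the function name `mean_opinion_score`, and the example ratings are all assumptions made for illustration; none of them come from the paper.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for a list of listener ratings.

    Assumes the conventional 1-5 absolute category rating scale.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = mean(ratings)
    # Normal-approximation interval; a single rating has no spread to estimate.
    if len(ratings) > 1:
        half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    else:
        half_width = 0.0
    return mos, half_width


# Hypothetical ratings for illustration only; not data from the paper.
example_ratings = [4, 3, 3, 2, 3, 4, 3, 2]
mos, ci = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} ± {ci:.2f}")
```

Reporting the interval alongside the mean helps show whether gaps such as the one between vocoder output and full TTS output are larger than the spread in listener ratings.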


