A comparison of Vietnamese Statistical Parametric Speech Synthesis Systems

05/26/2020
by Huy Kinh Phan, et al.

In recent years, statistical parametric speech synthesis (SPSS) systems have been widely used in interactive speech-based products (e.g. Amazon's Alexa, Bose's headphones). To select a suitable SPSS system, both speech quality and performance efficiency (e.g. decoding time) must be taken into account. In this paper, we compared four popular Vietnamese SPSS techniques in terms of speech quality and performance efficiency: 1) hidden Markov models (HMM), 2) deep neural networks (DNN), 3) generative adversarial networks (GAN), and 4) end-to-end (E2E) architectures, which consist of Tacotron 2 and a WaveGlow vocoder. We showed that the E2E systems achieved the best quality but required a GPU to reach real-time performance. We also showed that the HMM-based system had inferior speech quality but was the most efficient system. Surprisingly, the E2E systems were more efficient than the DNN and GAN systems at inference on GPU, and the GAN-based system did not outperform the DNN in terms of quality.
