Fast and small footprint Hybrid HMM-HiFiGAN based system for speech synthesis in Indian languages

02/13/2023
by Sudhanshu Srivastava, et al.

Hidden-Markov-model (HMM) based text-to-speech (HTS) offers flexibility in speaking styles, along with fast training and synthesis, while being computationally less intensive. HTS also performs well in low-resource scenarios. Its primary drawback is voice quality, which is poor compared with that of end-to-end (E2E) systems. To improve HTS synthesis quality, a hybrid approach is proposed that combines HMM-based feature generation with a neural-network-based HiFi-GAN vocoder. HTS is trained on high-resolution mel-spectrograms instead of conventional mel-generalized coefficients (MGC), and the mel-spectrogram generated for the input text is passed to a HiFi-GAN vocoder trained on Indic languages. The resulting naturalness is equivalent to that of E2E systems, as evidenced by the DMOS and PC tests.
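
The key change in the hybrid system is the acoustic feature: HTS models mel-spectrogram frames rather than MGCs, and those frames are exactly what the HiFi-GAN vocoder consumes. The sketch below shows, under stated assumptions, how such log-mel training targets can be extracted with NumPy. The parameter values (22.05 kHz audio, 1024-point FFT, hop of 256 samples, 80 mel bands, 0 to 8 kHz) are assumptions borrowed from the public HiFi-GAN V1 configuration, not values stated in this abstract, and the HTK-style mel formula is a simplification; the authors' actual "high-resolution" feature settings may differ.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (assumption; librosa defaults to the Slaney scale)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels, fmin, fmax):
    """Triangular filters mapping n_fft//2+1 linear bins to n_mels mel bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mels + 2 equally spaced mel points give left/center/right edges per band
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(wav, sr=22050, n_fft=1024, hop=256, n_mels=80,
                        fmin=0.0, fmax=8000.0):
    """Return a (n_mels, n_frames) log-mel matrix, one frame per `hop` samples."""
    pad = n_fft // 2
    x = np.pad(wav, (pad, pad), mode="reflect")  # centre each analysis frame
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))  # (n_frames, n_bins)
    mel = mag @ mel_filterbank(sr, n_fft, n_mels, fmin, fmax).T
    # log with a floor to avoid log(0), similar to HiFi-GAN's range compression
    return np.log(np.clip(mel, 1e-5, None)).T


sr = 22050
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz tone
S = log_mel_spectrogram(tone)
print(S.shape)  # (80, 87)
```

In this setup each mel frame stands for `hop` samples of audio, so the vocoder must upsample every HMM-generated frame by the same factor; the HMM side only has to predict an 80-dimensional vector per frame, which keeps it fast and small.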
