Fast DCTTS: Efficient Deep Convolutional Text-to-Speech

04/01/2021
by   Minsu Kang, et al.
0

We propose an end-to-end speech synthesizer, Fast DCTTS, that synthesizes speech in real time on a single CPU thread. The proposed model is composed of a carefully-tuned lightweight network designed by applying multiple network reduction and fidelity improvement techniques. In addition, we propose a novel group highway activation that can compromise between computational efficiency and the regularization effect of the gating mechanism. As well, we introduce a new metric called Elastic mel-cepstral distortion (EMCD) to measure the fidelity of the output mel-spectrogram. In experiments, we analyze the effect of the acceleration techniques on speed and speech quality. Compared with the baseline model, the proposed model exhibits improved MOS from 2.62 to 2.74 with only 1.76 was improved by 7.45 times, which is fast enough to produce mel-spectrogram in real time without GPU.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/30/2020

VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network

We present a novel high-fidelity real-time neural vocoder called VocGAN....
research
05/12/2020

FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction

In this paper, we propose the FeatherWave, yet another variant of WaveRN...
research
05/15/2020

WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU

In this paper, we propose WG-WaveNet, a fast, lightweight, and high-qual...
research
10/28/2022

Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform

We propose a lightweight end-to-end text-to-speech model using multi-ban...
research
12/12/2018

FPUAS : Fully Parallel UFANS-based End-to-End Acoustic System with 10x Speed Up

A lightweight end-to-end acoustic system is crucial in the deployment of...
research
09/18/2021

Fast query-by-example speech search using separable model

Traditional Query-by-Example (QbE) speech search approaches usually use ...
research
11/28/2017

Parallel WaveNet: Fast High-Fidelity Speech Synthesis

The recently-developed WaveNet architecture is the current state of the ...

Please sign up or login with your details

Forgot password? Click here to reset