FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs

05/18/2023
by   Won Jang, et al.
0

This paper presents FastFit, a novel neural vocoder architecture that replaces the U-Net encoder with multiple short-time Fourier transforms (STFTs) to achieve faster generation rates without sacrificing sample quality. We replaced each encoder block with an STFT, with parameters equal to the temporal resolution of each decoder block, leading to the skip connection. FastFit reduces the number of parameters and the generation time of the model by almost half while maintaining high fidelity. Through objective and subjective evaluations, we demonstrated that the proposed model achieves nearly twice the generation speed of baseline iteration-based vocoders while maintaining high sound quality. We further showed that FastFit produces sound qualities similar to those of other baselines in text-to-speech evaluation scenarios, including multi-speaker and zero-shot text-to-speech.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/02/2021

SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model

In this paper, we propose SC-GlowTTS: an efficient zero-shot multi-speak...
research
06/29/2021

GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis

Recent advances in neural multi-speaker text-to-speech (TTS) models have...
research
11/19/2020

Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains

We propose Universal MelGAN, a vocoder that synthesizes high-fidelity sp...
research
10/24/2022

High Fidelity Neural Audio Compression

We introduce a state-of-the-art real-time, high-fidelity, audio codec le...
research
07/13/2022

A Cyclical Approach to Synthetic and Natural Speech Mismatch Refinement of Neural Post-filter for Low-cost Text-to-speech System

Neural-based text-to-speech (TTS) systems achieve very high-fidelity spe...
research
07/05/2022

Ultra-Low-Bitrate Speech Coding with Pretrained Transformers

Speech coding facilitates the transmission of speech over low-bandwidth ...

Please sign up or login with your details

Forgot password? Click here to reset