DeepAI AI Chat
Log In Sign Up

FlowVocoder: A small Footprint Neural Vocoder based Normalizing flow for Speech Synthesis

by   Manh Luong, et al.

Recently, non-autoregressive neural vocoders have provided remarkable performance in generating high-fidelity speech and have been able to produce synthetic speech in real-time. However, non-autoregressive neural vocoders such as WaveGlow are far behind autoregressive neural vocoders like WaveFlow in terms of modeling audio signals due to their limitation in expressiveness. In addition, though NanoFlow is a state-of-the-art autoregressive neural vocoder that has immensely small parameters, its performance is marginally lower than WaveFlow. Therefore, in this paper, we propose a new type of autoregressive neural vocoder called FlowVocoder, which has a small memory footprint and is able to generate high-fidelity audio in real-time. Our proposed model improves the expressiveness of flow blocks by operating a mixture of Cumulative Distribution Function(CDF) for bipartite transformation. Hence, the proposed model is capable of modeling waveform signals as well as WaveFlow, while its memory footprint is much smaller thanWaveFlow. As shown in experiments, FlowVocoder achieves competitive results with baseline methods in terms of both subjective and objective evaluation, also, it is more suitable for real-time text-to-speech applications.


page 1

page 2

page 3

page 4


HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Several recent work on speech synthesis have employed generative adversa...

EfficientSpeech: An On-Device Text to Speech Model

State of the art (SOTA) neural text to speech (TTS) models can generate ...

Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis

Prosody modeling is an essential component in modern text-to-speech (TTS...

WaveFlow: A Compact Flow-based Model for Raw Audio

In this work, we present WaveFlow, a small-footprint generative flow for...

FloWaveNet : A Generative Flow for Raw Audio

Most of modern text-to-speech architectures use a WaveNet vocoder for sy...

WaveNODE: A Continuous Normalizing Flow for Speech Synthesis

In recent years, various flow-based generative models have been proposed...

High Fidelity Neural Audio Compression

We introduce a state-of-the-art real-time, high-fidelity, audio codec le...