FlowVocoder: A small Footprint Neural Vocoder based Normalizing flow for Speech Synthesis

09/27/2021
by Manh Luong, et al.

Recently, non-autoregressive neural vocoders have delivered remarkable performance in generating high-fidelity speech and can produce synthetic speech in real time. However, non-autoregressive neural vocoders such as WaveGlow lag far behind autoregressive neural vocoders like WaveFlow in modeling audio signals because of their limited expressiveness. In addition, although NanoFlow is a state-of-the-art autoregressive neural vocoder with an extremely small number of parameters, its performance is marginally lower than that of WaveFlow. Therefore, in this paper, we propose a new type of autoregressive neural vocoder called FlowVocoder, which has a small memory footprint and is able to generate high-fidelity audio in real time. Our proposed model improves the expressiveness of flow blocks by using a mixture of cumulative distribution functions (CDFs) for the bipartite transformation. Hence, the proposed model is capable of modeling waveform signals as well as WaveFlow, while its memory footprint is much smaller than that of WaveFlow. As shown in experiments, FlowVocoder achieves results competitive with baseline methods in both subjective and objective evaluation, and it is more suitable for real-time text-to-speech applications.
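
To make the "mixture of CDFs" idea concrete, here is a minimal sketch of an element-wise transform built from a mixture of logistic CDFs. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say which distribution family is mixed, so the logistic family, the function names, and the fixed parameter values are all hypothetical. In a flow block the weights, means, and log-scales would presumably be predicted by a network from the conditioning part of the signal. The transform is strictly increasing, hence invertible, and its derivative (the mixture density) gives the log-determinant term a flow needs.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def mixture_logistic_cdf(x, weights, means, log_scales):
    # Hypothetical helper: CDF of a logistic mixture evaluated at scalar x.
    # Weights are normalized here so they sum to one.
    total = sum(weights)
    return sum(
        (w / total) * sigmoid((x - m) * math.exp(-s))
        for w, m, s in zip(weights, means, log_scales)
    )

def mixture_logistic_pdf(x, weights, means, log_scales):
    # Derivative of the CDF above with respect to x.
    # Its logarithm is the per-element log-determinant contribution.
    total = sum(weights)
    density = 0.0
    for w, m, s in zip(weights, means, log_scales):
        inv_scale = math.exp(-s)
        p = sigmoid((x - m) * inv_scale)
        density += (w / total) * p * (1.0 - p) * inv_scale
    return density

# Assumed, fixed parameters for illustration only.
weights = [0.2, 0.5, 0.3]
means = [-1.0, 0.0, 1.5]
log_scales = [0.0, -0.5, 0.3]

y = mixture_logistic_cdf(0.3, weights, means, log_scales)
log_det = math.log(mixture_logistic_pdf(0.3, weights, means, log_scales))
```

Because the output lies in the open interval (0, 1), flows that use this kind of transform typically follow it with an inverse sigmoid and an affine step; whether FlowVocoder does so is not stated in the abstract.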

10/12/2020

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Several recent work on speech synthesis have employed generative adversa...
05/23/2023

EfficientSpeech: An On-Device Text to Speech Model

State of the art (SOTA) neural text to speech (TTS) models can generate ...
11/12/2020

Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis

Prosody modeling is an essential component in modern text-to-speech (TTS...
12/03/2019

WaveFlow: A Compact Flow-based Model for Raw Audio

In this work, we present WaveFlow, a small-footprint generative flow for...
11/06/2018

FloWaveNet : A Generative Flow for Raw Audio

Most of modern text-to-speech architectures use a WaveNet vocoder for sy...
06/08/2020

WaveNODE: A Continuous Normalizing Flow for Speech Synthesis

In recent years, various flow-based generative models have been proposed...
10/24/2022

High Fidelity Neural Audio Compression

We introduce a state-of-the-art real-time, high-fidelity, audio codec le...