FlowVocoder: A small Footprint Neural Vocoder based Normalizing flow for Speech Synthesis

09/27/2021
by   Manh Luong, et al.
0

Recently, non-autoregressive neural vocoders have provided remarkable performance in generating high-fidelity speech and have been able to produce synthetic speech in real-time. However, non-autoregressive neural vocoders such as WaveGlow are far behind autoregressive neural vocoders like WaveFlow in terms of modeling audio signals due to their limitation in expressiveness. In addition, though NanoFlow is a state-of-the-art autoregressive neural vocoder that has immensely small parameters, its performance is marginally lower than WaveFlow. Therefore, in this paper, we propose a new type of autoregressive neural vocoder called FlowVocoder, which has a small memory footprint and is able to generate high-fidelity audio in real-time. Our proposed model improves the expressiveness of flow blocks by operating a mixture of Cumulative Distribution Function(CDF) for bipartite transformation. Hence, the proposed model is capable of modeling waveform signals as well as WaveFlow, while its memory footprint is much smaller thanWaveFlow. As shown in experiments, FlowVocoder achieves competitive results with baseline methods in terms of both subjective and objective evaluation, also, it is more suitable for real-time text-to-speech applications.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/12/2020

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Several recent work on speech synthesis have employed generative adversa...
research
05/23/2023

EfficientSpeech: An On-Device Text to Speech Model

State of the art (SOTA) neural text to speech (TTS) models can generate ...
research
06/02/2023

Towards Robust FastSpeech 2 by Modelling Residual Multimodality

State-of-the-art non-autoregressive text-to-speech (TTS) models based on...
research
11/12/2020

Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis

Prosody modeling is an essential component in modern text-to-speech (TTS...
research
08/16/2020

Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder

In recent works, a flow-based neural vocoder has shown significant impro...
research
12/03/2019

WaveFlow: A Compact Flow-based Model for Raw Audio

In this work, we present WaveFlow, a small-footprint generative flow for...
research
11/06/2018

FloWaveNet : A Generative Flow for Raw Audio

Most of modern text-to-speech architectures use a WaveNet vocoder for sy...

Please sign up or login with your details

Forgot password? Click here to reset