FloWaveNet: A Generative Flow for Raw Audio

by Sungwon Kim, et al.

Most modern text-to-speech architectures use a WaveNet vocoder to synthesize high-fidelity waveform audio, but its slow autoregressive sampling scheme has limited practical applications. The recently proposed Parallel WaveNet achieves real-time audio synthesis by incorporating Inverse Autoregressive Flow (IAF) for parallel sampling. However, Parallel WaveNet requires a two-stage training pipeline with a well-trained teacher network and is prone to mode collapse when trained with probability distillation alone. We propose FloWaveNet, a flow-based generative model for raw audio synthesis. FloWaveNet requires only a single maximum likelihood loss without any additional auxiliary terms and is inherently parallel due to the flow-based transformation. The model can efficiently sample raw audio in real time with a clarity comparable to the original WaveNet and ClariNet. Code and samples for all models, including our FloWaveNet, are available via GitHub: https://github.com/ksw0306/FloWaveNet
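To make the "single maximum likelihood loss" and "inherently parallel" claims concrete, here is a minimal sketch of a generic affine coupling flow in NumPy. It is not the authors' architecture: the linear conditioner, the tanh bound on the log-scale, the standard Gaussian prior, and every function name below are assumptions chosen for illustration. What it shows is that an invertible transform gives an exact log-likelihood through the change-of-variables formula, and that the inverse (sampling) direction is computed in one pass with no loop over time steps.

```python
import numpy as np

def coupling_params(x_a, w, b):
    # Hypothetical toy conditioner: a linear map from the untouched half
    # to a log-scale and a shift for the other half.
    h = x_a @ w + b
    log_s, t = np.split(h, 2, axis=-1)
    return np.tanh(log_s), t  # tanh keeps the log-scale bounded

def forward(x, w, b):
    """Data -> latent. Returns z and log|det dz/dx| for each example."""
    x_a, x_b = np.split(x, 2, axis=-1)
    log_s, t = coupling_params(x_a, w, b)
    z_b = x_b * np.exp(log_s) + t
    return np.concatenate([x_a, z_b], axis=-1), log_s.sum(axis=-1)

def inverse(z, w, b):
    """Latent -> data. Every element is computed at once (parallel sampling)."""
    z_a, z_b = np.split(z, 2, axis=-1)
    log_s, t = coupling_params(z_a, w, b)
    x_b = (z_b - t) * np.exp(-log_s)
    return np.concatenate([z_a, x_b], axis=-1)

def nll(x, w, b):
    """Mean negative log-likelihood under a standard Gaussian prior on z."""
    z, log_det = forward(x, w, b)
    d = x.shape[-1]
    log_pz = -0.5 * (z ** 2).sum(axis=-1) - 0.5 * d * np.log(2 * np.pi)
    return -(log_pz + log_det).mean()

rng = np.random.default_rng(0)
d = 8                                        # toy "waveform" length
w = rng.normal(scale=0.1, size=(d // 2, d))  # conditioner weights
b = np.zeros(d)
x = rng.normal(size=(4, d))                  # batch of 4 toy signals

z, log_det = forward(x, w, b)
x_rec = inverse(z, w, b)                     # recovers x up to float error
loss = nll(x, w, b)                          # the only training objective
```

In a real vocoder the conditioner would be a deep network conditioned on acoustic features, and several such layers would be stacked; the loss would still be the single negative log-likelihood term computed in `nll`.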


WaveFlow: A Compact Flow-based Model for Raw Audio

In this work, we present WaveFlow, a small-footprint generative flow for...

A Generative Model for Raw Audio Using Transformer Architectures

This paper proposes a novel way of doing audio synthesis at the waveform...

WaveNODE: A Continuous Normalizing Flow for Speech Synthesis

In recent years, various flow-based generative models have been proposed...

FlowVocoder: A small Footprint Neural Vocoder based Normalizing flow for Speech Synthesis

Recently, non-autoregressive neural vocoders have provided remarkable pe...

WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

This paper introduces WaveGrad 2, a non-autoregressive generative model ...

It's Raw! Audio Generation with State-Space Models

Developing architectures suitable for modeling raw audio is a challengin...

GANSynth: Adversarial Neural Audio Synthesis

Efficient audio synthesis is an inherently difficult machine learning ta...

Code Repositories


A Pytorch implementation of "FloWaveNet: A Generative Flow for Raw Audio"

