Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder

08/16/2020
by Hyun-Wook Yoon, et al.

Recent flow-based neural vocoders have shown significant improvements in real-time speech generation. A sequence of invertible flow operations allows the model to convert samples from a simple distribution into audio samples. However, training a continuous density model on discrete audio data can degrade model performance because of the topological difference between the latent and actual distributions. To resolve this problem, we propose audio dequantization methods for flow-based neural vocoders, aimed at high-fidelity audio generation. Data dequantization is a well-known method in image generation but has not yet been studied in the audio domain. For this reason, we implement various audio dequantization methods in a flow-based neural vocoder and investigate their effect on the generated audio. We conduct objective performance assessments and a subjective evaluation to show that audio dequantization can improve audio generation quality. In our experiments, audio dequantization produces waveform audio with better harmonic structure and fewer digital artifacts.
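
The abstract does not say which dequantization schemes the authors implement. As an illustration of the general idea only, the sketch below applies uniform dequantization, the basic form of the technique known from image generation, to 16-bit PCM samples: each integer level is spread over a unit-width bin by adding uniform noise, so a continuous density model no longer sees isolated points. The function names `uniform_dequantize` and `requantize`, the 16-bit depth, and the [-1, 1) scaling are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def uniform_dequantize(pcm, bits=16, rng=None):
    """Spread integer PCM samples over continuous bins (illustrative sketch).

    Assumes signed integer samples in [-2**(bits-1), 2**(bits-1) - 1].
    Returns float64 values scaled to roughly [-1, 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    levels = 2 ** bits
    # Shift signed integers to the unsigned range [0, levels).
    unsigned = np.asarray(pcm).astype(np.float64) + levels // 2
    # Add noise u ~ U[0, 1) so each discrete level covers a unit-width bin.
    noisy = unsigned + rng.uniform(0.0, 1.0, size=unsigned.shape)
    # Rescale the bins to the interval [-1, 1).
    return noisy / levels * 2.0 - 1.0


def requantize(x, bits=16):
    """Map continuous values back to integer levels by flooring each bin."""
    levels = 2 ** bits
    unsigned = np.floor((np.asarray(x) + 1.0) / 2.0 * levels)
    # Clip guards against values that land exactly on the upper edge.
    unsigned = np.clip(unsigned, 0, levels - 1)
    return (unsigned - levels // 2).astype(np.int32)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pcm = np.array([-32768, -1, 0, 1, 32767], dtype=np.int16)
    x = uniform_dequantize(pcm, rng=rng)
    print(x)
    print(requantize(x))
```

Because the noise stays inside each sample's own bin, flooring recovers the original integer levels, apart from rare floating-point rounding at a bin edge. The continuous values are what a flow model would be trained on under this scheme.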


