Fre-GAN: Adversarial Frequency-consistent Audio Synthesis

06/04/2021
by Ji-Hoon Kim, et al.

Although recent work on neural vocoders has improved the quality of synthesized audio, a gap remains between generated and ground-truth audio in the frequency domain. This gap produces spectral artifacts such as hissing noise or a robotic sound, and thus degrades sample quality. In this paper, we propose Fre-GAN, which achieves frequency-consistent audio synthesis with substantially improved generation quality. Specifically, we first present a resolution-connected generator and resolution-wise discriminators, which help the model learn spectral distributions at various scales over multiple frequency bands. Additionally, to reproduce high-frequency components accurately, we leverage the discrete wavelet transform in the discriminators. In our experiments, Fre-GAN achieves high-fidelity waveform generation with a gap of only 0.03 MOS compared to ground-truth audio, while outperforming standard models in quality.
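To illustrate why a discrete wavelet transform can help with high-frequency components, the following is a minimal sketch rather than the authors' implementation. It assumes a one-level Haar wavelet (the abstract does not name the wavelet used), and the function names `haar_dwt`, `haar_idwt`, and `avg_pool` are hypothetical. It compares wavelet-based downsampling with plain average pooling on a signal that alternates at the highest representable frequency.

```python
import math


def haar_dwt(x):
    """One-level Haar DWT (assumed wavelet): returns (low, high) subbands,
    each half the length of the input."""
    if len(x) % 2 != 0:
        raise ValueError("input length must be even")
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return low, high


def haar_idwt(low, high):
    """Inverse of haar_dwt: interleaves the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(low, high):
        x.append((a + d) / s)
        x.append((a - d) / s)
    return x


def avg_pool(x):
    """Average pooling with window 2 and stride 2, for comparison."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x), 2)]


# A signal that alternates every sample, i.e. purely high-frequency content.
signal = [1.0, -1.0] * 4

low, high = haar_dwt(signal)
pooled = avg_pool(signal)
reconstructed = haar_idwt(low, high)

print("average pooled:", pooled)       # the alternation cancels out
print("DWT low band:  ", low)          # also cancels in the low band
print("DWT high band: ", high)         # the alternation is kept here
print("reconstructed: ", reconstructed)
```

Average pooling maps this signal to all zeros, so the high-frequency content is lost, whereas the wavelet transform moves it into the high subband and the original signal can be recovered from the two half-length subbands.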

research
04/26/2023

Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis

This paper proposes a source-filter-based generative adversarial neural ...
research
11/01/2021

RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses

Most GAN(Generative Adversarial Network)-based approaches towards high-f...
research
11/19/2020

Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains

We propose Universal MelGAN, a vocoder that synthesizes high-fidelity sp...
research
02/11/2019

Adversarial Generation of Time-Frequency Features with application in audio synthesis

Time-frequency (TF) representations provide powerful and intuitive featu...
research
11/08/2022

PhaseAug: A Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping

Previous generative adversarial network (GAN)-based neural vocoders are ...
research
01/12/2021

MP3net: coherent, minute-long music generation from raw audio with a simple convolutional GAN

We present a deep convolutional GAN which leverages techniques from MP3/...
research
04/05/2022

Arbitrary-Scale Image Synthesis

Positional encodings have enabled recent works to train a single adversa...
