HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

10/12/2020
by Jungil Kong, et al.

Several recent works on speech synthesis have employed generative adversarial networks (GANs) to produce raw waveforms. Although such methods improve sampling efficiency and memory usage, their sample quality has not yet reached that of autoregressive and flow-based generative models. In this work, we propose HiFi-GAN, which achieves both efficient and high-fidelity speech synthesis. As speech audio consists of sinusoidal signals with various periods, we demonstrate that modeling the periodic patterns of audio is crucial for enhancing sample quality. A subjective human evaluation (mean opinion score, MOS) on a single-speaker dataset indicates that our proposed method achieves quality similar to that of human speech while generating 22.05 kHz high-fidelity audio 167.9 times faster than real-time on a single V100 GPU. We further show the generality of HiFi-GAN to the mel-spectrogram inversion of unseen speakers and to end-to-end speech synthesis. Finally, a small-footprint version of HiFi-GAN generates samples 13.4 times faster than real-time on CPU with quality comparable to an autoregressive counterpart.
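To make the speed figures above concrete, the following is a minimal sketch of the arithmetic behind a "times faster than real-time" claim. Only the 22.05 kHz sample rate and the 167.9 and 13.4 speedup factors come from the abstract; the function names are hypothetical and chosen for illustration.

```python
# Sample rate stated in the abstract (22.05 kHz).
SAMPLE_RATE_HZ = 22_050


def samples_per_second(speedup, sample_rate_hz=SAMPLE_RATE_HZ):
    """Waveform samples produced per wall-clock second at a given real-time speedup."""
    return speedup * sample_rate_hz


def synthesis_time_s(audio_seconds, speedup):
    """Wall-clock seconds needed to synthesize `audio_seconds` of audio."""
    return audio_seconds / speedup


# Speedup factors reported in the abstract (hypothetical variable names).
gpu_rate = samples_per_second(167.9)  # single V100 GPU
cpu_rate = samples_per_second(13.4)   # small-footprint version on CPU

print(f"GPU: {gpu_rate / 1e6:.2f} million samples/s")
print(f"CPU: {cpu_rate / 1e3:.1f} thousand samples/s")
print(f"10 s of audio on GPU: {synthesis_time_s(10.0, 167.9) * 1e3:.1f} ms")
```

Under these assumptions, a 167.9x speedup at 22.05 kHz corresponds to roughly 3.7 million samples per second, so ten seconds of speech would take about 60 ms to generate.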

