WOLONet: Wave Outlooker for Efficient and High Fidelity Speech Synthesis

06/20/2022
by Yi Wang, et al.

Recently, GAN-based neural vocoders such as Parallel WaveGAN, MelGAN, HiFiGAN, and UnivNet have become popular because their lightweight, parallel structure enables real-time, high-fidelity waveform synthesis, even on a CPU. HiFiGAN and UnivNet are two state-of-the-art (SOTA) vocoders, but despite their high quality there is still room for improvement. In this paper, motivated by the structure of Vision Outlooker from computer vision, we propose an effective and lightweight neural vocoder called WOLONet. For this network, we develop a novel lightweight block that uses a location-variable, channel-independent, depthwise dynamic convolutional kernel with sinusoidally activated dynamic kernel weights. To demonstrate the effectiveness and generalizability of our method, we perform an ablation study to verify the design and make subjective and objective comparisons with typical GAN-based vocoders. The results show that WOLONet achieves the best generation quality while requiring fewer parameters than the two SOTA neural vocoders, HiFiGAN and UnivNet.
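The block described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the kernels are generated, so the linear projection `proj`, the function name `dynamic_depthwise_conv`, the zero padding, and the reading of "channel-independent" as "one kernel shared by all channels" are all assumptions made for illustration.

```python
import math


def dynamic_depthwise_conv(x, proj, kernel_size):
    """Toy dynamic depthwise 1D convolution (illustrative assumption only).

    x:    list of C channels, each a list of T samples.
    proj: kernel_size rows of C weights; a hypothetical kernel generator.
    kernel_size is assumed to be odd so the output keeps length T.
    """
    channels, length = len(x), len(x[0])
    pad = kernel_size // 2
    # Zero-pad each channel so every position has a full window.
    padded = [[0.0] * pad + list(ch) + [0.0] * pad for ch in x]
    out = [[0.0] * length for _ in range(channels)]
    for t in range(length):
        # Location-variable: a new kernel is predicted from the features at t.
        frame = [x[c][t] for c in range(channels)]
        # Sinusoidal activation applied to the predicted kernel weights.
        kernel = [math.sin(sum(w * v for w, v in zip(row, frame))) for row in proj]
        # The same kernel is reused for every channel, and each channel is
        # filtered separately (depthwise), with no mixing across channels.
        for c in range(channels):
            window = padded[c][t:t + kernel_size]
            out[c][t] = sum(k * s for k, s in zip(kernel, window))
    return out


# Two channels, three time steps, kernel size 3.
x = [[1.0, 1.0, 1.0], [2.0, 3.0, 4.0]]
proj = [[0.0, 0.0], [math.pi / 2, 0.0], [0.0, 0.0]]
y = dynamic_depthwise_conv(x, proj, kernel_size=3)
```

Because the kernel weights depend on the input at each time step, the filter changes along the time axis, while sharing it across channels keeps the number of learned parameters small.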

research
03/26/2021

Improve GAN-based Neural Vocoder using Pointwise Relativistic LeastSquare GAN

GAN-based neural vocoders, such as Parallel WaveGAN and MelGAN have attr...
research
11/03/2020

StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization

In recent years, neural vocoders have surpassed classical speech generat...
research
07/30/2020

VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network

We present a novel high-fidelity real-time neural vocoder called VocGAN....
research
10/27/2022

Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder

Our previous work, the unified source-filter GAN (uSFGAN) vocoder, intro...
research
06/27/2022

Avocodo: Generative Adversarial Network for Artifact-free Vocoder

Neural vocoders based on the generative adversarial neural network (GAN)...
research
11/28/2017

Parallel WaveNet: Fast High-Fidelity Speech Synthesis

The recently-developed WaveNet architecture is the current state of the ...
research
08/14/2023

iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN

The inverse short-time Fourier transform network (iSTFTNet) has garnered...
