Avocodo: Generative Adversarial Network for Artifact-free Vocoder

06/27/2022
by Taejun Bak, et al.

Neural vocoders based on the generative adversarial network (GAN) have been widely used because they offer fast inference and lightweight networks while generating high-quality speech waveforms. Since the perceptually important speech components are concentrated primarily in the low-frequency band, most GAN-based neural vocoders perform multi-scale analysis, which evaluates downsampled speech waveforms. This multi-scale analysis helps the generator improve speech intelligibility. However, in preliminary experiments, we observed that multi-scale analysis focused on the low-frequency band causes unintended artifacts, e.g., aliasing and imaging artifacts, and that these artifacts degrade the quality of the synthesized speech waveform. Therefore, in this paper, we investigate the relationship between these artifacts and GAN-based neural vocoders and propose a GAN-based neural vocoder, called Avocodo, that allows the synthesis of high-fidelity speech with reduced artifacts. We introduce two kinds of discriminators to evaluate waveforms from various perspectives: a collaborative multi-band discriminator and a sub-band discriminator. We also use a pseudo quadrature mirror filter bank (PQMF) to obtain downsampled multi-band waveforms while avoiding aliasing. The experimental results show that Avocodo outperforms conventional GAN-based neural vocoders in both speech and singing voice synthesis tasks and can synthesize artifact-free speech. Notably, Avocodo can even reproduce high-quality waveforms for unseen speakers.
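The aliasing mentioned in the abstract can be illustrated with a small NumPy sketch. It is not the paper's method: the 16 kHz sample rate, the 6 kHz test tone, the helper names, and the plain windowed-sinc low-pass filter (used here as a simple stand-in for the PQMF analysis filters) are all assumptions chosen for illustration. The sketch downsamples the tone by a factor of 2, once by simply dropping samples and once after low-pass filtering.

```python
import numpy as np

def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the strongest bin in the magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(freqs[np.argmax(spectrum)])

def lowpass_fir(cutoff_hz, sample_rate, num_taps=101):
    """Hamming-windowed sinc low-pass filter (illustrative stand-in, not a PQMF)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    fc = cutoff_hz / sample_rate  # cutoff in cycles per sample
    taps = 2 * fc * np.sinc(2 * fc * n) * np.hamming(num_taps)
    return taps / taps.sum()  # unity gain at 0 Hz

# Assumed setup: one second of a 6 kHz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 6000 * t)

# Naive downsampling by 2: the new Nyquist limit is 4 kHz, so the 6 kHz
# tone folds back into the band instead of disappearing.
naive = tone[::2]

# Low-pass below the new Nyquist limit first, then drop samples.
# mode="valid" discards the filter's edge transients.
taps = lowpass_fir(3500, sample_rate)
filtered = np.convolve(tone, taps, mode="valid")[::2]

aliased_peak = dominant_frequency(naive, sample_rate // 2)
naive_energy = float(np.mean(naive ** 2))
filtered_energy = float(np.mean(filtered ** 2))

print(f"peak after naive downsampling: {aliased_peak:.0f} Hz")
print(f"energy, naive:    {naive_energy:.6f}")
print(f"energy, filtered: {filtered_energy:.6f}")
```

In the naive case the 6 kHz tone reappears at about 2 kHz (8 kHz minus 6 kHz) with its full energy, which a discriminator looking only at the downsampled waveform could not tell apart from genuine low-frequency content. With the low-pass filter applied first, almost none of the tone's energy survives.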

research
11/02/2022

DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP

Recent development of neural vocoders based on the generative adversaria...
research
03/24/2023

Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis

In speech synthesis, a generative adversarial network (GAN), training a ...
research
03/26/2021

Improve GAN-based Neural Vocoder using Pointwise Relativistic LeastSquare GAN

GAN-based neural vocoders, such as Parallel WaveGAN and MelGAN have attr...
research
10/26/2022

Xiaoicesing 2: A High-Fidelity Singing Voice Synthesizer Based on Generative Adversarial Network

XiaoiceSing is a singing voice synthesis (SVS) system that aims at gener...
research
11/08/2022

PhaseAug: A Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping

Previous generative adversarial network (GAN)-based neural vocoders are ...
research
06/20/2022

WOLONet: Wave Outlooker for Efficient and High Fidelity Speech Synthesis

Recently, GAN-based neural vocoders such as Parallel WaveGAN, MelGAN, Hi...
research
05/11/2020

Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech

In this paper, we propose multi-band MelGAN, a much faster waveform gene...
