GAN You Hear Me? Reclaiming Unconditional Speech Synthesis from Diffusion Models

10/11/2022
by Matthew Baas, et al.

We propose AudioStyleGAN (ASGAN), a new generative adversarial network (GAN) for unconditional speech synthesis. As in the StyleGAN family of image synthesis models, ASGAN maps sampled noise to a disentangled latent vector, which is then mapped to a sequence of audio features so that signal aliasing is suppressed at every layer. To successfully train ASGAN, we introduce a number of new techniques, including a modification to adaptive discriminator augmentation to probabilistically skip discriminator updates. ASGAN achieves state-of-the-art results in unconditional speech synthesis on the Google Speech Commands dataset. It is also substantially faster than the top-performing diffusion models. Through a design that encourages disentanglement, ASGAN is able to perform voice conversion and speech editing without being explicitly trained to do so. ASGAN demonstrates that GANs are still highly competitive with diffusion models. Code, models, samples: https://github.com/RF5/simple-asgan/.
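The abstract mentions probabilistically skipping discriminator updates but does not say how the skip probability is set. The sketch below is one possible reading, not the authors' implementation: it assumes a skip probability `p` that is nudged up when the discriminator looks overconfident on real data (using a mean-sign heuristic in the spirit of adaptive discriminator augmentation) and nudged down otherwise. The class name `SkipController` and the parameters `target` and `step` are hypothetical and chosen only for illustration.

```python
import random


class SkipController:
    """Hypothetical controller for a skip probability p in [0, 1]."""

    def __init__(self, target=0.6, step=0.01, seed=0):
        self.p = 0.0            # current probability of skipping a discriminator update
        self.target = target    # assumed target for the overfitting heuristic
        self.step = step        # assumed adjustment size per call to update()
        self.rng = random.Random(seed)

    def update(self, real_logits):
        """Adjust p from discriminator outputs on real data (must be non-empty)."""
        # Mean sign of the logits: near 1 means the discriminator is very
        # confident on real samples, which is treated here as overfitting.
        signs = [(x > 0) - (x < 0) for x in real_logits]
        r_t = sum(signs) / len(signs)
        direction = 1 if r_t > self.target else -1
        # Move p one step in the chosen direction and clamp to [0, 1].
        self.p = min(1.0, max(0.0, self.p + direction * self.step))
        return r_t

    def should_skip(self):
        """Return True with probability p."""
        return self.rng.random() < self.p


if __name__ == "__main__":
    ctrl = SkipController()
    for _ in range(5):
        real_logits = [2.0, 1.5, 0.3, 1.1]  # placeholder values, not real model outputs
        ctrl.update(real_logits)
        if not ctrl.should_skip():
            pass  # the discriminator optimizer step would run here
```

In a training loop, the generator step would run every iteration while the discriminator step runs only when `should_skip()` returns `False`.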

research
07/04/2023

Disentanglement in a GAN for Unconditional Speech Synthesis

Can we develop a model that can synthesize realistic speech directly fro...
research
11/08/2022

PhaseAug: A Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping

Previous generative adversarial network (GAN)-based neural vocoders are ...
research
10/14/2022

TransFusion: Transcribing Speech with Multinomial Diffusion

Diffusion models have shown exceptional scaling properties in the image ...
research
08/03/2023

Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS

The diffusion model is capable of generating high-quality data through a...
research
06/21/2023

Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase

Despite the rapid advance of 3D-aware image synthesis, existing studies ...
research
11/21/2022

Diffusion Denoising Process for Perceptron Bias in Out-of-distribution Detection

Out-of-distribution (OOD) detection is an important task to ensure the r...
research
07/22/2023

Synthesis of Batik Motifs using a Diffusion – Generative Adversarial Network

Batik, a unique blend of art and craftsmanship, is a distinct artistic a...