Signal Representations for Synthesizing Audio Textures with Generative Adversarial Networks

03/12/2021
by Chitralekha Gupta, et al.

Generative Adversarial Networks (GANs) currently achieve state-of-the-art sound synthesis quality for pitched musical instruments using a 2-channel spectrogram representation consisting of log magnitude and instantaneous frequency (the "IFSpectrogram"). Many other synthesis systems use representations derived from the magnitude spectra and then depend on a backend component to invert the output magnitude spectrograms, a step that generally introduces audible artefacts. However, for signals with closely-spaced frequency components, such as non-pitched and other noisy sounds, training the GAN on the 2-channel IFSpectrogram representation offers no advantage over magnitude-spectra-based representations. In this paper, we propose that training GANs on single-channel magnitude spectra and using the Phase Gradient Heap Integration (PGHI) inversion algorithm is a more broadly applicable approach for audio synthesis modeling of diverse signals, including pitched, non-pitched, and dynamically complex sounds. We show that this method produces higher-quality output for wideband and noisy sounds, such as pops and chirps, compared to using the IFSpectrogram. Furthermore, the sound quality for pitched sounds is comparable to using the IFSpectrogram, even though the simpler representation requires half the memory.
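To make the two representations concrete, the sketch below builds both from the same STFT: the single-channel log-magnitude input the paper advocates (with phase left to PGHI at inversion time), and the 2-channel IFSpectrogram (log magnitude plus instantaneous frequency, i.e. the time derivative of the unwrapped phase). The `stft` helper and all parameter values are illustrative assumptions, not the paper's exact pipeline, and PGHI inversion itself is not implemented here.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Naive framed STFT via numpy (hypothetical helper, not the paper's code)."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.stack(frames), axis=-1)  # shape: (frames, bins)

# 1 second of noise standing in for an audio texture
x = np.random.default_rng(0).standard_normal(16000)
S = stft(x)

# Single-channel representation: log magnitude only; phase is
# reconstructed later by PGHI, so it is never stored or modeled.
log_mag = np.log(np.abs(S) + 1e-6)[..., None]          # (frames, bins, 1)

# 2-channel "IFSpectrogram": log magnitude + instantaneous frequency,
# the frame-to-frame derivative of the unwrapped phase.
phase = np.unwrap(np.angle(S), axis=0)                 # unwrap along time
inst_freq = np.diff(phase, axis=0, prepend=phase[:1])  # phase derivative
if_spec = np.stack([np.log(np.abs(S) + 1e-6), inst_freq], axis=-1)

# The IFSpectrogram carries twice the channels, hence twice the memory.
print(log_mag.shape, if_spec.shape)
```

The memory claim in the abstract follows directly: dropping the instantaneous-frequency channel halves the array size while PGHI recovers a usable phase from the log magnitude alone.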

Related research

06/16/2020 - Comparing Representations for Audio Synthesis Using Generative Adversarial Networks
In this paper, we compare different audio signal representations, includ...

02/12/2018 - Synthesizing Audio with Generative Adversarial Networks
While Generative Adversarial Networks (GANs) have seen wide success at t...

08/13/2018 - Towards Audio to Scene Image Synthesis using Generative Adversarial Network
Humans can imagine a scene from a sound. We want machines to do so by us...

04/16/2019 - Expediting TTS Synthesis with Adversarial Vocoding
Recent approaches in text-to-speech (TTS) synthesis employ neural networ...

10/14/2021 - SpecSinGAN: Sound Effect Variation Synthesis Using Single-Image GANs
Single-image generative adversarial networks learn from the internal dis...

06/01/2023 - Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Recent advancements in neural vocoding are predominantly driven by Gener...

05/04/2021 - VQCPC-GAN: Variable-length Adversarial Audio Synthesis using Vector-Quantized Contrastive Predictive Coding
Influenced by the field of Computer Vision, Generative Adversarial Netwo...
