Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning

07/21/2021
by Xubo Liu, et al.

Deep generative models have recently achieved impressive performance in speech and music synthesis. However, compared with the generation of these domain-specific sounds, the generation of general sounds (such as sirens and gunshots) has received less attention, despite its wide range of applications. In previous work, the SampleRNN method was used for sound generation in the time domain. However, SampleRNN is potentially limited in capturing long-range dependencies within sounds, as it only back-propagates through a limited number of samples. In this work, we propose a method for generating sounds via neural discrete time-frequency representation learning, conditioned on sound classes. This approach offers an advantage in efficiently modelling long-range dependencies while retaining local fine-grained structures within sound clips. We evaluate our approach on the UrbanSound8K dataset against SampleRNN, using performance metrics that measure the quality and diversity of the generated sounds. Experimental results show that our method offers comparable performance in quality and significantly better performance in diversity.
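The abstract does not spell out how the discrete time-frequency representation is learned. A common approach, assumed here rather than taken from the text above, is VQ-VAE-style vector quantization. An encoder maps a time-frequency input (for example, a mel spectrogram) to feature vectors, and each vector is replaced by the index of its nearest entry in a codebook. The resulting grid of integer tokens is typically much shorter than the raw waveform, which is what makes long-range structure easier for an autoregressive prior conditioned on the sound class to model. The minimal sketch below covers only the nearest-codebook lookup. The function names, the toy codebook, and the frame values are hypothetical. In a real model the codebook would be learned jointly with the encoder and decoder.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: each "frame" is one encoder output vector at a time-frequency
# position; values here are toy numbers, not features from the paper.
Vector = Sequence[float]


def nearest_code(frame: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to `frame`
    under squared Euclidean distance (ties go to the lower index)."""
    best_idx, best_dist = 0, math.inf
    for idx, code in enumerate(codebook):
        dist = sum((f - c) ** 2 for f, c in zip(frame, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def quantize(
    frames: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[List[float]]]:
    """Map continuous frames to discrete token indices and return both the
    indices and the quantized vectors a decoder would consume."""
    indices = [nearest_code(frame, codebook) for frame in frames]
    quantized = [list(codebook[i]) for i in indices]
    return indices, quantized


if __name__ == "__main__":
    # Hypothetical 3-entry codebook of 2-dimensional vectors.
    codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    frames = [[0.1, -0.2], [0.9, 1.2], [4.0, 6.0], [0.6, 0.6]]
    tokens, _ = quantize(frames, codebook)
    print(tokens)  # prints [0, 1, 2, 1]
```

In this framing, a class-conditioned prior would be trained on token sequences like `tokens` rather than on raw audio samples. The quantized vectors would then be decoded back into a time-frequency representation and converted to a waveform.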
