CRASH: Raw Audio Score-based Generative Modeling for Controllable High-resolution Drum Sound Synthesis

06/14/2021
by Simon Rouard, et al.

In this paper, we propose a novel score-based generative model for unconditional raw audio synthesis. Our proposal builds upon the latest developments in diffusion process modeling with stochastic differential equations, which have already demonstrated promising results on image generation. We motivate novel heuristics for choosing diffusion processes better suited to audio generation, and use a conditional U-Net to approximate the score function. While previous diffusion-model approaches for audio were mainly designed as speech vocoders at medium resolution, our method, termed CRASH (Controllable Raw Audio Synthesis with High-resolution), generates short percussive sounds at 44.1 kHz in a controllable way. Through extensive experiments on a drum sound generation task, we showcase the numerous sampling schemes our method offers (unconditional generation, deterministic generation, inpainting, interpolation, variations, class-conditional sampling) and propose class-mixing sampling, a novel way to generate "hybrid" sounds. Our method closes the gap with GAN-based methods on raw audio, while offering more flexible generation capabilities with lighter and easier-to-train models.
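The abstract describes generating audio by reversing a diffusion SDE with a learned score function, and also mentions deterministic generation. The following is a minimal sketch of that general technique, not the authors' implementation, under stated assumptions: a linear variance-preserving SDE with `BETA_MIN = 0.1` and `BETA_MAX = 20` (a common choice in the SDE literature, not necessarily the diffusion processes CRASH motivates), and a hypothetical `gaussian_score` that returns the exact score for Gaussian toy data in place of the trained conditional U-Net. All function names are illustrative. Setting `deterministic=True` integrates the probability-flow ODE, one common way to obtain deterministic generation from a fixed noise input.

```python
import numpy as np

# Assumed linear VP-SDE schedule (common defaults in the SDE literature);
# CRASH's own choice of diffusion process may differ.
BETA_MIN, BETA_MAX = 0.1, 20.0


def beta(t):
    """Noise rate of the forward SDE dx = -0.5*beta(t)*x dt + sqrt(beta(t)) dw."""
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)


def alpha(t):
    """Signal coefficient exp(-0.5 * integral_0^t beta(s) ds)."""
    integral = BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t ** 2
    return np.exp(-0.5 * integral)


def gaussian_score(x, t, sigma0):
    """Exact score of p_t when the data are i.i.d. N(0, sigma0^2).

    Hypothetical stand-in for a trained (conditional) U-Net score model.
    """
    a = alpha(t)
    var = a ** 2 * sigma0 ** 2 + (1.0 - a ** 2)
    return -x / var


def reverse_sample(score_fn, x_T, n_steps=1000, eps=1e-3, deterministic=False, rng=None):
    """Integrate from t=1 down to t~eps, starting from the noise x_T.

    deterministic=False: Euler-Maruyama on the reverse-time SDE.
    deterministic=True: Euler on the probability-flow ODE (same marginals, no noise).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x_T, dtype=float)
    ts = np.linspace(1.0, eps, n_steps)
    dt = ts[0] - ts[1]  # positive step size; time runs backward
    for t in ts:
        b = beta(t)  # g(t)^2 for this SDE
        f = -0.5 * b * x  # forward drift f(x, t)
        s = score_fn(x, t)
        if deterministic:
            x = x - (f - 0.5 * b * s) * dt
        else:
            x = x - (f - b * s) * dt + np.sqrt(b * dt) * rng.standard_normal(x.shape)
    return x


# Usage on toy data: a batch of 20 "waveforms" with 1000 samples each.
rng = np.random.default_rng(0)
sigma0 = 0.5


def score(x, t):
    return gaussian_score(x, t, sigma0)


x_T = rng.standard_normal((20, 1000))  # prior at t=1 is approximately N(0, I)
stochastic = reverse_sample(score, x_T, rng=rng)
det_a = reverse_sample(score, x_T, deterministic=True)
det_b = reverse_sample(score, x_T, deterministic=True)
print(stochastic.std(), det_a.std())  # both should be close to sigma0 = 0.5
print(np.array_equal(det_a, det_b))  # ODE path: same noise gives the same output
```

Because the ODE path is a fixed map from `x_T` to the output, manipulating the noise input (for example, mixing two noise tensors) is a natural hook for interpolation-style sampling; the paper's exact procedures for interpolation, inpainting, and class mixing may differ from this sketch.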
