ArchiSound: Audio Generation with Diffusion

01/30/2023
by Flavio Schneider et al.

The recent surge in popularity of diffusion models for image generation has brought new attention to the potential of these models in other areas of media generation. One area that has yet to be fully explored is the application of diffusion models to audio generation. Audio generation requires an understanding of multiple aspects, such as the temporal dimension, long-term structure, multiple layers of overlapping sounds, and nuances that only trained listeners can detect. In this work, we investigate the potential of diffusion models for audio generation. We propose a set of models that tackle these aspects, including a new method for text-conditional latent audio diffusion with stacked 1D U-Nets that can generate multiple minutes of music from a textual description. For each model, we make an effort to maintain reasonable inference speed, targeting real-time generation on a single consumer GPU. In addition to trained models, we provide a collection of open-source libraries with the hope of simplifying future work in the field. Samples can be found at https://bit.ly/audio-diffusion. Code is available at https://github.com/archinetai/audio-diffusion-pytorch.
