Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion

01/27/2023
by Flavio Schneider, et al.

The recent surge in popularity of diffusion models for image generation has brought new attention to the potential of these models in other areas of media synthesis. One area that has yet to be fully explored is the application of diffusion models to music generation. Music generation requires handling multiple aspects, including the temporal dimension, long-term structure, multiple layers of overlapping sounds, and nuances that only trained listeners can detect. In our work, we investigate the potential of diffusion models for text-conditional music generation. We develop a cascading latent diffusion approach that can generate multiple minutes of high-quality stereo music at 48 kHz from textual descriptions. For each model, we make an effort to maintain reasonable inference speed, targeting real-time inference on a single consumer GPU. In addition to trained models, we provide a collection of open-source libraries with the hope of facilitating future work in the field. We open-source the following: music samples for this paper: https://bit.ly/anonymous-mousai; all music samples for all models: https://bit.ly/audio-diffusion; and code: https://github.com/archinetai/audio-diffusion-pytorch

research
01/30/2023

ArchiSound: Audio Generation with Diffusion

The recent surge in popularity of diffusion models for image generation ...
research
01/16/2023

Msanii: High Fidelity Music Synthesis on a Shoestring Budget

In this paper, we present Msanii, a novel diffusion-based model for synt...
research
08/18/2022

Musika! Fast Infinite Waveform Music Generation

Fast and user-controllable music generation could enable novel ways of c...
research
08/28/2023

InstructME: An Instruction Guided Music Edit And Remix Framework with Latent Diffusion Models

Music editing primarily entails the modification of instrument tracks or...
research
07/20/2023

Progressive distillation diffusion for raw music generation

This paper aims to apply a new deep learning approach to the task of gen...
research
11/03/2021

Automatic Embedding of Stories Into Collections of Independent Media

We look at how machine learning techniques that derive properties of ite...
research
04/27/2021

One Billion Audio Sounds from GPU-enabled Modular Synthesis

We release synth1B1, a multi-modal audio corpus consisting of 1 billion ...
