InstructME: An Instruction Guided Music Edit And Remix Framework with Latent Diffusion Models

08/28/2023
by Bing Han, et al.

Music editing primarily involves modifying instrument tracks or remixing the piece as a whole, reinterpreting the original work through a series of operations. These processing methods hold immense potential across many applications but demand substantial expertise. Prior methods, although effective for image and audio editing, falter when applied directly to music: because of music's distinctive data nature, they can inadvertently compromise its intrinsic harmony and coherence. In this paper, we develop InstructME, an instruction-guided music editing and remixing framework based on latent diffusion models. Our framework fortifies the U-Net with multi-scale aggregation to maintain consistency between the music before and after editing. In addition, we introduce a chord progression matrix as conditioning information and incorporate it in the semantic space to improve melodic harmony during editing. To accommodate extended musical pieces, InstructME employs a chunk transformer that captures long-term temporal dependencies in music sequences. We evaluated InstructME on instrument editing, remixing, and multi-round editing. Both subjective and objective evaluations indicate that the proposed method significantly surpasses preceding systems in music quality, text relevance, and harmony. Demo samples are available at https://musicedit.github.io/
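
The multi-scale aggregation idea, fusing encoder features from several U-Net resolutions so that unedited content stays consistent, can be illustrated with a short sketch. The following Python/PyTorch code is a minimal illustration under our own assumptions, not the authors' implementation; the module name, channel sizes, and bilinear resizing are all illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAggregation(nn.Module):
    """Fuse encoder feature maps from several U-Net scales into one decoder input."""

    def __init__(self, enc_channels, out_channels):
        super().__init__()
        # One 1x1 projection per encoder scale so channel counts match before summing.
        self.proj = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in enc_channels]
        )

    def forward(self, enc_feats, target_hw):
        # enc_feats: list of tensors [B, C_i, H_i, W_i] taken from different encoder depths.
        fused = 0.0
        for feat, proj in zip(enc_feats, self.proj):
            x = proj(feat)
            # Resize every scale to the decoder stage's spatial size, then sum.
            x = F.interpolate(x, size=target_hw, mode="bilinear", align_corners=False)
            fused = fused + x
        return fused

# Usage: aggregate three encoder scales for a 32x32 decoder stage.
agg = MultiScaleAggregation(enc_channels=[64, 128, 256], out_channels=128)
feats = [torch.randn(1, 64, 64, 64),
         torch.randn(1, 128, 32, 32),
         torch.randn(1, 256, 16, 16)]
out = agg(feats, target_hw=(32, 32))  # -> [1, 128, 32, 32]

Feeding the decoder a sum of all encoder scales, rather than only the mirrored skip connection, is one simple way to realize the consistency-preserving aggregation the abstract describes.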

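The chord progression condition can likewise be sketched. A natural reading of the abstract is a time-by-chord matrix (one row per frame over a chord vocabulary) projected into the same semantic space as the instruction embedding; the vocabulary, frame rate, and concatenation scheme below are assumptions for illustration, not the paper's specification.

import torch
import torch.nn as nn

CHORDS = ["N", "C", "G", "Am", "F", "Dm", "Em"]  # toy vocabulary; "N" = no chord
CHORD_TO_ID = {c: i for i, c in enumerate(CHORDS)}

def chord_matrix(progression, frames_per_chord=4):
    """Build a [T, |V|] one-hot chord progression matrix from chord symbols."""
    ids = torch.tensor(
        [CHORD_TO_ID[c] for c in progression for _ in range(frames_per_chord)]
    )
    return nn.functional.one_hot(ids, num_classes=len(CHORDS)).float()

class ChordConditioner(nn.Module):
    """Project the chord matrix to the text-embedding width and append it."""

    def __init__(self, n_chords, dim):
        super().__init__()
        self.proj = nn.Linear(n_chords, dim)

    def forward(self, chord_mat, text_emb):
        # Chord tokens are concatenated after the text tokens so the diffusion
        # model's cross-attention can attend to both kinds of condition.
        chord_emb = self.proj(chord_mat).unsqueeze(0)   # [1, T, dim]
        return torch.cat([text_emb, chord_emb], dim=1)  # [1, L + T, dim]

cond = ChordConditioner(n_chords=len(CHORDS), dim=512)
ctx = cond(chord_matrix(["C", "G", "Am", "F"]), torch.randn(1, 16, 512))
print(ctx.shape)  # torch.Size([1, 32, 512])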