MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies

08/03/2023
by Ke Chen, et al.

Diffusion models have shown promising results in cross-modal generation tasks, including text-to-image and text-to-audio generation. However, generating music, as a special type of audio, presents unique challenges due to the limited availability of music data and sensitive issues related to copyright and plagiarism. In this paper, to tackle these challenges, we first construct a state-of-the-art text-to-music model, MusicLDM, that adapts the Stable Diffusion and AudioLDM architectures to the music domain. We achieve this by retraining the contrastive language-audio pretraining model (CLAP) and the Hifi-GAN vocoder, as components of MusicLDM, on a collection of music data samples. Then, to address the limitations of training data and to avoid plagiarism, we leverage a beat tracking model and propose two different mixup strategies for data augmentation: beat-synchronous audio mixup and beat-synchronous latent mixup, which recombine training audio directly or via a latent embedding space, respectively. Such mixup strategies encourage the model to interpolate between musical training samples and generate new music within the convex hull of the training data, making the generated music more diverse while still staying faithful to the corresponding style. In addition to popular evaluation metrics, we design several new evaluation metrics based on the CLAP score to demonstrate that our proposed MusicLDM and beat-synchronous mixup strategies improve both the quality and novelty of generated music, as well as the correspondence between input text and generated music.
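The core idea of beat-synchronous audio mixup — aligning two training clips on their downbeats before interpolating them — can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the two clips have already been tempo-matched, takes pre-computed downbeat times as input (in practice these would come from a beat tracking model), and uses the hypothetical helper name `beat_sync_audio_mixup`. The mixing ratio is drawn from a Beta distribution, as in standard mixup.

```python
import numpy as np

def beat_sync_audio_mixup(clip_a, clip_b, downbeat_a, downbeat_b,
                          sr=16000, alpha=0.4):
    """Mix two tempo-matched audio clips after aligning their first downbeats.

    clip_a, clip_b : 1-D float arrays of audio samples
    downbeat_a/b   : time (seconds) of the first downbeat in each clip
    sr             : sample rate in Hz
    alpha          : Beta-distribution parameter controlling the mixup ratio
    """
    # Shift clip_b so its first downbeat lines up with clip_a's.
    shift = int(round((downbeat_a - downbeat_b) * sr))
    b_aligned = np.roll(clip_b, shift)

    # Sample the mixing ratio lambda ~ Beta(alpha, alpha), as in standard mixup.
    lam = np.random.beta(alpha, alpha)

    # Interpolate the two beat-aligned waveforms over their common length.
    n = min(len(clip_a), len(b_aligned))
    return lam * clip_a[:n] + (1.0 - lam) * b_aligned[:n]
```

The same interpolation applied to latent encodings of the clips, rather than raw waveforms, gives the gist of the latent variant; in both cases the mixed sample lies in the convex hull of the training data, which is what the paper credits for the improved novelty.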
