Modeling the Compatibility of Stem Tracks to Generate Music Mashups

by Jiawen Huang, et al.

A music mashup combines audio elements from two or more songs to create a new work. To reduce the time and effort required to make them, researchers have developed algorithms that predict the compatibility of audio elements. Prior work has focused on mixing unaltered excerpts, but advances in source separation enable the creation of mashups from isolated stems (e.g., vocals, drums, bass, etc.). In this work, we take advantage of separated stems not just for creating mashups, but for training a model that predicts the mutual compatibility of groups of excerpts, using self-supervised and semi-supervised methods. Specifically, we first build a random mashup creation pipeline that combines stem tracks obtained via source separation, with key and tempo automatically adjusted to match, since these are prerequisites for high-quality mashups. To train a model to predict compatibility, we use stem tracks obtained from the same song as positive examples, and random combinations of stems with key and/or tempo unadjusted as negative examples. To improve the model and use more data, we also train on "average" examples: random combinations with matching key and tempo, which we treat as unlabeled data since their true compatibility is unknown. To determine whether the combined signal or the set of stem signals is more indicative of the quality of the result, we experiment with two model architectures and train them using a semi-supervised learning technique. Finally, we conduct objective and subjective evaluations of the system, comparing it to a standard rule-based system.
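The abstract's three-way labeling scheme (positive, negative, unlabeled "average") can be sketched as a small helper function. This is an illustrative sketch only; the function and variable names are assumptions, not the authors' actual code.

```python
# Sketch of the labeling scheme described in the abstract.
# Names here (label_example, same_song, etc.) are illustrative assumptions.

def label_example(same_song: bool, key_matched: bool, tempo_matched: bool):
    """Return 1.0 (positive), 0.0 (negative), or None (unlabeled)."""
    if same_song:
        # Stems separated from the same song are assumed to mix back
        # together well: positive example.
        return 1.0
    if key_matched and tempo_matched:
        # Random combination with matching key and tempo: true
        # compatibility unknown, so leave it unlabeled ("average").
        return None
    # Random combination with key and/or tempo unadjusted: negative.
    return 0.0
```

Unlabeled examples returned as `None` would then be consumed by the semi-supervised training loop rather than a supervised loss.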



Adversarial Semi-Supervised Audio Source Separation applied to Singing Voice Extraction

The state of the art in music source separation employs neural networks ...

Neural Loop Combiner: Neural Network Models for Assessing the Compatibility of Loops

Music producers who use loops may have access to thousands in loop libra...

Demucs: Deep Extractor for Music Sources with extra unlabeled data remixed

We study the problem of source separation for music using deep learning ...

Unsupervised Source Separation By Steering Pretrained Music Models

We showcase an unsupervised method that repurposes deep models trained f...

Mixing-Specific Data Augmentation Techniques for Improved Blind Violin/Piano Source Separation

Blind music source separation has been a popular and active subject of r...

Self-Supervised Beat Tracking in Musical Signals with Polyphonic Contrastive Learning

Annotating musical beats is a very long and tedious process. In order to ...

Learning a Joint Embedding Space of Monophonic and Mixed Music Signals for Singing Voice

Previous approaches in singer identification have used one of monophonic...