Pac-HuBERT: Self-Supervised Music Source Separation via Primitive Auditory Clustering and Hidden-Unit BERT

04/04/2023
by Ke Chen, et al.

Despite progress in music source separation research, the small amount of publicly available clean source data remains a persistent limiting factor for performance. Recent advances in self-supervised learning therefore present a largely unexplored opportunity to improve separation models by leveraging unlabelled music data. In this paper, we propose a self-supervised learning framework for music source separation inspired by the HuBERT speech representation model. We first investigate the potential impact of the original HuBERT model by inserting an adapted version of it into the well-known Demucs V2 time-domain separation architecture. We then propose a time-frequency-domain self-supervised model, Pac-HuBERT (primitive auditory clustering HuBERT), which we combine with a Res-U-Net decoder for source separation. Pac-HuBERT uses primitive auditory features of music as unsupervised clustering labels to initialize self-supervised pretraining on the Free Music Archive (FMA) dataset. The resulting framework achieves better source-to-distortion ratio (SDR) performance on the MusDB18 test set than the original Demucs V2 and Res-U-Net models. We further demonstrate that it can boost performance when only small amounts of supervised data are available. Overall, our proposed framework is an effective solution to the challenge of limited clean source data for music source separation.
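To make the idea of clustering-derived pretraining labels more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify which primitive auditory features are used, so this example assumes log-magnitude spectrogram frames as a stand-in and groups them with plain k-means, in the spirit of the original HuBERT recipe, which clusters MFCC frames. The function names `log_spectrogram` and `kmeans_labels`, the frame sizes, and the cluster count are all hypothetical choices for illustration.

```python
import numpy as np

def log_spectrogram(audio, n_fft=1024, hop=512):
    """Return log-magnitude STFT frames, shape (n_frames, n_fft // 2 + 1).

    Assumption: these frames stand in for the paper's "primitive auditory
    features", which the abstract does not define.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack(
        [audio[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))

def kmeans_labels(features, k=8, n_iter=20, seed=0):
    """Plain k-means; returns one integer cluster label per frame."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct, randomly chosen frames.
    centroids = features[rng.choice(len(features), size=k, replace=False)]

    def assign(c):
        # Squared Euclidean distance from every frame to every centroid.
        dists = ((features[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # leave empty clusters where they are
                centroids[j] = members.mean(axis=0)
    return assign(centroids), centroids

# Toy input: one second of audio that switches from a low to a high tone.
sr = 16000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 220 * t)
audio[sr // 2 :] = np.sin(2 * np.pi * 1760 * t[sr // 2 :])

feats = log_spectrogram(audio)
labels, centroids = kmeans_labels(feats, k=2)
print(feats.shape, labels)
```

In a HuBERT-style setup, per-frame labels like these would serve as prediction targets for masked frames during pretraining on unlabelled music; the encoder, masking strategy, and Res-U-Net decoder described in the abstract are outside the scope of this sketch.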


research
10/23/2020

A Study of Transfer Learning in Music Source Separation

Supervised deep learning methods for performing audio source separation ...
research
12/06/2020

Source Separation and Depthwise Separable Convolutions for Computer Audition

Given recent advances in deep music source separation, we propose a feat...
research
05/22/2018

Music Source Separation Using Stacked Hourglass Networks

In this paper, we propose a simple yet effective method for multiple mus...
research
10/13/2021

Music Source Separation with Deep Equilibrium Models

While deep neural network-based music source separation (MSS) is very ef...
research
08/30/2022

Towards robust music source separation on loud commercial music

Nowadays, commercial music has extreme loudness and heavily compressed d...
research
04/12/2022

Low Latency Time Domain Multichannel Speech and Music Source Separation

The Goal is to obtain a simple multichannel source separation with very ...
research
07/24/2023

Self-refining of Pseudo Labels for Music Source Separation with Noisy Labeled Data

Music source separation (MSS) faces challenges due to the limited availa...
