On the Effectiveness of Speech Self-supervised Learning for Music

07/11/2023
by Yinghao Ma, et al.

Self-supervised learning (SSL) has shown promising results in various speech and natural language processing applications. However, its efficacy in music information retrieval (MIR) remains largely unexplored. While previous SSL models pre-trained on music recordings may have been mostly closed-source, recent speech models such as wav2vec2.0 have shown promise in music modelling. Nevertheless, research on applying speech SSL models to music recordings has been limited. We explore the music adaptation of SSL with two distinct speech-related models, data2vec1.0 and HuBERT, and refer to them as music2vec and musicHuBERT, respectively. We train 12 SSL models with 95M parameters under various pre-training configurations and systematically evaluate their performance on 13 different MIR tasks. Our findings suggest that training on music data can generally improve performance on MIR tasks, even when models are trained using paradigms designed for speech. However, we identify the limitations of such existing speech-oriented designs, especially in modelling polyphonic information. Based on the experimental results, we also give empirical suggestions for designing future musical SSL strategies and paradigms.

research
10/27/2022

Learning Music Representations with wav2vec 2.0

Learning music representations that are general-purpose offers the flexi...
research
05/31/2023

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

Self-supervised learning (SSL) has recently emerged as a promising parad...
research
11/23/2021

Music Classification: Beyond Supervised Learning, Towards Real-world Applications

Music classification is a music information retrieval (MIR) task to clas...
research
06/22/2023

Toward Leveraging Pre-Trained Self-Supervised Frontends for Automatic Singing Voice Understanding Tasks: Three Case Studies

Automatic singing voice understanding tasks, such as singer identificati...
research
04/15/2023

Self-supervised Auxiliary Loss for Metric Learning in Music Similarity-based Retrieval and Auto-tagging

In the realm of music information retrieval, similarity-based retrieval ...
research
08/31/2023

RAMP: Retrieval-Augmented MOS Prediction via Confidence-based Dynamic Weighting

Automatic Mean Opinion Score (MOS) prediction is crucial to evaluate the...
research
06/18/2023

MARBLE: Music Audio Representation Benchmark for Universal Evaluation

In the era of extensive intersection between art and Artificial Intellig...
