Supervised and Unsupervised Learning of Audio Representations for Music Understanding

10/07/2022
by Matthew C. McCallum, et al.

In this work, we provide a broad comparative analysis of strategies for pre-training audio understanding models for several tasks in the music domain, including labelling of genre, era, origin, mood, instrumentation, key, pitch, vocal characteristics, tempo and sonority. Specifically, we explore how the domain of the pre-training dataset (music or generic audio) and the pre-training methodology (supervised or unsupervised) affect how well the resulting audio embeddings serve downstream tasks. We show that models trained via supervised learning on large-scale expert-annotated music datasets achieve state-of-the-art performance in a wide range of music labelling tasks, each with novel content and vocabularies. This can be done efficiently with models containing fewer than 100 million parameters that require no fine-tuning or reparameterization for downstream tasks, making this approach practical for industry-scale audio catalogs. Within the class of unsupervised learning strategies, we show that the domain of the training dataset can significantly impact the performance of the representations learned by the model. We find that restricting the domain of the pre-training dataset to music allows for training with smaller batch sizes while achieving state-of-the-art performance for unsupervised learning (and, in some cases, for supervised learning) in music understanding. We also corroborate that, while achieving state-of-the-art performance on many tasks, supervised learning can cause models to specialize to the supervised information provided, somewhat compromising a model's generality.
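The abstract states that the pre-trained models need no fine-tuning or reparameterization for downstream tasks. One common way to use such frozen embeddings is to train only a small classifier on top of them, so a single embedding-extraction pass can serve many labelling tasks. The following is a minimal sketch under that assumption, not the authors' evaluation protocol: it fits a softmax linear probe in NumPy on embeddings assumed to be already extracted. The function names `l2_normalize`, `train_linear_probe` and `predict`, the hyperparameters, and the synthetic three-class data are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each embedding (row) to unit L2 norm (an assumed preprocessing step)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def train_linear_probe(embeddings, labels, num_classes, lr=0.5, epochs=200, seed=0):
    """Fit a softmax (multinomial logistic regression) probe with batch gradient descent.

    Only the probe's weights W and bias b are learned. The embeddings are fixed
    inputs, so the upstream audio model is never fine-tuned here.
    """
    rng = np.random.default_rng(seed)
    n, d = embeddings.shape
    W = rng.normal(scale=0.01, size=(d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = embeddings @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability for exp
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * embeddings.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(embeddings, W, b):
    """Return the index of the highest-scoring class for each embedding."""
    return np.argmax(embeddings @ W + b, axis=1)


if __name__ == "__main__":
    # Hypothetical toy task: 3 "genre" classes, 16-dim synthetic embeddings.
    rng = np.random.default_rng(42)
    num_classes, dim, per_class = 3, 16, 50
    centers = rng.normal(scale=3.0, size=(num_classes, dim))
    X = np.concatenate([c + rng.normal(size=(per_class, dim)) for c in centers])
    y = np.repeat(np.arange(num_classes), per_class)
    X = l2_normalize(X)
    W, b = train_linear_probe(X, y, num_classes)
    acc = (predict(X, W, b) == y).mean()
    print(f"training accuracy: {acc:.2f}")
```

A small MLP is an equally plausible choice of probe. Either way, the design keeps the expensive audio model frozen, and each new task costs only a cheap classifier fit over stored embeddings.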
