Codified audio language modeling learns useful representations for music information retrieval

07/12/2021
by Rodrigo Castellon, et al.

We demonstrate that language models pre-trained on codified (discretely-encoded) music audio learn representations that are useful for downstream MIR tasks. Specifically, we explore representations from Jukebox (Dhariwal et al. 2020): a music generation system containing a language model trained on codified audio from 1M songs. To determine if Jukebox's representations contain useful information for MIR, we use them as input features to train shallow models on several MIR tasks. Relative to representations from conventional MIR models which are pre-trained on tagging, we find that using representations from Jukebox as input features yields 30% stronger performance on average across four MIR tasks: tagging, genre classification, emotion recognition, and key detection. For key detection, we observe that representations from Jukebox are considerably stronger than those from models pre-trained on tagging, suggesting that pre-training via codified audio language modeling may address blind spots in conventional approaches. We interpret the strength of Jukebox's representations as evidence that modeling audio instead of tags provides richer representations for MIR.
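The probing setup described above (frozen pre-trained representations used as input features to a shallow model) can be illustrated with a minimal sketch. This is not the authors' pipeline: the features here are synthetic Gaussian clusters standing in for real Jukebox activations, the probe is a plain softmax regression trained with SGD, and every function name and hyperparameter below is a hypothetical choice made for illustration.

```python
import math
import random


def make_synthetic_features(n_per_class, n_classes, dim, seed=0):
    """Stand-in for frozen pre-trained features: one Gaussian cluster per class.

    Assumption: real features would come from a pre-trained model; these are
    synthetic so the sketch stays self-contained.
    """
    rng = random.Random(seed)
    centers = [[rng.gauss(0, 3) for _ in range(dim)] for _ in range(n_classes)]
    data = []
    for label, center in enumerate(centers):
        for _ in range(n_per_class):
            data.append(([m + rng.gauss(0, 1) for m in center], label))
    rng.shuffle(data)
    features = [x for x, _ in data]
    labels = [y for _, y in data]
    return features, labels


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def scores(W, b, x):
    """Linear class scores W @ x + b."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def train_linear_probe(X, y, n_classes, lr=0.05, epochs=30, seed=0):
    """Fit a softmax-regression probe with SGD; the features are never updated."""
    rng = random.Random(seed)
    dim = len(X[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            probs = softmax(scores(W, b, X[i]))
            for c in range(n_classes):
                # Gradient of cross-entropy w.r.t. the logit of class c.
                grad = probs[c] - (1.0 if c == y[i] else 0.0)
                for j in range(dim):
                    W[c][j] -= lr * grad * X[i][j]
                b[c] -= lr * grad
    return W, b


def accuracy(W, b, X, y):
    """Fraction of examples whose highest-scoring class matches the label."""
    correct = 0
    for x, label in zip(X, y):
        s = scores(W, b, x)
        if max(range(len(s)), key=lambda c: s[c]) == label:
            correct += 1
    return correct / len(X)


def run_probe_demo(seed=0):
    """Train on 80% of the synthetic features and report held-out accuracy."""
    X, y = make_synthetic_features(n_per_class=40, n_classes=3, dim=8, seed=seed)
    cut = int(0.8 * len(X))
    W, b = train_linear_probe(X[:cut], y[:cut], n_classes=3, seed=seed)
    return accuracy(W, b, X[cut:], y[cut:])
```

Calling `run_probe_demo()` returns the held-out accuracy of the probe. In a probing study of this kind, the same shallow model would be trained on features from each pre-trained system in turn, so that differences in held-out performance can be attributed to the representations rather than to the classifier.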

