MusiCoder: A Universal Music-Acoustic Encoder Based on Transformers

08/03/2020
by Yilun Zhao, et al.

Music annotation has always been one of the critical topics in the field of Music Information Retrieval (MIR). Traditional models use supervised learning for music annotation tasks. However, as supervised machine learning approaches grow in complexity, the growing need for annotated training data often cannot be met with available data. Moreover, over-reliance on labeled data when training supervised learning models can lead to unexpected results and open vulnerabilities to adversarial attacks. In this paper, a new self-supervised music acoustic representation learning approach named MusiCoder is proposed. Inspired by the success of BERT, MusiCoder builds upon the architecture of self-attention bidirectional transformers. Two pre-training objectives, Contiguous Frames Masking (CFM) and Contiguous Channels Masking (CCM), are designed to adapt BERT-like masked reconstruction pre-training to the continuous acoustic-frame domain. The performance of MusiCoder is evaluated on two downstream music annotation tasks. The results show that MusiCoder outperforms state-of-the-art models in both music genre classification and auto-tagging. The effectiveness of MusiCoder indicates the great potential of a new self-supervised approach to understanding music: first apply masked reconstruction tasks to pre-train a transformer-based model on massive unlabeled music acoustic data, then fine-tune the model on specific downstream tasks with labeled data.
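To make the two pre-training objectives concrete, the sketch below illustrates the masking idea behind CFM and CCM on a toy mel-spectrogram: CFM zeroes out random contiguous spans along the time axis, while CCM zeroes out contiguous blocks of frequency channels; the model is then trained to reconstruct the masked regions. The span lengths, block sizes, and masking value (zero) here are illustrative assumptions, not the paper's actual hyperparameters.

```python
import numpy as np

def contiguous_frames_masking(spec, num_spans=2, span_len=5, rng=None):
    """CFM sketch: zero out random contiguous spans of time frames.

    spec: (n_frames, n_channels) mel-spectrogram.
    """
    rng = rng or np.random.default_rng(0)
    masked = spec.copy()
    n_frames = spec.shape[0]
    for _ in range(num_spans):
        start = rng.integers(0, n_frames - span_len)
        masked[start:start + span_len, :] = 0.0  # illustrative mask value
    return masked

def contiguous_channels_masking(spec, num_blocks=1, block_len=8, rng=None):
    """CCM sketch: zero out contiguous blocks of frequency channels."""
    rng = rng or np.random.default_rng(0)
    masked = spec.copy()
    n_channels = spec.shape[1]
    for _ in range(num_blocks):
        start = rng.integers(0, n_channels - block_len)
        masked[:, start:start + block_len] = 0.0
    return masked

# Toy mel-spectrogram: 100 time frames x 80 mel channels.
spec = np.random.default_rng(42).random((100, 80))
cfm_input = contiguous_frames_masking(spec)
ccm_input = contiguous_channels_masking(spec)
```

During pre-training, the masked spectrograms would be fed to the transformer encoder, and a reconstruction loss (e.g. L1) would be computed between the model output and the original `spec` at the masked positions only.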


Related research

02/21/2022  S3T: Self-Supervised Pre-training with Swin Transformer for Music Classification
In this paper, we propose S3T, a self-supervised pre-training method wit...

05/19/2020  Sketch-BERT: Learning Sketch Bidirectional Encoder Representation from Transformers by Self-supervised Learning of Sketch Gestalt
Previous researches of sketches often considered sketches in pixel forma...

10/28/2022  Spectrograms Are Sequences of Patches
Self-supervised pre-training models have been used successfully in sever...

02/05/2021  Multi-Task Self-Supervised Pre-Training for Music Classification
Deep learning is very data hungry, and supervised learning especially re...

04/15/2023  Self-supervised Auxiliary Loss for Metric Learning in Music Similarity-based Retrieval and Auto-tagging
In the realm of music information retrieval, similarity-based retrieval ...

05/31/2023  MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Self-supervised learning (SSL) has recently emerged as a promising parad...

10/23/2019  Emergent Properties of Finetuned Language Representation Models
Large, self-supervised transformer-based language representation models ...
