Multi-Task Self-Supervised Pre-Training for Music Classification

02/05/2021
by Ho-Hsiang Wu, et al.

Deep learning is very data hungry, and supervised learning in particular requires massive amounts of labeled data to work well. Machine listening research often suffers from a shortage of labeled data, because human annotations are costly to acquire, and annotating audio is time-consuming and less intuitive. Moreover, models learned from a labeled dataset often embed biases specific to that particular dataset. Therefore, unsupervised learning techniques have become popular approaches to machine listening problems. In particular, a self-supervised learning technique that reconstructs multiple hand-crafted audio features has shown promising results in the speech domain, for tasks such as emotion recognition and automatic speech recognition (ASR). In this paper, we apply self-supervised and multi-task learning methods to pre-train music encoders, and explore various design choices, including encoder architectures, weighting mechanisms for combining losses from multiple tasks, and the selection of workers for the pretext tasks. We investigate how these design choices interact with various downstream music classification tasks. We find that combining various music-specific workers with weighting mechanisms that balance their losses during pre-training helps improve performance on, and generalization to, the downstream tasks.
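The abstract mentions weighting mechanisms that balance the losses of several workers, but does not say which ones. The sketch below is only an illustration of the general idea, not the authors' method. It assumes each worker regresses a hand-crafted feature with a mean-squared-error loss. It then compares a plain equal-weight sum with one well-known alternative: a commonly used simplified form of the uncertainty-based weighting of Kendall et al. (2018), where each task gets a learnable log-variance `s_i`. The worker names (`mfcc`, `chroma`, `tempo`), the feature values, and all function names are hypothetical.

```python
import math
from typing import Dict, List


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error between one worker's prediction and its target feature."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def equal_weighted_loss(losses: Dict[str, float]) -> float:
    """Baseline: every worker's loss gets weight 1.0."""
    return sum(losses.values())


def uncertainty_weighted_loss(losses: Dict[str, float],
                              log_vars: Dict[str, float]) -> float:
    """Total loss sum_i exp(-s_i) * L_i + s_i, with s_i a learnable log-variance.

    A large s_i shrinks worker i's weight exp(-s_i); the additive s_i term
    stops the optimizer from simply driving every weight to zero.
    (Illustrative choice of weighting scheme, not taken from the paper.)
    """
    return sum(math.exp(-log_vars[k]) * loss + log_vars[k]
               for k, loss in losses.items())


def update_log_vars(losses: Dict[str, float], log_vars: Dict[str, float],
                    lr: float = 0.1) -> Dict[str, float]:
    """One gradient-descent step on every s_i, with the worker losses held fixed.

    d/ds_i [exp(-s_i) * L_i + s_i] = 1 - exp(-s_i) * L_i, so the stationary
    point is s_i = log(L_i), where each weighted term exp(-s_i) * L_i equals 1.
    """
    return {k: s - lr * (1.0 - math.exp(-s) * losses[k])
            for k, s in log_vars.items()}


if __name__ == "__main__":
    # Hypothetical workers and made-up feature values, chosen only so that
    # the raw losses differ by two orders of magnitude.
    losses = {
        "mfcc": mse([0.2, 0.4, 0.1], [0.0, 0.5, 0.3]),   # ~0.03
        "chroma": mse([0.9, 0.1], [0.5, 0.3]),           # ~0.10
        "tempo": mse([120.0], [118.0]),                  # 4.0
    }
    # With equal weights the tempo worker contributes almost all of the total.
    print("equal weighting:", equal_weighted_loss(losses))

    log_vars = {k: 0.0 for k in losses}
    for _ in range(2000):
        log_vars = update_log_vars(losses, log_vars)
    for k, loss in losses.items():
        print(k, "effective weight:", round(math.exp(-log_vars[k]), 3))
```

In real pre-training the `s_i` would be updated by backpropagation jointly with the encoder and the workers. Here the losses are frozen to isolate the balancing effect: each learned weight settles near `1 / L_i`, so every worker's weighted term ends up close to 1 and no single worker dominates the gradient.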

research
10/28/2022

Spectrograms Are Sequences of Patches

Self-supervised pre-training models have been used successfully in sever...
research
10/09/2021

Wav2vec-S: Semi-Supervised Pre-Training for Speech Recognition

Self-supervised pre-training has dramatically improved the performance o...
research
08/03/2020

MusiCoder: A Universal Music-Acoustic Encoder Based on Transformers

Music annotation has always been one of the critical topics in the field...
research
10/30/2020

Joint Masked CPC and CTC Training for ASR

Self-supervised learning (SSL) has shown promise in learning representat...
research
10/22/2020

Perceptual Loss based Speech Denoising with an ensemble of Audio Pattern Recognition and Self-Supervised Models

Deep learning based speech denoising still suffers from the challenge of...
research
11/10/2020

UmBERTo-MTSA @ AcCompl-It: Improving Complexity and Acceptability Prediction with Multi-task Learning on Self-Supervised Annotations

This work describes a self-supervised data augmentation approach used to...
research
01/25/2020

Multi-task self-supervised learning for Robust Speech Recognition

Despite the growing interest in unsupervised learning, extracting meanin...