Multi-Level and Multi-Scale Feature Aggregation Using Pre-trained Convolutional Neural Networks for Music Auto-tagging

03/06/2017
by Jongpil Lee, et al.

Music auto-tagging is often handled in a similar manner to image classification by regarding the 2D audio spectrogram as image data. However, music auto-tagging differs from image classification in that the tags are highly diverse and have different levels of abstraction. Considering this issue, we propose a convolutional neural network (CNN)-based architecture that embraces multi-level and multi-scale features. The architecture is trained in three steps. First, we conduct supervised feature learning to capture local audio features using a set of CNNs with different input sizes. Second, we extract audio features from each layer of the pre-trained convolutional networks separately and aggregate them all together over a long audio clip. Finally, we feed the aggregated features into fully-connected networks and make final predictions of the tags. Our experiments show that combining multi-level and multi-scale features is highly effective in music auto-tagging and that the proposed method outperforms previous state-of-the-art methods on the MagnaTagATune dataset and the Million Song Dataset. We further show that the proposed architecture is useful in transfer learning.
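The second step, aggregating features from several layers (multi-level) of several CNNs with different input sizes (multi-scale), can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how features are pooled, so mean pooling over segments is an assumption here, and the scale names, layer indices, segment counts, and feature dimensions are hypothetical. Random numbers stand in for real CNN activations.

```python
import random


def mean_pool(segment_features):
    """Average per-segment feature vectors into one clip-level vector.

    Mean pooling is an assumption; the abstract does not name the pooling used.
    """
    num_segments = len(segment_features)
    return [sum(column) / num_segments for column in zip(*segment_features)]


def aggregate_clip_features(activations):
    """Concatenate pooled features from every (scale, layer) pair.

    activations maps (scale_name, layer_index) to a list of per-segment
    feature vectors extracted from one long audio clip.
    """
    clip_vector = []
    for key in sorted(activations):  # fixed order keeps the layout stable
        clip_vector.extend(mean_pool(activations[key]))
    return clip_vector


# Hypothetical setup: two input scales, three layers per CNN.
random.seed(0)
activations = {}
for scale, num_segments in [("short", 16), ("long", 4)]:
    for layer, dim in [(1, 8), (2, 16), (3, 32)]:
        activations[(scale, layer)] = [
            [random.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(num_segments)
        ]

clip_vector = aggregate_clip_features(activations)
print(len(clip_vector))  # 112 = 2 scales * (8 + 16 + 32)
```

In the pipeline described above, a vector like `clip_vector` would then be the input to the fully-connected networks that predict the tags.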


