Sample-level CNN Architectures for Music Auto-tagging Using Raw Waveforms

10/28/2017
by Taejun Kim, et al.

Recent work has shown that the end-to-end approach using convolutional neural networks (CNNs) is effective in various types of machine learning tasks. For audio signals, the approach takes raw waveforms as input using a 1-D convolution layer. In this paper, we improve the 1-D CNN architecture for music auto-tagging by adopting building blocks from state-of-the-art image classification models, ResNets and SENets, and adding multi-level feature aggregation to it. We compare different combinations of these modules in building CNN architectures. The results show that they achieve significant improvements over previous state-of-the-art models on the MagnaTagATune dataset and comparable results on the Million Song Dataset. Furthermore, we analyze and visualize our model to show how the 1-D CNN operates.
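
To make the building blocks named in the abstract concrete, here is a minimal NumPy sketch of a 1-D convolution over a raw waveform followed by a block that combines a residual shortcut (as in ResNets) with squeeze-and-excitation channel gating (as in SENets). It is an illustration only, not the authors' implementation: the function names, the filter length of 3, the front-end stride of 3, the pooling size of 3, and the layer ordering are all assumptions, and multi-level feature aggregation is omitted.

```python
import numpy as np


def conv1d(x, w, stride=1):
    """Valid 1-D cross-correlation. x: (C_in, T), w: (C_out, C_in, K)."""
    c_out, _, k = w.shape
    t_out = (x.shape[1] - k) // stride + 1
    out = np.empty((c_out, t_out))
    for t in range(t_out):
        window = x[:, t * stride : t * stride + k]  # (C_in, K)
        out[:, t] = np.tensordot(w, window, axes=([1, 2], [0, 1]))
    return out


def squeeze_excite(x, w1, w2):
    """Channel gating. x: (C, T), w1: (C // r, C), w2: (C, C // r)."""
    z = x.mean(axis=1)                     # squeeze: average over time -> (C,)
    h = np.maximum(w1 @ z, 0.0)            # bottleneck with ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))    # sigmoid gates in (0, 1)
    return x * s[:, None]                  # excite: rescale each channel


def se_res_block(x, w_conv, w1, w2, pool=3):
    """Hypothetical block: conv -> ReLU -> SE -> add shortcut -> max-pool."""
    padded = np.pad(x, ((0, 0), (1, 1)))   # zero-pad time so K=3 keeps the length
    y = np.maximum(conv1d(padded, w_conv), 0.0)
    y = squeeze_excite(y, w1, w2)
    y = y + x                              # residual shortcut (needs C_in == C_out)
    t = (y.shape[1] // pool) * pool        # drop frames that do not fill a pool window
    return y[:, :t].reshape(y.shape[0], -1, pool).max(axis=2)
```

Under these assumptions, a mono waveform of shape `(1, 729)` passed through `conv1d` with 8 filters of length 3 and `stride=3` yields 243 frames, and one `se_res_block` with `pool=3` reduces that to 81 frames while keeping 8 channels.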

research
03/06/2017

Sample-level Deep Convolutional Neural Networks for Music Auto-tagging Using Raw Waveforms

Recently, the end-to-end approach that learns hierarchical representatio...
research
03/06/2017

Multi-Level and Multi-Scale Feature Aggregation Using Pre-trained Convolutional Neural Networks for Music Auto-tagging

Music auto-tagging is often handled in a similar manner to image classif...
research
12/04/2017

Raw Waveform-based Audio Classification Using Sample-level CNN Architectures

Music, speech, and acoustic scene sound are often handled separately in ...
research
06/16/2019

Multi-scale Embedded CNN for Music Tagging (MsE-CNN)

Convolutional neural networks (CNN) recently gained notable attraction i...
research
07/08/2020

A study of Neural networks point source extraction on simulated Fermi/LAT Telescope images

Astrophysical images in the GeV band are challenging to analyze due to t...
research
06/01/2020

Evaluation of CNN-based Automatic Music Tagging Models

Recent advances in deep learning accelerated the development of content-...
research
06/16/2022

GoodBye WaveNet – A Language Model for Raw Audio with Context of 1/2 Million Samples

Modeling long-term dependencies for audio signals is a particularly chal...