Convolution channel separation and frequency sub-bands aggregation for music genre classification

11/03/2022
by Jungwoo Heo, et al.

In music, short-term features such as pitch and tempo combine to form long-term semantic features such as melody and narrative. A music genre classification (MGC) system should be able to analyze both kinds of features. In this research, we propose a novel framework that extracts and aggregates short- and long-term features hierarchically. Our framework is based on ECAPA-TDNN, in which, because of back-propagation training, every layer that extracts short-term features is also affected by the layers that extract long-term features. To prevent this distortion of short-term features, we devised the convolution channel separation technique, which separates the short-term features from the long-term feature extraction paths. To extract more diverse features, we incorporated the frequency sub-bands aggregation method, which divides the input spectrogram into frequency bands and processes each segment separately. We evaluated our framework on the Melon Playlist dataset, a large-scale dataset containing 600 times more data than GTZAN, which is widely used in MGC studies. As a result, our framework achieved 70.4, which was improved by 16.9
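The abstract describes frequency sub-bands aggregation only as dividing the spectrogram along the frequency axis and processing each segment. The sketch below illustrates that idea under stated assumptions: the bands are contiguous and non-overlapping, the number of bands is a free parameter, and the per-band processor `band_energy_over_time` is a hypothetical placeholder (a mean over frequency) standing in for whatever network branch the authors actually use. Stacking the per-band outputs is likewise an assumed aggregation step, not the paper's confirmed method.

```python
from typing import Callable, List

# Assumed layout: rows are frequency bins, columns are time frames.
Spectrogram = List[List[float]]


def split_sub_bands(spec: Spectrogram, num_bands: int) -> List[Spectrogram]:
    """Split the frequency axis into `num_bands` contiguous, near-equal bands."""
    n_freq = len(spec)
    if not 1 <= num_bands <= n_freq:
        raise ValueError("num_bands must be between 1 and the number of frequency bins")
    base, extra = divmod(n_freq, num_bands)
    bands, start = [], 0
    for b in range(num_bands):
        # Spread any leftover bins over the first bands so every bin is used once.
        size = base + (1 if b < extra else 0)
        bands.append(spec[start:start + size])
        start += size
    return bands


def band_energy_over_time(band: Spectrogram) -> List[float]:
    """Hypothetical per-band processor: mean over the band's bins for each frame."""
    n_frames = len(band[0])
    return [sum(row[t] for row in band) / len(band) for t in range(n_frames)]


def sub_band_aggregate(
    spec: Spectrogram,
    num_bands: int,
    process: Callable[[Spectrogram], List[float]] = band_energy_over_time,
) -> List[List[float]]:
    """Process each band independently, then stack the outputs (one row per band)."""
    return [process(band) for band in split_sub_bands(spec, num_bands)]


# Toy 4-bin x 2-frame spectrogram split into two bands of two bins each.
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(sub_band_aggregate(toy, 2))  # [[2.0, 3.0], [6.0, 7.0]]
```

Processing bands separately lets each branch specialize in one frequency range, which is one plausible reading of how the method yields "more diverse features"; in a real model the `process` callable would be a trainable encoder rather than a fixed mean.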
