A Modulation Front-End for Music Audio Tagging

05/25/2021
by Cyrus Vahidi, et al.

Convolutional neural networks have been explored extensively for automatic music tagging. The problem can be approached with either engineered time-frequency features or raw audio as input. Modulation filter bank representations, which have been actively researched as a basis for timbre perception, have the potential to facilitate the extraction of perceptually salient features. We explore end-to-end learned front-ends for audio representation learning, ModNet and SincModNet, that incorporate a temporal modulation processing block. The structure is effectively analogous to a modulation filter bank in which the FIR filter center frequencies are learned in a data-driven manner. The expectation is that a perceptually motivated filter bank can provide a useful representation for identifying music features. The resulting front-end provides a fully visualisable and interpretable temporal modulation decomposition of raw audio. We evaluate the performance of our model against the state of the art in music tagging on the MagnaTagATune dataset. We analyse the impact on performance for particular tags when time-frequency bands are subsampled by the modulation filters at a progressively reduced rate. We demonstrate that modulation filtering provides promising results for music tagging and feature representation, without using extensive musical domain knowledge in the design of this front-end.
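
To make the idea of a temporal modulation filter bank concrete, the following is a minimal sketch, not the authors' implementation. It assumes NumPy, fixed rather than learned center frequencies, and windowed-sinc FIR band-pass filters applied along time to each band's envelope. The function names `sinc_bandpass` and `modulation_filterbank`, the default bandwidth, and the tap count are hypothetical choices made for illustration only.

```python
import numpy as np


def sinc_bandpass(center_hz, bandwidth_hz, fs, num_taps=101):
    """Windowed-sinc FIR band-pass filter centred on center_hz (hypothetical helper)."""
    # Band edges as normalised frequencies (cycles per sample), clipped to [0, 0.5].
    f_lo = max(center_hz - bandwidth_hz / 2.0, 0.0) / fs
    f_hi = min(center_hz + bandwidth_hz / 2.0, fs / 2.0) / fs
    # Symmetric tap indices so the filter has linear phase.
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    # The difference of two ideal low-pass responses gives a band-pass response.
    h = 2 * f_hi * np.sinc(2 * f_hi * n) - 2 * f_lo * np.sinc(2 * f_lo * n)
    # A Hamming window reduces ripple caused by truncating the ideal response.
    return h * np.hamming(num_taps)


def modulation_filterbank(envelopes, centers_hz, fs_env, bandwidth_hz=2.0, num_taps=101):
    """Filter each band envelope with one FIR filter per modulation center frequency.

    envelopes: array of shape (bands, frames), or (frames,) for a single band.
               Assumes frames >= num_taps so that 'same' convolution keeps the length.
    Returns an array of shape (bands, modulation_filters, frames).
    """
    envelopes = np.atleast_2d(np.asarray(envelopes, dtype=float))
    num_bands, num_frames = envelopes.shape
    out = np.empty((num_bands, len(centers_hz), num_frames))
    for m, fc in enumerate(centers_hz):
        h = sinc_bandpass(fc, bandwidth_hz, fs_env, num_taps)
        for b in range(num_bands):
            out[b, m] = np.convolve(envelopes[b], h, mode="same")
    return out
```

In a learned front-end of the kind the abstract describes, the entries of `centers_hz` would be trainable parameters updated by gradient descent rather than fixed values; this sketch only shows the forward filtering step.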

research
11/28/2022

Learnable Front Ends Based on Temporal Modulation for Music Tagging

While end-to-end systems are becoming popular in auditory signal process...
research
03/06/2017

Sample-level Deep Convolutional Neural Networks for Music Auto-tagging Using Raw Waveforms

Recently, the end-to-end approach that learns hierarchical representatio...
research
11/07/2017

End-to-end learning for music audio tagging at scale

The lack of data tends to limit the outcomes of deep learning research -...
research
06/23/2023

Modulation Graphs in Popular Music

In this paper, graph theory is used to explore the musical notion of ton...
research
01/01/2019

Exploring spectro-temporal features in end-to-end convolutional neural networks

Triangular, overlapping Mel-scaled filters ("f-banks") are the current s...
research
10/09/2018

Functionally Modular and Interpretable Temporal Filtering for Robust Segmentation

The performance of autonomous systems heavily relies on their ability to...
research
06/10/2014

Music and Vocal Separation Using Multi-Band Modulation Based Features

The potential use of non-linear speech features has not been investigate...
