Towards Explainable Convolutional Features for Music Audio Modeling

05/31/2021 ∙ by Anna K. Yanchenko, et al.

Audio signals are often represented as spectrograms and treated as 2D images. In this light, deep convolutional architectures are widely used for music audio tasks even though these two data types have very different structures. In this work, we attempt to "open the black-box" on deep convolutional models to inform future architectures for music audio tasks, and explain the excellent performance of deep convolutions that model spectrograms as 2D images. To this end, we expand recent explainability discussions in deep learning for natural image data to music audio data through systematic experiments using the deep features learned by various convolutional architectures. We demonstrate that deep convolutional features perform well across various target tasks, whether or not they are extracted from deep architectures originally trained on that task. Additionally, deep features exhibit high similarity to hand-crafted wavelet features, whether the deep features are extracted from a trained or untrained model.
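To make the abstract's core idea concrete, here is a minimal sketch (not the paper's actual pipeline): a spectrogram is computed and treated as a 2D image, then features are extracted with a *randomly initialized* convolutional layer — echoing the finding that features from untrained convolutional models can still be informative. All signal parameters and kernel sizes below are illustrative assumptions.

```python
import numpy as np
from scipy.signal import spectrogram, convolve2d

rng = np.random.default_rng(0)

# Synthetic "music" signal: two sinusoids at 440 Hz and 880 Hz (assumed example).
fs = 8000
t = np.arange(0, 2.0, 1.0 / fs)
audio = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

# Spectrogram as a 2D image: frequency x time, in log magnitude.
f, times, S = spectrogram(audio, fs=fs, nperseg=256)
image = np.log1p(S)

# One untrained "conv layer": random 3x3 kernels + ReLU, followed by
# global average pooling to yield a fixed-length feature vector.
kernels = rng.standard_normal((8, 3, 3))
features = np.array([
    np.mean(np.maximum(convolve2d(image, k, mode="valid"), 0.0))
    for k in kernels
])

print(image.shape)     # (frequency bins, time frames)
print(features.shape)  # one pooled feature per random kernel
```

Such pooled features could then be fed to a simple downstream classifier, which is the spirit of the paper's transfer experiments across target tasks.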



Code Repositories

convolutions-for-music-audio

Repo for explaining convolutions for music audio modeling
