Receptive Field Regularization Techniques for Audio Classification and Tagging with Deep Convolutional Neural Networks

05/26/2021
by Khaled Koutini, et al.

In this paper, we study the performance of variants of well-known Convolutional Neural Network (CNN) architectures on different audio tasks. We show that tuning the Receptive Field (RF) of CNNs is crucial to their generalization. An insufficient RF limits the CNN's ability to fit the training data. In contrast, CNNs with an excessive RF tend to overfit the training data and fail to generalize to unseen testing data. As state-of-the-art CNN architectures in computer vision and other domains tend to go deeper in terms of number of layers, their RF size increases, and their performance therefore degrades in several audio classification and tagging tasks. We study well-known CNN architectures and how their building blocks affect their receptive field. We propose several systematic approaches to control the RF of CNNs and test the resulting architectures on different audio classification and tagging tasks and datasets. The experiments show that regularizing the RF of CNNs using our proposed approaches can drastically improve the generalization of models, outperforming complex architectures and models pre-trained on larger datasets. The proposed CNNs achieve state-of-the-art results in multiple tasks, from acoustic scene classification to emotion and theme detection in music to instrument recognition, as demonstrated by top ranks in several pertinent challenges (DCASE, MediaEval).
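The abstract does not say how the RF is computed, so the following is only a minimal sketch of the standard recurrence for a plain stack of convolution and pooling layers along one axis: each layer widens the RF by (kernel size - 1) times the product of all earlier strides. The function name `receptive_field` and the two layer stacks are hypothetical, chosen for illustration; they are not the architectures studied in the paper, and dilation is assumed to be 1.

```python
from typing import Iterable, Tuple


def receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Return the RF size (in input samples along one axis) of a layer stack.

    Each layer is a (kernel_size, stride) pair; dilation is assumed to be 1.
    """
    rf = 1    # with no layers, an output unit sees exactly one input sample
    jump = 1  # distance, in input samples, between adjacent output units
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump  # each layer widens the RF
        jump *= stride                  # strides compound for later layers
    return rf


# Hypothetical stacks (not taken from the paper): conv 3, pool 2, conv 3, pool 2, conv 3.
deep = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]
# Same depth, but the last 3-wide kernel is swapped for a 1-wide kernel.
narrow = [(3, 1), (2, 2), (3, 1), (2, 2), (1, 1)]

print(receptive_field(deep), receptive_field(narrow))  # prints: 18 10
```

In this sketch, swapping a late 3-wide kernel for a 1-wide kernel shrinks the RF from 18 to 10 input samples without changing the number of layers. This is shown only as one illustrative way depth and RF can be decoupled, not as the specific approach proposed by the authors.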

research
09/05/2019

Receptive-field-regularized CNN variants for acoustic scene classification

Acoustic scene classification and related tasks have been dominated by C...
research
07/27/2020

Receptive-Field Regularized CNNs for Music Classification and Tagging

Convolutional Neural Networks (CNNs) have been successfully used in vari...
research
09/04/2023

Raw Data Is All You Need: Virtual Axle Detector with Enhanced Receptive Field

Rising maintenance costs of ageing infrastructure necessitate innovative...
research
10/28/2019

Emotion and Theme Recognition in Music with Frequency-Aware RF-Regularized CNNs

We present CP-JKU submission to MediaEval 2019; a Receptive Field-(RF)-r...
research
07/03/2019

The Receptive Field as a Regularizer in Deep Convolutional Neural Networks for Acoustic Scene Classification

Convolutional Neural Networks (CNNs) have had great success in many mach...
research
10/20/2017

Real-time Convolutional Neural Networks for Emotion and Gender Classification

In this paper we propose and implement a general convolutional neural net...
research
09/08/2018

CNNs for Surveillance Footage Scene Classification

In this project, we adapt high-performing CNN architectures to different...
