The Receptive Field as a Regularizer in Deep Convolutional Neural Networks for Acoustic Scene Classification

07/03/2019
by   Khaled Koutini, et al.

Convolutional Neural Networks (CNNs) have had great success in many machine vision as well as machine audition tasks. Many image recognition network architectures have consequently been adapted for audio processing tasks. However, despite some successes, the performance of many of these architectures did not carry over from the image domain to the audio domain. For example, very deep architectures such as ResNet and DenseNet, which significantly outperform VGG in image recognition, do not perform better in audio processing tasks such as Acoustic Scene Classification (ASC). In this paper, we investigate why such powerful architectures perform worse in ASC than simpler models (e.g., VGG). To this end, we analyse the receptive field (RF) of these CNNs and demonstrate the importance of the RF to the generalization capability of the models. Using our receptive field analysis, we adapt both ResNet and DenseNet, achieving state-of-the-art performance and eventually outperforming the VGG-based models. We introduce systematic ways of adapting the RF in CNNs, and present results on three data sets that show how changing the RF over the time and frequency dimensions affects a model's performance. Our experimental results show that very small or very large RFs can cause performance degradation, but deep models can be made to generalize well by carefully choosing an appropriate RF size within a certain range.
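
To make the notion of receptive field size concrete, the following minimal sketch computes the RF along one axis (time or frequency) for a plain stack of convolution and pooling layers, using the standard recurrence in which each layer widens the RF by (kernel - 1) times the accumulated stride. The function name `receptive_field` and both layer stacks are hypothetical illustrations and are not taken from the paper; swapping 3-wide kernels for 1-wide ones is only one conceivable way of shrinking the RF, shown here as an assumption.

```python
from typing import Iterable, Tuple


def receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Receptive field size along one axis for a stack of layers.

    Each layer is given as (kernel_size, stride). Dilation and padding
    are ignored; padding does not change the RF size.
    """
    rf = 1    # a single input sample sees only itself
    jump = 1  # spacing, in input samples, between adjacent units of the current layer
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the RF by (k - 1) jumps
        jump *= stride             # striding spreads later units further apart
    return rf


# Hypothetical stacks (not the architectures evaluated in the paper):
# two 3-wide convolutions followed by 2-wide pooling with stride 2, twice.
vgg_like = [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1), (2, 2)]
# Same depth, but every second convolution uses a 1-wide kernel (assumption).
shrunk = [(3, 1), (1, 1), (2, 2), (3, 1), (1, 1), (2, 2)]

print(receptive_field(vgg_like))  # 16
print(receptive_field(shrunk))    # 10
```

Because time and frequency are handled independently in this sketch, calling the function once per axis with that axis's kernel sizes and strides gives the RF over each dimension separately.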

research
09/05/2019

Receptive-field-regularized CNN variants for acoustic scene classification

Acoustic scene classification and related tasks have been dominated by C...
research
05/26/2021

Receptive Field Regularization Techniques for Audio Classification and Tagging with Deep Convolutional Neural Networks

In this paper, we study the performance of variants of well-known Convol...
research
10/28/2019

Emotion and Theme Recognition in Music with Frequency-Aware RF-Regularized CNNs

We present CP-JKU submission to MediaEval 2019; a Receptive Field-(RF)-r...
research
11/05/2020

Low-Complexity Models for Acoustic Scene Classification Based on Receptive Field Regularization and Frequency Damping

Deep Neural Networks are known to be very demanding in terms of computin...
research
07/27/2020

Receptive-Field Regularized CNNs for Music Classification and Tagging

Convolutional Neural Networks (CNNs) have been successfully used in vari...
research
04/13/2022

Receptive Field Analysis of Temporal Convolutional Networks for Monaural Speech Dereverberation

Speech dereverberation is often an important requirement in robust speec...
research
09/15/2023

TF-SepNet: An Efficient 1D Kernel Design in CNNs for Low-Complexity Acoustic Scene Classification

Recent studies focus on developing efficient systems for acoustic scene ...
