Acoustic Scene Classification Using Bilinear Pooling on Time-liked and Frequency-liked Convolution Neural Network

02/14/2020
by Xing Yong Kek, et al.
The prevailing approach to Acoustic Scene Classification (ASC) has two steps: the audio waveform is preprocessed into a log-mel spectrogram, which then serves as the input representation for a Convolutional Neural Network (CNN). This paradigm took hold after DCASE 2016, where this framework achieved state-of-the-art results in ASC tasks: an accuracy of 64.5% on the ESC-50 dataset and 90.0% on the DCASE 2016 dataset, both improvements over the respective baseline systems. In this paper, we explore the use of harmonic and percussive source separation (HPSS), a technique popular in music information retrieval (MIR), to split the audio into a harmonic component and a percussive component. Although HPSS has previously been used as an input representation for CNN models in ASC, this paper further investigates how to leverage the separated components by designing two CNNs that process the harmonic and percussive audio in their natural form: one specialized in extracting deep features in a time-biased domain and the other in a frequency-biased domain, respectively. The deep features extracted by the two CNNs are then combined using bilinear pooling, giving a two-stream time and frequency CNN architecture for classifying acoustic scenes. The model is evaluated on the DCASE 2019 subtask 1a dataset and scored an average of 65% on the Kaggle private and public leaderboards.
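To make the fusion step concrete, the following is a minimal NumPy sketch of bilinear pooling over two feature maps, as it is commonly formulated for two-stream CNNs. It is an illustration under stated assumptions, not the authors' implementation: the function name `bilinear_pool`, the feature-map shapes, and the signed square root and L2 normalization steps are all assumptions, since the abstract does not give these details.

```python
import numpy as np

def bilinear_pool(time_feats, freq_feats, eps=1e-12):
    """Fuse two feature maps of shape (channels, height, width).

    Hypothetical helper: averages the outer product of the two channel
    vectors over all spatial positions, then applies a signed square root
    and L2 normalization (common post-processing, assumed here).
    """
    if time_feats.shape[1:] != freq_feats.shape[1:]:
        raise ValueError("both streams must share the same spatial size")
    c1, c2 = time_feats.shape[0], freq_feats.shape[0]
    a = time_feats.reshape(c1, -1)   # (c1, positions)
    b = freq_feats.reshape(c2, -1)   # (c2, positions)
    n_positions = a.shape[1]
    # Sum of per-position outer products, averaged over positions.
    z = (a @ b.T) / n_positions      # (c1, c2)
    z = z.reshape(-1)
    z = np.sign(z) * np.sqrt(np.abs(z))
    return z / (np.linalg.norm(z) + eps)

# Illustrative shapes only: 8 and 6 channels on a 4x5 grid.
rng = np.random.default_rng(0)
harmonic_stream = rng.standard_normal((8, 4, 5))    # time-biased CNN output
percussive_stream = rng.standard_normal((6, 4, 5))  # frequency-biased CNN output
descriptor = bilinear_pool(harmonic_stream, percussive_stream)
print(descriptor.shape)  # (48,)
```

The resulting vector has one entry per pair of channels across the two streams, so it captures pairwise interactions between time-biased and frequency-biased features before it is passed to a classifier.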

research
10/30/2018

SubSpectralNet - Using Sub-Spectrogram based Convolutional Neural Networks for Acoustic Scene Classification

Acoustic Scene Classification (ASC) is one of the core research problems...
research
06/10/2019

DCASE 2019: CNN depth analysis with different channel inputs for Acoustic Scene Classification

The objective of this technical report is to describe the framework used...
research
05/15/2021

1D CNN Architectures for Music Genre Classification

This paper proposes a 1D residual convolutional neural network (CNN) arc...
research
03/18/2016

Comparing Time and Frequency Domain for Audio Event Recognition Using Deep Learning

Recognizing acoustic events is an intricate problem for a machine and an...
research
10/14/2019

Acoustic Scene Classification Based on a Large-margin Factorized CNN

In this paper, we present an acoustic scene classification framework bas...
research
09/15/2023

TF-SepNet: An Efficient 1D Kernel Design in CNNs for Low-Complexity Acoustic Scene Classification

Recent studies focus on developing efficient systems for acoustic scene ...
research
05/03/2022

Frequency Domain-Based Detection of Generated Audio

Attackers may manipulate audio with the intent of presenting falsified r...
