Feature Pyramid Attention based Residual Neural Network for Environmental Sound Classification

05/28/2022
by Liguang Zhou, et al.

Environmental sound classification (ESC) is a challenging problem because of the unstructured spatial-temporal relations in sound signals. Many recent studies have focused on extracting features with convolutional neural networks, while learning the semantically relevant frames of sound signals has been overlooked. To this end, we present an end-to-end framework, the feature pyramid attention network (FPAM), which focuses on extracting semantically relevant features for ESC. We first extract feature maps from the preprocessed spectrogram of the sound waveform with a backbone network. Then, to build multi-scale hierarchical features, we construct a feature pyramid representation of the sound spectrogram by aggregating the feature maps from layers at multiple scales, and FPAM localizes the semantically relevant temporal frames and spatial locations. Specifically, the multi-scale features are first processed by a dimension alignment module. Afterward, a pyramid spatial attention (PSA) module localizes the important frequency regions spatially with a spatial attention module (SAM). Last, the processed feature maps are refined by a pyramid channel attention (PCA) module to localize the important temporal frames. To demonstrate the effectiveness of the proposed FPAM, we visualize its attention maps on the spectrograms. The visualizations show that FPAM focuses on the semantically relevant regions while neglecting noise. The proposed method is validated on two widely used ESC datasets, ESC-50 and ESC-10. The experimental results show that FPAM yields performance comparable to state-of-the-art methods and a substantial improvement over the baseline methods.
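
The processing order described in the abstract (dimension alignment, then spatial attention, then channel attention over a feature pyramid) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The helper names (align, spatial_attention, channel_attention, fpam_sketch), the random projection weights, the nearest-neighbour resizing, the summation used to fuse pyramid levels, and the simple pooling-plus-sigmoid gates are all assumptions chosen only to make the data flow concrete.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def align(feat, weight, out_hw):
    """Assumed dimension alignment: 1x1 channel projection plus nearest-neighbour resize.

    feat has shape (channels, frequency, time); weight has shape (out_channels, channels).
    """
    _, h, w = feat.shape
    projected = np.einsum("oc,chw->ohw", weight, feat)
    rows = np.arange(out_hw[0]) * h // out_hw[0]
    cols = np.arange(out_hw[1]) * w // out_hw[1]
    return projected[:, rows][:, :, cols]


def spatial_attention(feat):
    """Assumed spatial gate: weight each (frequency, time) location by a mask in (0, 1)."""
    pooled = feat.mean(axis=0) + feat.max(axis=0)  # pool over channels -> (H, W)
    mask = sigmoid(pooled)
    return feat * mask[None, :, :], mask


def channel_attention(feat, w1, w2):
    """Assumed channel gate: global average pool, two-layer bottleneck, sigmoid."""
    squeezed = feat.mean(axis=(1, 2))          # (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)    # ReLU bottleneck
    gate = sigmoid(w2 @ hidden)                # (C,), each entry in (0, 1)
    return feat * gate[:, None, None], gate


def fpam_sketch(features, out_channels=8, out_hw=(8, 8), seed=0):
    """Align each pyramid level, apply spatial attention, fuse, then apply channel attention."""
    rng = np.random.default_rng(seed)
    attended = []
    for feat in features:
        # Random weights stand in for learned 1x1 convolutions (assumption).
        weight = rng.standard_normal((out_channels, feat.shape[0])) * 0.1
        aligned = align(feat, weight, out_hw)
        attended.append(spatial_attention(aligned)[0])
    fused = np.sum(attended, axis=0)  # fusion by summation is an assumption
    w1 = rng.standard_normal((out_channels // 2, out_channels)) * 0.1
    w2 = rng.standard_normal((out_channels, out_channels // 2)) * 0.1
    return channel_attention(fused, w1, w2)


# Three hypothetical backbone feature maps at decreasing resolution.
rng = np.random.default_rng(1)
pyramid = [
    rng.standard_normal((16, 32, 32)),
    rng.standard_normal((32, 16, 16)),
    rng.standard_normal((64, 8, 8)),
]
refined, channel_gate = fpam_sketch(pyramid)
print(refined.shape, channel_gate.shape)
```

In a trained model the projections and gates would be learned layers, and the paper's PSA and PCA modules may combine pyramid levels differently from the plain sum used here.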

research
07/04/2019

Attention based Convolutional Recurrent Neural Network for Environmental Sound Classification

Environmental sound classification (ESC) is a challenging problem due to...
research
12/31/2022

Attentional Graph Convolutional Network for Structure-aware Audio-Visual Scene Classification

Audio-Visual scene understanding is a challenging problem due to the uns...
research
08/03/2018

Interaction-aware Spatio-temporal Pyramid Attention Networks for Action Classification

Local features at neighboring spatial positions in feature maps have hig...
research
10/15/2018

3D Feature Pyramid Attention Module for Robust Visual Speech Recognition

Visual speech recognition is the task to decode the speech content from ...
research
09/17/2023

Efficient Pyramid Channel Attention Network for Pathological Myopia Detection

Pathological myopia (PM) is the leading ocular disease for impaired visi...
research
10/26/2020

P^2 Net: Augmented Parallel-Pyramid Net for Attention Guided Pose Estimation

We propose an augmented Parallel-Pyramid Net (P^2 Net) with feature refi...
research
06/12/2022

Indirect-Instant Attention Optimization for Crowd Counting in Dense Scenes

One of appealing approaches to guiding learnable parameter optimization,...
