Spatio-Temporal Attention Pooling for Audio Scene Classification

04/06/2019
by Oliver Y. Chén, et al.

Acoustic scenes are rich and redundant in their content. In this work, we present a spatio-temporal attention pooling layer coupled with a convolutional recurrent neural network that learns from discriminative patterns while suppressing those irrelevant to acoustic scene classification. The convolutional layers in this network learn invariant features from the time-frequency input. The bidirectional recurrent layers then encode the temporal dynamics of the resulting convolutional features. Afterwards, a two-dimensional attention mask, formed as the outer product of the spatial and temporal attention vectors learned by two designated attention layers, is used to weigh and pool the recurrent output into a final feature vector for classification. The network is trained with between-class examples generated by between-class data augmentation. Experiments demonstrate that the proposed method not only outperforms a strong convolutional neural network baseline but also sets a new state-of-the-art result on the LITIS Rouen dataset.
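
The pooling step described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the scoring weights `w_time` and `W_feat`, the use of time-averaged features to score the spatial (feature) axis, and summation over time as the final pooling are all hypothetical choices. `between_class_mix` is likewise a simplified stand-in for between-class data augmentation that uses plain linear mixing and omits any level compensation the original between-class learning method may apply.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def spatio_temporal_attention_pool(H, w_time, W_feat):
    """Pool a (T, D) recurrent output into a D-dim feature vector.

    H      : (T, D) recurrent output, T time steps, D features.
    w_time : (D,) hypothetical weights scoring each time step.
    W_feat : (D, D) hypothetical weights scoring each feature.
    """
    # Temporal attention: one non-negative weight per time step, summing to 1.
    a_time = softmax(H @ w_time)                # (T,)
    # Spatial (feature) attention scored from the time-averaged features.
    a_feat = softmax(W_feat @ H.mean(axis=0))   # (D,)
    # 2-D attention mask as the outer product of the two vectors.
    mask = np.outer(a_time, a_feat)             # (T, D)
    # Weigh the recurrent output element-wise and sum over time.
    pooled = (mask * H).sum(axis=0)             # (D,)
    return pooled, mask


def between_class_mix(x1, y1, x2, y2, rng):
    """Mix two examples from different classes with a random ratio r.

    Assumption: plain linear mixing of inputs and one-hot labels.
    """
    r = rng.uniform(0.0, 1.0)
    x = r * x1 + (1.0 - r) * x2
    y = r * y1 + (1.0 - r) * y2
    return x, y


rng = np.random.default_rng(0)
T, D = 20, 8
H = rng.standard_normal((T, D))
pooled, mask = spatio_temporal_attention_pool(
    H, rng.standard_normal(D), rng.standard_normal((D, D)))
print(pooled.shape, round(mask.sum(), 6))  # (8,) 1.0
```

Because both attention vectors are softmax-normalized, the outer-product mask is non-negative and sums to 1, so each time-feature cell receives a share of a single attention budget. The mixed label from `between_class_mix` is a soft target whose entries also sum to 1 when both inputs carry one-hot labels.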

research
02/26/2019

Acoustic scene classification using multi-layer temporal pooling based on convolutional neural network

The temporal dynamics and the discriminative information in the audio si...
research
07/11/2016

Classifying Variable-Length Audio Files with All-Convolutional Networks and Masked Global Pooling

We trained a deep all-convolutional neural network with masked global po...
research
05/13/2019

A Deep Spatio-Temporal Fuzzy Neural Network for Passenger Demand Prediction

In spite of its importance, passenger demand prediction is a highly chal...
research
01/24/2019

Multi-stream Network With Temporal Attention For Environmental Sound Classification

Environmental sound classification systems often do not perform robustly...
research
07/04/2019

Attention based Convolutional Recurrent Neural Network for Environmental Sound Classification

Environmental sound classification (ESC) is a challenging problem due to...
research
01/23/2021

Sequence-based Dynamic Handwriting Analysis for Parkinson's Disease Detection with One-dimensional Convolutions and BiGRUs

Parkinson's disease (PD) is commonly characterized by several motor symp...
research
07/15/2019

Integrating the Data Augmentation Scheme with Various Classifiers for Acoustic Scene Modeling

This technical report describes the IOA team's submission for TASK1A of ...