Acoustic scene analysis with multi-head attention networks

09/16/2019
by Weimin Wang, et al.
Acoustic Scene Classification (ASC) is a challenging task, as a single scene may involve multiple events that contain complex sound patterns. For example, a cooking scene may contain several sound sources, including silverware clinking, chopping, and frying. ASC is further complicated by the fact that different activity classes can have overlapping sound patterns (e.g., both cooking and dishwashing can include silverware clinking). In this paper, we propose a multi-head attention network to model the complex temporal input structures for ASC. The proposed network takes the audio's time-frequency representation as input and uses standard VGG plus LSTM layers to extract a high-level feature representation. Furthermore, it applies multiple attention heads to summarize various patterns of sound events into a fixed-dimensional representation for the final scene classification. The whole network is trained end-to-end with back-propagation. Experimental results confirm that our model discovers meaningful sound patterns through the attention mechanism, without explicit supervision of the alignment. We evaluated the proposed model on the DCASE 2018 Task 5 dataset, where it achieved competitive performance, on par with the previous winner's results.
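The abstract does not spell out the attention formulation, so the following is only a minimal sketch of the general idea of multi-head attention pooling, not the authors' implementation. It assumes each head owns a hypothetical query vector, scores every frame by a dot product, normalizes the scores over time with a softmax, and takes the weighted average of the frames. The function names, the dot-product scoring, and the random stand-in features (in place of VGG plus LSTM outputs) are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, head_queries):
    """Summarize T frame vectors (each of length D) with K attention heads.

    Returns the concatenated summary (length K * D) and the K per-head
    weight vectors (each of length T, summing to 1).
    """
    dim = len(frames[0])
    summary, all_weights = [], []
    for query in head_queries:
        # Assumed scoring: one dot-product relevance score per time frame.
        scores = [sum(q * x for q, x in zip(query, frame)) for frame in frames]
        weights = softmax(scores)
        # Weighted average over time gives this head's D-dimensional summary.
        pooled = [
            sum(w * frame[d] for w, frame in zip(weights, frames))
            for d in range(dim)
        ]
        summary.extend(pooled)
        all_weights.append(weights)
    return summary, all_weights


random.seed(0)
# Random stand-ins for frame-level features; clips differ in length.
short_clip = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
long_clip = [[random.gauss(0, 1) for _ in range(4)] for _ in range(12)]
# Hypothetical head parameters; in a trained network these would be learned.
queries = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]

short_summary, short_weights = attention_pool(short_clip, queries)
long_summary, long_weights = attention_pool(long_clip, queries)
print(len(short_summary), len(long_summary))  # 12 12
```

Both clips map to a vector of the same length (3 heads times 4 features) regardless of how many frames they contain, which is what allows a fixed-size classifier to sit on top. The per-head weights over time are the quantities one would inspect to see which frames each head attends to.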

research
04/10/2019

Acoustic Scene Classification by Implicitly Identifying Distinct Sound Events

In this paper, we propose a new strategy for acoustic scene classificati...
research
10/29/2018

Learning How to Listen: A Temporal-Frequential Attention Model for Sound Event Detection

In this paper, we propose a temporal-frequential attention model for sou...
research
03/29/2019

Multi-Scale Time-Frequency Attention for Rare Sound Event Detection

Attention mechanism has been widely applied to various sound-related tas...
research
11/21/2019

An End-to-End Audio Classification System based on Raw Waveforms and Mix-Training Strategy

Audio classification can distinguish different kinds of sounds, which is...
research
05/26/2020

Sound Context Classification Basing on Join Learning Model and Multi-Spectrogram Features

In this paper, we present a deep learning framework applied for Acoustic...
research
12/31/2022

Attentional Graph Convolutional Network for Structure-aware Audio-Visual Scene Classification

Audio-Visual scene understanding is a challenging problem due to the uns...
research
08/17/2020

Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection

Weakly Labelled learning has garnered lot of attention in recent years d...
