A Global-local Attention Framework for Weakly Labelled Audio Tagging

02/03/2021
by   Helin Wang, et al.

Weakly labelled audio tagging aims to predict the classes of sound events within an audio clip when the onset and offset times of those events are not provided. Previous works have used the multiple instance learning (MIL) framework and exploited the information of the whole audio clip through MIL pooling functions. However, detailed information about the sound events, such as their durations, may not be considered under this framework. To address this issue, we propose a novel two-stream framework for audio tagging that exploits both the global and the local information of sound events. The global stream analyzes the whole audio clip and uses a class-wise selection module to identify the local clips that need to be attended to. These clips are then fed to the local stream, which exploits their detailed information to reach a better decision. Experimental results on AudioSet show that the proposed method can significantly improve audio tagging performance under different baseline network architectures.
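The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the selection rule, the local network, or how the two streams are combined. The sketch assumes that the global stream has already produced frame-level class probabilities, that the class-wise selection picks, for each class, the fixed-length window with the highest summed probability, and that the two streams are fused by a weighted average. The names `mil_pool`, `select_local_windows`, `two_stream_tag`, `local_scorer`, `window`, and `weight` are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def mil_pool(frame_probs: Sequence[Sequence[float]], mode: str = "max") -> List[float]:
    """Pool frame-level probabilities [T][C] into one clip-level value per class."""
    num_classes = len(frame_probs[0])
    columns = [[frame[c] for frame in frame_probs] for c in range(num_classes)]
    if mode == "max":
        return [max(col) for col in columns]
    if mode == "mean":
        return [sum(col) / len(col) for col in columns]
    raise ValueError(f"unknown pooling mode: {mode}")


def select_local_windows(
    frame_probs: Sequence[Sequence[float]], window: int
) -> List[Tuple[int, int]]:
    """Assumed selection rule: per class, the window with the largest summed probability.

    Returns one (start, end) frame span per class, with `end` exclusive.
    """
    num_frames = len(frame_probs)
    num_classes = len(frame_probs[0])
    window = min(window, num_frames)
    spans = []
    for c in range(num_classes):
        scores = [frame[c] for frame in frame_probs]
        running = sum(scores[:window])
        best_start, best_sum = 0, running
        # Slide the window one frame at a time, updating the sum incrementally.
        for start in range(1, num_frames - window + 1):
            running += scores[start + window - 1] - scores[start - 1]
            if running > best_sum:
                best_start, best_sum = start, running
        spans.append((best_start, best_start + window))
    return spans


def two_stream_tag(
    frame_probs: Sequence[Sequence[float]],
    local_scorer: Callable[[int, int, int], float],
    window: int,
    weight: float = 0.5,
) -> List[float]:
    """Fuse global (pooled) and local (per-window) probabilities for each class.

    `local_scorer(class_index, start, end)` stands in for the local stream;
    the weighted-average fusion is an assumption made for illustration.
    """
    global_probs = mil_pool(frame_probs, "max")
    spans = select_local_windows(frame_probs, window)
    fused = []
    for c, (start, end) in enumerate(spans):
        local_prob = local_scorer(c, start, end)
        fused.append(weight * global_probs[c] + (1.0 - weight) * local_prob)
    return fused
```

In a real system, `local_scorer` would be a second network applied to the audio features of the selected span; here it is left as a callable so the sketch stays independent of any particular architecture.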


