CT-SAT: Contextual Transformer for Sequential Audio Tagging

03/22/2022
by Yuanbo Hou, et al.

Sequential audio event tagging provides not only the types of audio events in a clip, but also their order and the number of events that occur. Most previous work on audio event sequence analysis relies on connectionist temporal classification (CTC). However, CTC's conditional independence assumption prevents it from effectively learning correlations between diverse audio events. This paper is a first attempt to introduce the Transformer into sequential audio tagging, since Transformers perform well in sequence-related tasks. To better utilize the contextual information of audio event sequences, we draw on the idea of bidirectional recurrent neural networks and propose a contextual Transformer (cTransformer) with a bidirectional decoder that can exploit both the forward and backward information of event sequences. Experiments on a real-life polyphonic audio dataset show that, compared to CTC-based methods, the cTransformer can effectively combine the fine-grained acoustic representations from the encoder with coarse-grained audio event cues, exploiting contextual information to recognize and predict audio event sequences.
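As a rough illustration of what a bidirectional decoder consumes, the sketch below builds teacher-forcing input/target pairs for a forward and a backward pass over one event sequence, and summarizes the three pieces of information a sequential tag carries (types, order, count). It is a minimal sketch under stated assumptions, not the paper's implementation: the boundary tokens `<sos>` and `<eos>` and the helper names `decoder_pairs` and `summarize` are hypothetical, and the abstract does not specify the actual token scheme.

```python
from typing import Dict, List, Tuple

# Hypothetical boundary tokens; the abstract does not name the real ones.
SOS, EOS = "<sos>", "<eos>"

Pair = Tuple[List[str], List[str]]


def decoder_pairs(events: List[str]) -> Dict[str, Pair]:
    """Build (decoder input, decoder target) pairs for both directions.

    The forward pair reads the events in their original order; the
    backward pair reads the same events in reverse order.
    """
    def shift(seq: List[str]) -> Pair:
        # Input is the sequence prefixed with SOS; target is the
        # sequence suffixed with EOS (standard teacher forcing).
        return [SOS] + seq, seq + [EOS]

    return {
        "forward": shift(list(events)),
        "backward": shift(list(reversed(events))),
    }


def summarize(events: List[str]) -> Dict[str, object]:
    """Collect the event types, their order, and the event count."""
    return {
        "types": sorted(set(events)),
        "order": list(events),
        "count": len(events),
    }


if __name__ == "__main__":
    clip = ["door", "speech", "dog"]  # made-up example labels
    print(decoder_pairs(clip))
    print(summarize(clip))
```

Each decoder step here conditions on the events already emitted in its direction, which is the kind of inter-event dependency that CTC's conditional independence assumption leaves out.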

research
10/22/2022

GCT: Gated Contextual Transformer for Sequential Audio Tagging

Audio tagging aims to assign predefined tags to audio clips to indicate ...
research
08/06/2018

Audio Tagging With Connectionist Temporal Classification Model Using Sequential Labelled Data

Audio tagging aims to predict one or several labels in an audio clip. Ma...
research
05/01/2022

Relation-guided acoustic scene classification aided with event embeddings

In real life, acoustic scenes and audio events are naturally correlated....
research
06/16/2022

Event-related data conditioning for acoustic event classification

Models based on diverse attention mechanisms have recently shined in tas...
research
07/27/2017

Learning Audio Sequence Representations for Acoustic Event Classification

Acoustic Event Classification (AEC) has become a significant task for ma...
research
08/23/2023

Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning

Sound events in daily life carry rich information about the objective wo...
research
10/02/2020

AVECL-UMONS database for audio-visual event classification and localization

We introduce the AVECL-UMons dataset for audio-visual event classificati...
