Audio Tagging With Connectionist Temporal Classification Model Using Sequential Labelled Data

08/06/2018
by   Yuanbo Hou, et al.
0

Audio tagging aims to predict one or several labels in an audio clip. Many previous works use weakly labelled data (WLD) for audio tagging, where only presence or absence of sound events is known, but the order of sound events is unknown. To use the order information of sound events, we propose sequential labelled data (SLD), where both the presence or absence and the order information of sound events are known. To utilize SLD in audio tagging, we propose a Convolutional Recurrent Neural Network followed by a Connectionist Temporal Classification (CRNN-CTC) objective function to map from an audio clip spectrogram to SLD. Experiments show that CRNN-CTC obtains an Area Under Curve (AUC) score of 0.986 in audio tagging, outperforming the baseline CRNN of 0.908 and 0.815 with Max Pooling and Average Pooling, respectively. In addition, we show CRNN-CTC has the ability to predict the order of sound events in an audio clip.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/17/2018

Polyphonic audio tagging with sequentially labelled data using CRNN with learnable gated linear units

Audio tagging aims to detect the types of sound events occurring in an a...
research
02/03/2021

A Global-local Attention Framework for Weakly Labelled Audio Tagging

Weakly labelled audio tagging aims to predict the classes of sound event...
research
12/10/2019

Sound Event Detection of Weakly Labelled Data with CNN-Transformer and Automatic Threshold Optimization

Sound event detection (SED) is a task to detect sound events in an audio...
research
10/03/2021

Enriching Ontology with Temporal Commonsense for Low-Resource Audio Tagging

Audio tagging aims at predicting sound events occurred in a recording. T...
research
11/22/2022

Ontology-aware Learning and Evaluation for Audio Tagging

This study defines a new evaluation metric for audio tagging tasks to ov...
research
03/02/2019

Weakly Labelled AudioSet Tagging with Attention Neural Networks

Audio tagging is the task of predicting the presence or absence of sound...
research
03/22/2022

CT-SAT: Contextual Transformer for Sequential Audio Tagging

Sequential audio event tagging can provide not only the type information...

Please sign up or login with your details

Forgot password? Click here to reset