Audio Set classification with attention model: A probabilistic perspective

11/02/2017
by Qiuqiang Kong, et al.

This paper investigates the classification of the Audio Set dataset. Audio Set is a large-scale multi-instance learning (MIL) dataset of sound clips. In MIL, a bag consists of several instances, and a bag is labelled positive if one or more of its instances are positive. Audio Set is a MIL dataset because an audio clip (the bag) is labelled positive for a class if at least one of its frames (the instances) contains that class. We tackle this MIL problem with an attention model and explain the model from a novel probabilistic perspective. We define a probability space on each bag, in which each instance has a trainable probability measure for a class. The classification of a bag is then the expectation of the classifications of its instances with respect to the learned probability measure. Experimental results show that our proposed attention model, implemented with a fully connected deep neural network, obtains a mean average precision (mAP) of 0.327 on Audio Set, outperforming Google's baseline of 0.314 and a recurrent neural network of 0.325.
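
The expectation described above can be sketched for a single class in a few lines. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes each instance is a frame embedding and replaces the fully connected network with single linear layers (the names `w_cla`, `b_cla`, `w_att` and `b_att` are illustrative). It also assumes the attention scores are made positive with an exponential and then normalized over the frames of the bag.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def attention_bag_prediction(
    bag: List[List[float]],
    w_cla: Sequence[float], b_cla: float,
    w_att: Sequence[float], b_att: float,
) -> float:
    """Bag-level probability for ONE class, computed as an expectation over instances.

    Hypothetical sketch: single linear scorers stand in for the paper's
    fully connected deep neural network.
    """
    # Instance-level classification f(x_t) in [0, 1] for every frame.
    f = [sigmoid(dot(w_cla, x) + b_cla) for x in bag]
    # Unnormalized positive attention scores v(x_t); using exp() is an assumption.
    logits = [dot(w_att, x) + b_att for x in bag]
    m = max(logits)  # subtract the max for numerical stability
    v = [math.exp(z - m) for z in logits]
    total = sum(v)
    # Normalize into a probability measure p(x_t) over the bag's instances.
    p = [vi / total for vi in v]
    # Expectation of the instance classifications under the learned measure.
    return sum(pi * fi for pi, fi in zip(p, f))


# Usage: three hypothetical 2-D frame embeddings forming one audio clip (bag).
frames = [[0.2, -1.0], [1.5, 0.3], [-0.7, 0.8]]
prob = attention_bag_prediction(frames, w_cla=[1.0, 0.5], b_cla=0.0,
                                w_att=[2.0, 0.0], b_att=0.0)
print(f"bag-level probability: {prob:.3f}")
```

Because the weights p(x_t) sum to one, the bag prediction is a convex combination of the frame predictions. It therefore always lies between the smallest and largest frame prediction. When one frame receives most of the attention mass, the bag prediction approaches that frame's prediction, which softly mirrors the MIL rule that one positive instance is enough to make the bag positive.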
