General audio tagging with ensembling convolutional neural network and statistical features

10/30/2018
by   Kele Xu, et al.
0

Audio tagging aims to infer descriptive labels from audio clips. Audio tagging is challenging due to the limited size of data and noisy labels. In this paper, we describe our solution for the DCASE 2018 Task 2 general audio tagging challenge. The contributions of our solution include: We investigated a variety of convolutional neural network architectures to solve the audio tagging task. Statistical features are applied to capture statistical patterns of audio features to improve the classification performance. Ensemble learning is applied to ensemble the outputs from the deep classifiers to utilize complementary information. a sample re-weight strategy is employed for ensemble training to address the noisy label problem. Our system achieves a mean average precision (mAP@3) of 0.958, outperforming the baseline system of 0.704. Our system ranked the 1st and 4th out of 558 submissions in the public and private leaderboard of DCASE 2018 Task 2 challenge. Our codes are available at https://github.com/Cocoxili/DCASE2018Task2/.

READ FULL TEXT
research
07/26/2018

General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline

This paper describes Task 2 of the DCASE 2018 Challenge, titled "General...
research
08/12/2018

Sample Mixed-Based Data Augmentation for Domestic Audio Tagging

Audio tagging has attracted increasing attention since last decade and h...
research
11/26/2018

Combining High-Level Features of Raw Audio Waves and Mel-Spectrograms for Audio Tagging

In this paper, we describe our contribution to Task 2 of the DCASE 2018 ...
research
07/16/2020

Audio Tagging by Cross Filtering Noisy Labels

High quality labeled datasets have allowed deep learning to achieve impr...
research
06/07/2019

Audio tagging with noisy labels and minimal supervision

This paper introduces Task 2 of the DCASE2019 Challenge, titled "Audio t...
research
06/12/2013

Horizontal and Vertical Ensemble with Deep Representation for Classification

Representation learning, especially which by using deep learning, has be...
research
12/27/2017

A Light-Weight Multimodal Framework for Improved Environmental Audio Tagging

The lack of strong labels has severely limited the state-of-the-art full...

Please sign up or login with your details

Forgot password? Click here to reset