ARCA23K: An audio dataset for investigating open-set label noise

09/19/2021
by Turab Iqbal, et al.

The availability of audio data on sound sharing platforms such as Freesound gives users access to large amounts of annotated audio. Using such data for training is becoming increasingly popular, but the label noise that is often prevalent in such datasets requires further investigation. This paper introduces ARCA23K, an Automatically Retrieved and Curated Audio dataset comprising over 23,000 labelled Freesound clips. Unlike past datasets such as FSDKaggle2018 and FSDnoisy18K, ARCA23K facilitates the study of label noise in a more controlled manner. We describe the entire process of creating the dataset so that it is fully reproducible, meaning researchers can extend our work with little effort. We show that the majority of labelling errors in ARCA23K are due to out-of-vocabulary audio clips, and we refer to this type of label noise as open-set label noise. We carry out experiments to study the impact of label noise on classification performance and representation learning.
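
To make the distinction concrete, the following is a minimal sketch of how a labelling error might be categorised as open-set (the true class lies outside the vocabulary) or closed-set (the true class is another in-vocabulary class). The function name, the toy vocabulary, and the example pairs are hypothetical and are not taken from ARCA23K or the paper; the sketch also assumes every assigned label is in the vocabulary and that a reference "true" label is available for each clip.

```python
from collections import Counter


def categorise_label_noise(examples, vocabulary):
    """Count clean, closed-set noisy, and open-set noisy examples.

    examples: iterable of (assigned_label, true_label) pairs.
    vocabulary: set of class names used for training.
    Assumes every assigned label belongs to the vocabulary.
    """
    counts = Counter()
    for assigned, true in examples:
        if assigned == true:
            counts["clean"] += 1
        elif true in vocabulary:
            # Mislabelled, but the true class is still in the vocabulary.
            counts["closed_set"] += 1
        else:
            # Mislabelled because the clip is out-of-vocabulary.
            counts["open_set"] += 1
    return counts


# Hypothetical toy data for illustration only.
vocabulary = {"bark", "meow", "siren"}
examples = [
    ("bark", "bark"),        # correct label
    ("meow", "bark"),        # closed-set error
    ("siren", "car_alarm"),  # open-set error
    ("bark", "wolf_howl"),   # open-set error
]
counts = categorise_label_noise(examples, vocabulary)
print(dict(counts))
```

In this toy example, two of the three labelling errors are open-set, mirroring the kind of breakdown the abstract describes for ARCA23K.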

research
01/04/2019

Learning Sound Event Classifiers from Web Audio with Noisy Labels

As sound event classification moves towards larger datasets, issues of l...
research
02/15/2023

Unsupervised classification to improve the quality of a bird song recording dataset

Open audio databases such as Xeno-Canto are widely used to build dataset...
research
05/17/2019

Training Object Detectors With Noisy Data

The availability of a large quantity of labelled training data is crucia...
research
06/07/2019

Audio tagging with noisy labels and minimal supervision

This paper introduces Task 2 of the DCASE2019 Challenge, titled "Audio t...
research
02/11/2020

Learning with Out-of-Distribution Data for Audio Classification

In supervised machine learning, the assumption that training data is lab...
research
05/02/2020

Addressing Missing Labels in Large-scale Sound Event Recognition using a Teacher-student Framework with Loss Masking

The study of label noise in sound event recognition has recently gained ...
research
05/28/2018

Investigating Label Noise Sensitivity of Convolutional Neural Networks for Fine Grained Audio Signal Labelling

We measure the effect of small amounts of systematic and random label no...
