FSD50K: an Open Dataset of Human-Labeled Sound Events

10/01/2020
by Eduardo Fonseca, et al.

Most existing datasets for sound event recognition (SER) are relatively small and/or domain-specific. The exception is AudioSet, which is based on a massive number of audio tracks from YouTube videos and encompasses over 500 classes of everyday sounds. However, AudioSet is not an open dataset: its release consists of pre-computed audio features (instead of waveforms), which limits the adoption of some SER methods. Downloading the original audio tracks is also problematic, both because the constituent YouTube videos are gradually disappearing and because of usage rights issues, which casts doubt on the suitability of this resource for benchmarking systems. To provide an alternative benchmark dataset and thus foster SER research, we introduce FSD50K, an open dataset containing over 51k audio clips totalling over 100h of audio, manually labeled using 200 classes drawn from the AudioSet Ontology. The audio clips are licensed under Creative Commons licenses, making the dataset freely distributable (including waveforms). We provide a detailed description of the FSD50K creation process, tailored to the particularities of Freesound data, including the challenges encountered and the solutions adopted. We include a comprehensive dataset characterization, along with a discussion of limitations and key factors, to allow its audio-informed usage. Finally, we conduct sound event classification experiments to provide baseline systems as well as insight into the main factors to consider when splitting Freesound audio data for SER. Our goal is to develop a dataset that will be widely adopted by the community as a new open benchmark for SER research.

