EPIC-SOUNDS: A Large-scale Dataset of Actions That Sound

02/01/2023
by Jaesung Huh, et al.

We introduce EPIC-SOUNDS, a large-scale dataset of audio annotations capturing temporal extents and class labels within the audio stream of egocentric videos. We propose an annotation pipeline in which annotators temporally label distinguishable audio segments and describe the action that could have caused each sound. By grouping these free-form descriptions into classes, we identify actions that can be discriminated purely from audio. For actions that involve objects colliding, we collect human annotations of the materials of these objects (e.g. a glass object being placed on a wooden surface), which we verify against visual labels, discarding ambiguous cases. Overall, EPIC-SOUNDS includes 78.4k categorised segments of audible events and actions, distributed across 44 classes, as well as 39.2k non-categorised segments. We train and evaluate two state-of-the-art audio recognition models on our dataset, highlighting the importance of audio-only labels and the limitations of current models in recognising actions that sound.
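To make the annotation structure described above concrete, the following is a minimal sketch of how one temporally labelled segment might be represented, and how categorised and non-categorised segments could be separated. The field names, the `AudioSegment` type, the `split_by_category` helper, and the example label "collision" are all assumptions made for illustration; they are not the dataset's actual schema or class names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AudioSegment:
    """One temporally labelled audio segment (hypothetical schema)."""
    video_id: str                      # hypothetical identifier of the source video
    start_s: float                     # segment start, in seconds
    end_s: float                       # segment end, in seconds
    description: str                   # free-form annotator description of the sound
    class_label: Optional[str] = None  # None marks a non-categorised segment

    @property
    def duration_s(self) -> float:
        """Temporal extent of the segment in seconds."""
        return self.end_s - self.start_s


def split_by_category(
    segments: List[AudioSegment],
) -> Tuple[List[AudioSegment], List[AudioSegment]]:
    """Separate segments that received a class label from those that did not."""
    categorised = [s for s in segments if s.class_label is not None]
    uncategorised = [s for s in segments if s.class_label is None]
    return categorised, uncategorised


# Illustrative data only; the label "collision" is a made-up class name.
segments = [
    AudioSegment("video_a", 1.0, 2.5, "glass placed on wooden surface", "collision"),
    AudioSegment("video_a", 4.0, 4.8, "unclear rustling"),
]
categorised, uncategorised = split_by_category(segments)
```

Keeping the free-form description alongside an optional class label mirrors the two-stage process in the abstract, where descriptions are collected first and grouped into classes afterwards.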


