A dataset for Audio-Visual Sound Event Detection in Movies

02/14/2023
by   Rajat Hebbar, et al.

Audio event detection is a widely studied audio processing task, with applications ranging from self-driving cars to healthcare. In-the-wild datasets such as AudioSet have propelled research in this field. However, many such efforts involve manual annotation and verification, which is expensive to perform at scale. Movies depict a variety of real-life and fictional scenarios, which makes them a rich resource for mining a wide range of audio events. In this work, we present a dataset of audio events called Subtitle-Aligned Movie Sounds (SAM-S). We use publicly available closed-caption transcripts to automatically mine over 110K audio events from 430 movies. We identify three dimensions along which to categorize audio events (sound, source, and quality) and present the steps involved in producing a final taxonomy of 245 sounds. We discuss the choices involved in generating the taxonomy, and also highlight the human-centered nature of sounds in our dataset. We establish a baseline performance for audio-only sound classification of 34.76%, and show that visual information can further improve the performance by about 5%. Data and code are made available for research at https://github.com/usc-sail/mica-subtitle-aligned-movie-sounds
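As an illustration of the general idea of mining sound events from closed captions, the following is a minimal sketch and not the authors' pipeline. It assumes captions in SubRip (SRT) format and assumes that sound descriptions are written in square brackets or parentheses; the function name `mine_sound_events` and the lower-casing step are hypothetical choices made for this example only.

```python
import re
from typing import List, Tuple

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:04,000".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)
# Assumption: sound descriptions appear in [square brackets] or (parentheses).
SOUND = re.compile(r"[\[(]([^\])]+)[\])]")


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert SRT timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def mine_sound_events(srt_text: str) -> List[Tuple[float, float, str]]:
    """Return (start, end, description) for each bracketed sound description."""
    events = []
    # Caption blocks in an SRT file are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        timing = None
        text_lines = []
        for line in block.splitlines():
            match = TIMING.search(line)
            if match and timing is None:
                g = match.groups()
                timing = (_seconds(*g[:4]), _seconds(*g[4:]))
            elif timing is not None:
                # Lines after the timing line carry the caption text.
                text_lines.append(line)
        if timing is None:
            continue  # Skip malformed blocks without a timing line.
        for desc in SOUND.findall(" ".join(text_lines)):
            events.append((timing[0], timing[1], desc.strip().lower()))
    return events


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:03,500\n[DOOR SLAMS]\n\n"
        "2\n00:00:04,000 --> 00:00:06,000\nWho's there?\n"
    )
    print(mine_sound_events(sample))
```

Each mined description would still need to be mapped onto a taxonomy, such as the 245 sound labels described above; that step is not sketched here.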


