AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content Analysis

07/13/2016
by Sebastian Sager, et al.

Recently, sound recognition has been used to identify sounds, such as car and river. However, sounds have nuances that may be better described by adjective-noun pairs such as slow car, and verb-noun pairs such as flying insects, which are underexplored. Therefore, in this work we investigate the relation between audio content and both adjective-noun pairs and verb-noun pairs. Due to a lack of datasets with these kinds of annotations, we collected and processed the AudioPairBank corpus consisting of a combined total of 1,123 pairs and over 33,000 audio files. One contribution is the previously unavailable documentation of the challenges and implications of collecting audio recordings with these types of labels. A second contribution is to show the degree of correlation between the audio content and the labels through sound recognition experiments, which yielded results of 70% accuracy, also providing a performance benchmark. The conclusions in this paper encourage further exploration of the nuances in audio and are meant to complement similar research performed on images and text in multimedia analysis.


research
02/01/2023

Epic-Sounds: A Large-scale Dataset of Actions That Sound

We introduce EPIC-SOUNDS, a large-scale dataset of audio annotations cap...
research
02/20/2020

Multi-label Sound Event Retrieval Using a Deep Learning-based Siamese Structure with a Pairwise Presence Matrix

Realistic recordings of soundscapes often have multiple sound events co-...
research
06/16/2023

Crowdsourcing and Evaluating Text-Based Audio Retrieval Relevances

This paper explores grading text-based audio retrieval relevances with c...
research
07/09/2017

Deep CNN Framework for Audio Event Recognition using Weakly Labeled Web Data

The development of audio event recognition models requires labeled train...
research
05/02/2020

Addressing Missing Labels in Large-scale Sound Event Recognition using a Teacher-student Framework with Loss Masking

The study of label noise in sound event recognition has recently gained ...
research
02/07/2022

Learning Sound Localization Better From Semantically Similar Samples

The objective of this work is to localize the sound sources in visual sc...
research
04/13/2021

Visually Informed Binaural Audio Generation without Binaural Audios

Stereophonic audio, especially binaural audio, plays an essential role i...
