Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events

02/23/2021
by Xuenan Xu, et al.

Automated audio captioning is a cross-modal task that generates natural language descriptions summarizing the sound events in an audio clip. However, grounding the actual sound events in a given audio clip based on its corresponding caption has not been investigated. This paper contributes the AudioGrounding dataset, which provides the correspondence between sound events and the captions provided in AudioCaps, along with the location (timestamps) of each sound event present. Based on this dataset, we propose the text-to-audio grounding (TAG) task, which interactively considers the relationship between audio processing and language understanding. A baseline approach is provided, resulting in an event-F1 score of 28.3
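To make the event-F1 figure concrete, the following is a minimal sketch of how an event-based F1 score can be computed for one caption phrase, given reference and predicted (onset, offset) timestamps. The function name `event_f1`, the fixed 0.2-second collar, and the greedy one-to-one matching are illustrative assumptions; the abstract does not state the tolerance or matching procedure behind the reported score.

```python
from typing import List, Tuple

# A sound-event segment as (onset, offset) in seconds.
Segment = Tuple[float, float]


def event_f1(reference: List[Segment],
             predicted: List[Segment],
             collar: float = 0.2) -> float:
    """Simplified event-based F1 (assumed settings, not the paper's).

    A predicted segment counts as a true positive when both its onset and
    its offset lie within `collar` seconds of a reference segment that has
    not been matched yet. Matching is greedy, in prediction order.
    """
    matched = set()  # indices of reference segments already matched
    true_pos = 0
    for p_on, p_off in predicted:
        for i, (r_on, r_off) in enumerate(reference):
            if i in matched:
                continue
            if abs(p_on - r_on) <= collar and abs(p_off - r_off) <= collar:
                matched.add(i)
                true_pos += 1
                break
    false_pos = len(predicted) - true_pos
    false_neg = len(reference) - true_pos
    denom = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denom if denom else 0.0


# Hypothetical timestamps for a phrase such as "a dog barks".
reference = [(0.0, 2.0), (5.0, 7.5)]
predicted = [(0.1, 2.1), (4.0, 7.5), (9.0, 9.5)]
print(event_f1(reference, predicted))
```

In this hypothetical example only the first prediction falls within the collar of a reference segment, giving one true positive, two false positives, and one false negative. Event-based metrics are strict about boundaries, which is one reason scores on grounding and detection tasks tend to be low in absolute terms.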


research
10/03/2022

Text-to-Audio Grounding Based Novel Metric for Evaluating Audio Caption Similarity

Automatic Audio Captioning (AAC) refers to the task of translating an au...
research
06/02/2023

Enhance Temporal Relations in Audio Captioning with Sound Event Detection

Automated audio captioning aims at generating natural language descripti...
research
06/27/2021

Query-graph with Cross-gating Attention Model for Text-to-Audio Grounding

In this paper, we address the text-to-audio grounding issue, namely, gro...
research
11/12/2022

Investigations in Audio Captioning: Addressing Vocabulary Imbalance and Evaluating Suitability of Language-Centric Performance Metrics

The analysis, processing, and extraction of meaningful information from ...
research
03/19/2023

Audio-Text Models Do Not Yet Leverage Natural Language

Multi-modal contrastive learning techniques in the audio-text domain hav...
research
04/27/2021

DASEE A Synthetic Database of Domestic Acoustic Scenes and Events in Dementia Patients Environment

Access to informative databases is a crucial part of notable research de...
research
05/18/2022

Seeing Sounds, Hearing Shapes: a gamified study to evaluate sound-sketches

Sound-shape associations, a subset of cross-modal associations between t...
