Enhance Temporal Relations in Audio Captioning with Sound Event Detection

06/02/2023
by   Zeyu Xie, et al.
0

Automated audio captioning aims at generating natural language descriptions for given audio clips, not only detecting and classifying sounds, but also summarizing the relationships between audio events. Recent research advances in audio captioning have introduced additional guidance to improve the accuracy of audio events in generated sentences. However, temporal relations between audio events have received little attention while revealing complex relations is a key component in summarizing audio content. Therefore, this paper aims to better capture temporal relationships in caption generation with sound event detection (SED), a task that locates events' timestamps. We investigate the best approach to integrate temporal information in a captioning model and propose a temporal tag system to transform the timestamps into comprehensible relations. Results evaluated by the proposed temporal metrics suggest that great improvement is achieved in terms of temporal relation generation.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/23/2021

Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events

Automated Audio Captioning is a cross-modal task, generating natural lan...
research
05/11/2022

A Comprehensive Survey of Automated Audio Captioning

Automated audio captioning, a task that mimics human perception as well ...
research
11/18/2022

Impact of visual assistance for automated audio captioning

We study the impact of visual assistance for automated audio captioning....
research
11/19/2022

VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement

Video to sound generation aims to generate realistic and natural sound g...
research
01/28/2022

Automatic Audio Captioning using Attention weighted Event based Embeddings

Automatic Audio Captioning (AAC) refers to the task of translating audio...
research
11/12/2022

Investigations in Audio Captioning: Addressing Vocabulary Imbalance and Evaluating Suitability of Language-Centric Performance Metrics

The analysis, processing, and extraction of meaningful information from ...
research
05/12/2022

Automated Audio Captioning: an Overview of Recent Progress and New Challenges

Automated audio captioning is a cross-modal translation task that aims t...

Please sign up or login with your details

Forgot password? Click here to reset