Text-to-Audio Grounding Based Novel Metric for Evaluating Audio Caption Similarity

10/03/2022
by   Swapnil Bhosale, et al.

Automatic Audio Captioning (AAC) refers to the task of translating an audio sample into a natural language (NL) text that describes the audio events, the sources of the events, and their relationships. NL text generation tasks are evaluated with metrics such as BLEU, ROUGE, and METEOR, which are based on lexical semantics. An AAC evaluation metric must, in addition to lexical semantics, be able to map NL text (phrases) that correspond to similar sounds. Current metrics used to evaluate AAC tasks lack an understanding of the perceived properties of sound represented by text. In this paper, we propose a novel metric based on Text-to-Audio Grounding (TAG), which is useful for evaluating cross-modal tasks like AAC. Experiments on a publicly available AAC dataset show that our evaluation metric performs better than existing metrics used in the NL text and image captioning literature.
