LCEval: Learned Composite Metric for Caption Evaluation

12/24/2020
by Naeha Sharif, et al.

Automatic evaluation metrics are fundamentally important to the development and fine-grained analysis of captioning systems. While current evaluation metrics tend to achieve an acceptable correlation with human judgements at the system level, they fail to do so at the caption level. In this work, we propose a neural network-based learned metric to improve caption-level evaluation. To gain deeper insight into the parameters that affect a learned metric's performance, this paper investigates the relationship between different linguistic features and the caption-level correlation of learned metrics. We also compare metrics trained with different training examples to measure the variations in their evaluation. Moreover, we perform a robustness analysis, which highlights the sensitivity of learned and handcrafted metrics to various sentence perturbations. Our empirical analysis shows that the proposed metric not only outperforms existing metrics in terms of caption-level correlation but also shows a strong system-level correlation with human assessments.
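
The distinction between caption-level and system-level correlation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say which correlation coefficient the paper uses, so Kendall's tau (the tau-a variant) stands in here; the helper names are hypothetical; and the toy scores in `records` are invented, not results from the paper.

```python
from itertools import combinations
from statistics import mean


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # tied pairs count toward neither total
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


def caption_level_tau(records):
    """Correlate metric and human scores over individual captions."""
    return kendall_tau([r[1] for r in records], [r[2] for r in records])


def system_level_tau(records):
    """Average scores per captioning system first, then correlate the averages."""
    by_system = {}
    for system, metric, human in records:
        by_system.setdefault(system, []).append((metric, human))
    metric_means = [mean(m for m, _ in rows) for rows in by_system.values()]
    human_means = [mean(h for _, h in rows) for rows in by_system.values()]
    return kendall_tau(metric_means, human_means)


# Invented toy data: (system, metric score, human score), one row per caption.
records = [
    ("A", 0.9, 2), ("A", 0.5, 4),
    ("B", 0.6, 1), ("B", 0.4, 3),
    ("C", 0.3, 0), ("C", 0.1, 2),
]

print("caption-level tau:", caption_level_tau(records))
print("system-level tau:", system_level_tau(records))
```

On this toy data the per-system averages are ranked identically by the metric and by the human scores, so the system-level tau is 1.0, while the caption-level tau is only 2/15 (about 0.13). Averaging hides per-caption disagreements, which is one way a metric can look acceptable at the system level and still be weak at the caption level.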


