SMURF: SeMantic and linguistic UndeRstanding Fusion for Caption Evaluation via Typicality Analysis

06/02/2021
by Joshua Feinglass, et al.

The open-ended nature of visual captioning makes it a challenging area for evaluation. Most proposed evaluation models rely on specialized training to improve correlation with human judgment, which limits their adoption, generalizability, and explainability. We introduce "typicality", a new formulation of evaluation rooted in information theory, which is uniquely suited for problems lacking a definite ground truth. Typicality serves as our framework for developing a novel semantic comparison, SPARCS, as well as referenceless fluency evaluation metrics. Over the course of our analysis, two separate dimensions of fluency naturally emerge: style, captured by the metric SPURTS, and grammar, captured in the form of grammatical outlier penalties. Through extensive experiments and ablation studies on benchmark datasets, we show how these decomposed dimensions of semantics and fluency provide greater system-level insight into captioner differences. Our proposed metrics, along with their combination, SMURF, achieve state-of-the-art correlation with human judgment when compared with other rule-based evaluation metrics.


