FFCI: A Framework for Interpretable Automatic Evaluation of Summarization

11/27/2020
by   Fajri Koto, et al.

In this paper, we propose FFCI, a framework for automatic summarization evaluation that comprises four elements: Faithfulness, Focus, Coverage, and Inter-sentential coherence. We design FFCI by comprehensively studying traditional evaluation metrics and model-based evaluations, including question answering (QA) approaches, semantic textual similarity (STS), next-sentence prediction (NSP), and scores from 19 pre-trained language models. Our study reveals three key findings: (1) for faithfulness evaluation, calculating BERTScore between the summary and article sentences yields a higher correlation than recently proposed QA-based evaluation methods; (2) GPT2Score has the best Pearson correlation for focus and coverage; and (3) a simple NSP model is effective at evaluating inter-sentential coherence.
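The first finding, scoring summary sentences against article sentences, can be illustrated with a minimal sketch. It is not the authors' implementation: `token_f1` is a hypothetical toy stand-in for BERTScore (plain token overlap, so the example runs without a model), and taking the best-matching article sentence for each summary sentence before averaging is an assumption made here for illustration.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Toy stand-in for BERTScore: F1 over lowercase whitespace-token overlap."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it
    # appears in both sentences.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def faithfulness(summary_sents, article_sents, sim=token_f1):
    """Mean, over summary sentences, of the best similarity to any article sentence.

    Assumption for this sketch: max over article sentences, then a plain mean.
    """
    if not summary_sents or not article_sents:
        return 0.0
    best = [max(sim(s, a) for a in article_sents) for s in summary_sents]
    return sum(best) / len(best)


article = ["the cat sat on the mat", "it rained all day"]
supported = ["the cat sat on the mat"]    # copied from the article
unsupported = ["dogs chase red cars"]     # shares no tokens with the article

print(faithfulness(supported, article))
print(faithfulness(unsupported, article))
```

Passing a different `sim` callable (for example, a wrapper around a real BERTScore implementation) would swap out the toy similarity while leaving the sentence-level aggregation unchanged.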


research
05/07/2020

FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization

Neural abstractive summarization models are prone to generate content in...
research
10/06/2022

Just ClozE! A Fast and Simple Method for Evaluating the Factual Consistency in Abstractive Summarization

The issue of factual consistency in abstractive summarization has attrac...
research
04/21/2022

Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics

Question answering-based summarization evaluation metrics must automatic...
research
04/30/2020

CohEval: Benchmarking Coherence Models

Although coherence modeling has come a long way in developing novel mode...
research
05/13/2022

Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets

Precisely assessing the progress in natural language generation (NLG) ta...
research
10/01/2020

Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary

Recently, there has been growing interest in using question-answering (Q...
research
08/27/2018

WiSeBE: Window-based Sentence Boundary Evaluation

Sentence Boundary Detection (SBD) has been a major research topic since ...
