Towards Neural Language Evaluators

09/20/2019
by Hassan Kané et al.

We review three limitations of BLEU and ROUGE, the most popular metrics used to assess reference summaries against hypothesis summaries; propose criteria for how a good evaluation metric should behave; and describe concrete ways to use recent Transformer-based language models to assess reference summaries against hypothesis summaries.
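One well-known limitation of BLEU and ROUGE is that they score surface n-gram overlap, so a faithful paraphrase that uses different words is penalized. A minimal illustrative sketch (the `ngram_precision` helper below is a simplified stand-in, not the full BLEU formula with brevity penalty and geometric averaging):

```python
from collections import Counter

def ngram_precision(candidate: str, reference: str, n: int) -> float:
    """Fraction of candidate n-grams that also appear in the reference (clipped counts)."""
    cand = candidate.split()
    ref = reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    overlap = sum(min(count, ref_ngrams[g]) for g, count in cand_ngrams.items())
    total = sum(cand_ngrams.values())
    return overlap / total if total else 0.0

reference = "the cat sat on the mat"
paraphrase = "a feline rested upon the rug"  # same meaning, almost no shared tokens

# An identical summary scores perfectly, but the paraphrase scores near zero,
# even though a human judge would rate both as adequate.
print(ngram_precision(reference, reference, 1))   # → 1.0
print(ngram_precision(paraphrase, reference, 1))  # → 0.1666... (only "the" overlaps)
```

A learned, Transformer-based evaluator of the kind the paper advocates compares summaries in an embedding space rather than by exact token match, so paraphrases like the one above are no longer unfairly penalized.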


