Multi-Dimensional Evaluation of Text Summarization with In-Context Learning

06/01/2023
by Sameer Jain, et al.

Evaluation of natural language generation (NLG) is complex and multi-dimensional: generated text can be assessed for fluency, coherence, factuality, or any other dimension of interest. Most frameworks that perform such multi-dimensional evaluation require training on large, manually or synthetically generated datasets. In this paper, we study the efficacy of large language models as multi-dimensional evaluators using in-context learning, obviating the need for large training datasets. Our experiments show that in-context learning-based evaluators are competitive with learned evaluation frameworks for the task of text summarization, establishing state-of-the-art performance on dimensions such as relevance and factual consistency. We then analyze how factors such as the selection and number of in-context examples affect performance. Finally, we study the efficacy of in-context learning-based evaluators in evaluating zero-shot summaries written by large language models such as GPT-3.
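To make the setup concrete, the sketch below shows one way an in-context-learning evaluator of this kind could be assembled: a few human-scored (source, summary) examples are packed into a prompt, the language model is asked to rate a new summary on a single dimension, and its completion is parsed into a score. The prompt template, example texts, 1-5 scale, and the query_llm hook are illustrative assumptions, not the paper's released prompts or code.

from typing import Callable, List, Tuple

# One (source, summary, human score) triple used as an in-context example.
Example = Tuple[str, str, int]

def build_eval_prompt(dimension: str, examples: List[Example],
                      source: str, summary: str) -> str:
    """Pack scored examples and the new (source, summary) pair into one prompt."""
    lines = [f"Rate the {dimension} of the summary on a scale of 1 to 5.", ""]
    for ex_source, ex_summary, ex_score in examples:
        lines += [f"Source: {ex_source}",
                  f"Summary: {ex_summary}",
                  f"{dimension.capitalize()} score: {ex_score}",
                  ""]
    lines += [f"Source: {source}",
              f"Summary: {summary}",
              f"{dimension.capitalize()} score:"]
    return "\n".join(lines)

def score_summary(query_llm: Callable[[str], str], dimension: str,
                  examples: List[Example], source: str, summary: str) -> int:
    """query_llm is a hypothetical hook around whichever completion API is used;
    the first integer in the model's completion is read back as the score."""
    completion = query_llm(build_eval_prompt(dimension, examples, source, summary))
    return int(completion.strip().split()[0])

if __name__ == "__main__":
    # Tiny made-up demonstration examples; real ones would come from human-rated data.
    demo = [("The city council approved the new park budget on Monday.",
             "The council approved the park budget.", 5),
            ("The city council approved the new park budget on Monday.",
             "The mayor vetoed the stadium plan.", 1)]
    print(build_eval_prompt("relevance", demo,
                            "Researchers released a corpus of human-rated summaries.",
                            "A new summarization corpus was released."))

Keeping the output on a small integer scale keeps parsing trivial; the number and choice of entries passed through the examples argument correspond to the selection and count of in-context examples whose effect the paper analyzes.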

Related research

09/18/2023 - Summarization is (Almost) Dead
How well can large language models (LLMs) generate summaries? We develop...

07/01/2015 - Dimensionality on Summarization
Summarization is one of the key features of human intelligence. It plays...

10/15/2021 - Boosting coherence of language models
Naturality of long-term information structure – coherence – remains a ch...

03/27/2023 - Large Language Models are Diverse Role-Players for Summarization Evaluation
Text summarization has a wide range of applications in many scenarios. T...

03/27/2023 - ChatGPT as a Factual Inconsistency Evaluator for Abstractive Text Summarization
The performance of abstractive text summarization has been greatly boost...

09/22/2022 - Learning to Write with Coherence From Negative Examples
Coherence is one of the critical factors that determine the quality of w...

05/22/2023 - Are Large Language Models Good Evaluators for Abstractive Summarization?
Human evaluations are often required for abstractive summary evaluations...
