Comparing Abstractive Summaries Generated by ChatGPT to Real Summaries Through Blinded Reviewers and Text Classification Algorithms

03/30/2023
by Mayank Soni, et al.

Large Language Models (LLMs) have gathered significant attention due to their impressive performance on a variety of tasks. ChatGPT, developed by OpenAI, is a recent addition to the family of language models and has been called a disruptive technology by some, owing to its human-like text-generation capabilities. Although many anecdotal examples across the internet have evaluated ChatGPT's strengths and weaknesses, only a few systematic research studies exist. To contribute to the body of systematic research on ChatGPT, we evaluate its performance on abstractive summarization by means of automated metrics and blinded human reviewers. We also build automatic text classifiers to detect ChatGPT-generated summaries. We found that while text classification algorithms can distinguish between real and generated summaries, humans are unable to distinguish between real summaries and those produced by ChatGPT.
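The paper does not specify the classifier architecture in this abstract, but the detection idea can be sketched with a minimal bag-of-words Naive Bayes classifier trained to label summaries as "real" or "generated". Everything below is illustrative: the toy training sentences are invented for this sketch and are not from the paper's dataset.

```python
# Illustrative sketch (stdlib only): Naive Bayes with Laplace smoothing
# over bag-of-words features, labeling summaries "real" vs "generated".
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

class NaiveBayes:
    def __init__(self):
        self.class_counts = Counter()           # documents per class
        self.word_counts = defaultdict(Counter) # word frequencies per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for w in tokenize(text):
                self.word_counts[label][w] += 1
                self.vocab.add(w)

    def predict(self, text):
        total = sum(self.class_counts.values())
        best, best_lp = None, float("-inf")
        for label in self.class_counts:
            # log prior + sum of smoothed log likelihoods
            lp = math.log(self.class_counts[label] / total)
            n_label = sum(self.word_counts[label].values())
            for w in tokenize(text):
                lp += math.log((self.word_counts[label][w] + 1) /
                               (n_label + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = label, lp
        return best

# Invented toy data, purely for demonstration.
real = ["the senate passed the bill after a lengthy debate",
        "officials said the storm caused widespread damage"]
generated = ["the article discusses how the bill was passed by the senate",
             "the text describes the damage caused by the storm"]

clf = NaiveBayes()
clf.fit(real + generated, ["real"] * len(real) + ["generated"] * len(generated))
print(clf.predict(real[0]))  # → real
```

In practice a detector like this would be trained on thousands of human-written and model-generated summaries and evaluated on held-out data; the sketch only shows the mechanics of the classification step.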


