ChatGPT as a Factual Inconsistency Evaluator for Abstractive Text Summarization

03/27/2023
by   Zheheng Luo, et al.

The performance of abstractive text summarization has recently been greatly improved by pre-trained language models. The main concern with existing abstractive summarization methods is the factual inconsistency of their generated summaries. To alleviate this problem, many efforts have focused on developing effective factuality evaluation metrics based on natural language inference, question answering, and related techniques. However, these metrics suffer from high computational complexity and a reliance on annotated data. Most recently, large language models such as ChatGPT have shown strong abilities not only in natural language understanding but also in natural language inference. In this paper, we study ChatGPT's ability to evaluate factual inconsistency in the zero-shot setting by testing it on coarse-grained and fine-grained factuality evaluation tasks: binary natural language inference (NLI), summary ranking, and consistency rating. Experimental results show that ChatGPT outperforms previous SOTA evaluation metrics on 6 of 9 datasets across the three tasks, demonstrating its great potential for assessing factual inconsistency in the zero-shot setting. The results also highlight the importance of prompt design and the need for future work to address ChatGPT's limitations in evaluation bias, incorrect reasoning, and hallucination.
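The binary NLI task described above amounts to prompting the model with an article–summary pair and mapping its free-form reply to a consistent/inconsistent label. A minimal sketch of that setup is below; the prompt wording, the helper names, and the answer-parsing heuristic are illustrative assumptions, not the authors' exact prompts, and the actual call to ChatGPT is left out.

```python
def build_nli_prompt(article: str, summary: str) -> str:
    """Frame factual-consistency checking as a zero-shot binary NLI question.

    The wording here is a plausible sketch of the kind of prompt the paper
    studies, not the paper's exact prompt.
    """
    return (
        "Decide if the following summary is consistent with the "
        "corresponding article. Note that consistency means all "
        "information in the summary is supported by the article.\n\n"
        f"Article: {article}\n"
        f"Summary: {summary}\n"
        "Answer (yes or no):"
    )


def parse_binary_answer(response: str) -> bool:
    """Map a free-form model reply to True (consistent) / False (inconsistent).

    A simple heuristic: look at the first word of the reply.
    """
    first_word = response.strip().lower().split()[0].rstrip(".,!")
    return first_word == "yes"


# Example usage (the model reply is a hypothetical stand-in for a ChatGPT call):
prompt = build_nli_prompt(
    "The city council approved the new budget on Tuesday.",
    "The budget was approved on Tuesday.",
)
label = parse_binary_answer("Yes, the summary is consistent with the article.")
```

In this framing, coarse-grained evaluation reduces to the yes/no label, while the fine-grained tasks (summary ranking, consistency rating) would instead ask the model to compare candidate summaries or emit a numeric score.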

