Evaluating ChatGPT's Information Extraction Capabilities: An Assessment of Performance, Explainability, Calibration, and Faithfulness

04/23/2023
by   Bo Li, et al.
0

The capability of Large Language Models (LLMs) like ChatGPT to comprehend user intent and provide reasonable responses has made them extremely popular lately. In this paper, we focus on assessing the overall ability of ChatGPT using 7 fine-grained information extraction (IE) tasks. Specially, we present the systematically analysis by measuring ChatGPT's performance, explainability, calibration, and faithfulness, and resulting in 15 keys from either the ChatGPT or domain experts. Our findings reveal that ChatGPT's performance in Standard-IE setting is poor, but it surprisingly exhibits excellent performance in the OpenIE setting, as evidenced by human evaluation. In addition, our research indicates that ChatGPT provides high-quality and trustworthy explanations for its decisions. However, there is an issue of ChatGPT being overconfident in its predictions, which resulting in low calibration. Furthermore, ChatGPT demonstrates a high level of faithfulness to the original text in the majority of cases. We manually annotate and release the test sets of 7 fine-grained IE tasks contains 14 datasets to further promote the research. The datasets and code are available at https://github.com/pkuserc/ChatGPT_for_IE.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/20/2023

XATU: A Fine-grained Instruction-based Benchmark for Explainable Text Updates

Text editing is a crucial task that involves modifying text to better al...
research
02/10/2021

Towards More Fine-grained and Reliable NLP Performance Prediction

Performance prediction, the task of estimating a system's performance wi...
research
11/02/2021

Human Attention in Fine-grained Classification

The way humans attend to, process and classify a given image has the pot...
research
08/07/2023

Evaluating and Explaining Large Language Models for Code Using Syntactic Structures

Large Language Models (LLMs) for code are a family of high-parameter, tr...
research
05/13/2023

On the Hidden Mystery of OCR in Large Multimodal Models

Large models have recently played a dominant role in natural language pr...
research
07/20/2023

FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets

Evaluation of Large Language Models (LLMs) is challenging because aligni...
research
05/11/2023

SMATCH++: Standardized and Extended Evaluation of Semantic Graphs

The Smatch metric is a popular method for evaluating graph distances, as...

Please sign up or login with your details

Forgot password? Click here to reset