ChatGPT Hallucinates when Attributing Answers

09/17/2023
by Guido Zuccon, et al.

Can ChatGPT provide evidence to support its answers? Does the evidence it suggests actually exist, and does it really support its answer? We investigate these questions using a collection of domain-specific knowledge-based questions, specifically prompting ChatGPT to provide both an answer and supporting evidence in the form of references to external sources. We also investigate how different prompts impact answers and evidence. We find that ChatGPT provides correct or partially correct answers in about half of the cases (50.6% of the times), but its suggested references only exist 14% of the times. We further provide insights on the generated references that reveal common traits among them, and show that even when a reference provided by the model does exist, it often does not support the claims ChatGPT attributes to it. Our findings are important because (1) they are the first systematic analysis of the references created by ChatGPT in its answers; (2) they suggest that the model may leverage good quality information in producing correct answers, but is unable to attribute real evidence to support its answers. Prompts, raw result files and manual analysis are made publicly available.
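To make the experimental setup concrete, below is a minimal sketch of how such a study could be wired up: prompt the model for an answer together with supporting references, then check whether each suggested title resolves to a real publication. Everything in this sketch is an illustrative assumption rather than the paper's released code: the prompt wording, the gpt-3.5-turbo model identifier, the use of the Crossref API, and the exact-title-match heuristic are not taken from the paper.

import requests
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def answer_with_references(question: str) -> str:
    """Ask the model for an answer plus supporting references (assumed prompt)."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative stand-in for the ChatGPT version studied
        messages=[{
            "role": "user",
            "content": question + "\n\nProvide references to external "
                                  "sources that support your answer.",
        }],
    )
    return response.choices[0].message.content


def reference_exists(title: str) -> bool:
    """Rough existence check: look the cited title up in Crossref."""
    reply = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": title, "rows": 5},
        timeout=30,
    )
    items = reply.json()["message"]["items"]
    # Count an exact (case-insensitive) title match as "this reference exists".
    return any(
        " ".join(item.get("title", [])).strip().lower() == title.strip().lower()
        for item in items
    )

Note that the paper's own analysis was manual (the abstract states the manual analysis is released publicly); an automated title lookup like the one above can only approximate the does-the-reference-exist step, and cannot judge whether an existing reference actually supports the claim attributed to it.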


Related research

Teaching language models to support answers with verified quotes (03/21/2022)
Recent large language models often answer factual questions correctly. B...

What does ChatGPT know about natural science and engineering? (09/18/2023)
ChatGPT is a powerful language model from OpenAI that is arguably able t...

An Empirical Study of Obsolete Answers on Stack Overflow (03/28/2019)
Stack Overflow accumulates an enormous amount of software engineering kn...

Tackling Multi-Answer Open-Domain Questions via a Recall-then-Verify Framework (10/16/2021)
Open domain questions are likely to be open-ended and ambiguous, leading...

Single-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions (04/11/2022)
Current QA systems can generate reasonable-sounding yet false answers wi...

Retrieving Supporting Evidence for Generative Question Answering (09/20/2023)
Current large language models (LLMs) can exhibit near-human levels of pe...

Rich Knowledge Sources Bring Complex Knowledge Conflicts: Recalibrating Models to Reflect Conflicting Evidence (10/25/2022)
Question answering models can use rich knowledge sources – up to one hun...
