Retrieving Supporting Evidence for LLMs Generated Answers

06/23/2023
by Siqing Huo, et al.

Current large language models (LLMs) can exhibit near-human levels of performance on many natural language tasks, including open-domain question answering. Unfortunately, they also convincingly hallucinate incorrect answers, so that responses to questions must be verified against external sources before they can be accepted at face value. In this paper, we report a simple experiment to automatically verify generated answers against a corpus. After presenting a question to an LLM and receiving a generated answer, we query the corpus with the combination of the question + generated answer. We then present the LLM with the combination of the question + generated answer + retrieved answer, prompting it to indicate if the generated answer can be supported by the retrieved answer. We base our experiment on questions and passages from the MS MARCO (V1) test collection, exploring three retrieval approaches ranging from standard BM25 to a full question answering stack, including a reader based on the LLM. For a large fraction of questions, we find that an LLM is capable of verifying its generated answer if appropriate supporting material is provided. However, with an accuracy of 70-80%, this approach cannot be fully relied upon to detect hallucinations.
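The verification pipeline described above can be sketched in a few steps: query the corpus with question + generated answer, retrieve the best-matching passage, then build a prompt asking the LLM whether the passage supports the answer. The sketch below uses a minimal hand-rolled Okapi BM25 scorer (the paper's simplest retrieval variant), a toy three-passage corpus standing in for MS MARCO, and an illustrative prompt template; the corpus, question, and prompt wording are assumptions, not the paper's exact data or template.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N
    df = Counter()  # document frequency of each term
    for d in docs_tokens:
        for term in set(d):
            df[term] += 1
    scores = []
    for d in docs_tokens:
        tf = Counter(d)  # term frequency within this document
        s = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            )
        scores.append(s)
    return scores

# Toy corpus standing in for MS MARCO passages (illustrative only).
corpus = [
    "the eiffel tower is located in paris france",
    "the great wall of china is visible in satellite photos",
    "paris is the capital and largest city of france",
]
question = "where is the eiffel tower"
generated_answer = "the eiffel tower is in paris"

# Step 1: query the corpus with question + generated answer.
query = (question + " " + generated_answer).split()
docs = [p.split() for p in corpus]
scores = bm25_scores(query, docs)
best = corpus[max(range(len(corpus)), key=scores.__getitem__)]

# Step 2: assemble the verification prompt (hypothetical wording,
# not the paper's exact template) to send to the LLM.
prompt = (
    f"Question: {question}\n"
    f"Generated answer: {generated_answer}\n"
    f"Retrieved passage: {best}\n"
    "Does the retrieved passage support the generated answer? Answer yes or no."
)
```

A real run would replace the toy corpus with an indexed collection and send `prompt` to the LLM; the paper's stronger variants swap BM25 for a full question answering stack on the retrieval step.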
