Mastering the ABCDs of Complex Questions: Answer-Based Claim Decomposition for Fine-grained Self-Evaluation

05/24/2023
by Nishant Balepur, et al.

When answering complex questions, large language models (LLMs) may produce answers that do not satisfy all criteria of the question. Existing self-evaluation techniques aim to detect whether such answers are correct, but they cannot determine which criteria of the question the generated answers satisfy. To address this issue, we propose answer-based claim decomposition (ABCD), a prompting strategy that decomposes a question into a series of true/false claims, which can then be used to verify which criteria of the input question an answer satisfies. Using the decomposed ABCD claims, we perform fine-grained self-evaluation. Through preliminary experiments on three datasets, including a newly collected challenge dataset, ObscureQA, we find that GPT-3.5 has some ability to determine to what extent its answer satisfies the criteria of the input question, and that this self-evaluation can give insights into the errors and knowledge gaps of the model.
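
To make the idea concrete, the following is a minimal sketch of claim-level evaluation, not the authors' implementation. In ABCD the decomposition and the verification are both done by prompting an LLM; here, as loudly stated assumptions, hand-written claim templates stand in for the decomposition prompt and a toy lookup set (`KNOWN_TRUE`) stands in for the model's true/false judgment. All names (`decompose`, `evaluate`, `satisfied_fraction`, `ClaimResult`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ClaimResult:
    claim: str
    satisfied: bool


def decompose(criteria: List[str], answer: str) -> List[str]:
    """Turn each criterion template into a true/false claim about the answer.

    Assumption: the templates are written by hand. In ABCD an LLM prompt
    would produce the claims from the question and the candidate answer.
    """
    return [template.format(answer=answer) for template in criteria]


def evaluate(claims: List[str], verify: Callable[[str], bool]) -> List[ClaimResult]:
    """Judge each claim independently with the supplied verifier."""
    return [ClaimResult(claim, verify(claim)) for claim in claims]


def satisfied_fraction(results: List[ClaimResult]) -> float:
    """Share of claims judged true; 0.0 when there are no claims."""
    if not results:
        return 0.0
    return sum(r.satisfied for r in results) / len(results)


# Hypothetical question: "Which author, born in England, wrote a novel
# published in 1813?" split into one template per criterion.
CRITERIA = [
    "{answer} wrote a novel published in 1813",
    "{answer} was born in England",
]

# Toy stand-in for the model's judgment (assumption, not an LLM call).
KNOWN_TRUE = {
    "Jane Austen wrote a novel published in 1813",
    "Jane Austen was born in England",
    "Charles Dickens was born in England",
}


def toy_verify(claim: str) -> bool:
    return claim in KNOWN_TRUE


austen = evaluate(decompose(CRITERIA, "Jane Austen"), toy_verify)
dickens = evaluate(decompose(CRITERIA, "Charles Dickens"), toy_verify)

print(satisfied_fraction(austen))   # all criteria satisfied
print(satisfied_fraction(dickens))  # only the birthplace criterion holds
for result in dickens:
    print(result.satisfied, result.claim)
```

The per-claim results are the point of the sketch: a single correct/incorrect label would mark the second answer as wrong, whereas the claim-level view shows which criterion it fails.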

