Leveraging Large Language Models for Multiple Choice Question Answering

10/22/2022
by Joshua Robinson, et al.

While large language models (LLMs) like GPT-3 have achieved impressive results on multiple choice question answering (MCQA) tasks in the zero-, one-, and few-shot settings, they generally lag behind the MCQA state of the art (SOTA). MCQA tasks have traditionally been presented to LLMs as cloze tasks: the LLM is conditioned on a question (without the associated answer options), and its chosen option is the one assigned the highest probability after normalization (for length, etc.). A more natural prompting approach is to present the question and answer options to the LLM jointly and have it output the symbol (e.g., "A") associated with its chosen answer option. This approach allows the model to explicitly compare answer options, reduces computational costs (one inference per question rather than one per answer option), and mitigates the effects of tokenization scheme and answer option representations on answer selection. For the natural approach to be effective, the LLM must be able to associate answer options with the symbols that represent them; we term this multiple choice symbol binding (MCSB) ability, and it varies greatly by model. We show that a model with high MCSB ability performs much better with the natural approach than with the traditional approach across 20 diverse datasets and largely closes the gap with the SOTA, suggesting that the MCQA ability of LLMs has been previously underestimated.
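For concreteness, here is a minimal sketch of the traditional cloze approach, written against the Hugging Face transformers API. The model ("gpt2" as a stand-in), the prompt template, and mean-log-probability length normalization are illustrative assumptions, not the paper's exact setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def cloze_score(question: str, option: str) -> float:
    """Length-normalized log-probability of `option` given `question`."""
    prompt_ids = tokenizer(question, return_tensors="pt").input_ids
    option_ids = tokenizer(" " + option, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, option_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # logits[0, i] predicts token i + 1, so drop the last position and shift.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    prompt_len = prompt_ids.shape[1]
    targets = input_ids[0, prompt_len:]  # the option's tokens
    option_lps = log_probs[prompt_len - 1:].gather(1, targets.unsqueeze(1))
    return option_lps.mean().item()  # normalize for option length

question = "Question: What is the capital of France?\nAnswer:"
options = ["Paris", "London", "Berlin", "Madrid"]
print(max(options, key=lambda o: cloze_score(question, o)))  # one pass per option
```

Note the costs the abstract alludes to: the model is run once per answer option, and each option's score depends on how that option happens to tokenize.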

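A corresponding sketch of the natural approach, presenting the question and lettered options jointly and reading off a symbol, follows. Again the model and prompt template are assumptions for illustration; the key point is that a single forward pass suffices, with the next-token probabilities of "A", "B", ... compared directly:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def mcp_answer(question: str, options: list[str]) -> str:
    """Return the letter of the option the model assigns the highest probability."""
    letters = [chr(ord("A") + i) for i in range(len(options))]
    lines = [f"Question: {question}"]
    lines += [f"{letter}. {option}" for letter, option in zip(letters, options)]
    lines.append("Answer:")
    input_ids = tokenizer("\n".join(lines), return_tensors="pt").input_ids
    with torch.no_grad():
        next_token_logits = model(input_ids).logits[0, -1]  # one forward pass
    # Assumes each " A", " B", ... symbol is a single token (true for GPT-2's BPE).
    symbol_ids = [tokenizer.encode(" " + letter)[0] for letter in letters]
    scores = [next_token_logits[i].item() for i in symbol_ids]
    return letters[scores.index(max(scores))]

print(mcp_answer("What is the capital of France?",
                 ["London", "Paris", "Berlin", "Madrid"]))
```

Whether this works depends on the model's MCSB ability: it must bind "B" to "Paris" rather than merely preferring some letter.
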
Related research

Surface Form Competition: Why the Highest Probability Answer Isn't Always Right (04/16/2021)
Large language models have shown promising results in zero-shot settings...

Learning to Select from Multiple Options (12/01/2022)
Many NLP tasks can be regarded as a selection problem from a set of opti...

Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions (08/22/2023)
Large Language Models (LLMs) have demonstrated remarkable capabilities i...

On Large Language Models' Selection Bias in Multi-Choice Questions (09/07/2023)
Multi-choice questions (MCQs) serve as a common yet important task forma...

Have Large Language Models Developed a Personality?: Applicability of Self-Assessment Tests in Measuring Personality in LLMs (05/24/2023)
Have Large Language Models (LLMs) developed a personality? The short ans...

Attentiveness to Answer Choices Doesn't Always Entail High QA Accuracy (05/24/2023)
When large language models (LMs) are applied in zero- or few-shot settin...

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla (07/18/2023)
Circuit analysis is a promising technique for understanding the internal...
