Measuring and Narrowing the Compositionality Gap in Language Models

10/07/2022
by Ofir Press, et al.

We investigate the ability of language models to perform compositional reasoning tasks where the overall solution depends on correctly composing the answers to sub-problems. We measure how often models can correctly answer all sub-problems but still fail to generate the overall solution, a ratio we call the compositionality gap. We evaluate this ratio by asking multi-hop questions whose answers require composing multiple facts that are unlikely to have been observed together during pretraining. In the GPT-3 family of models, we show that as model size increases, single-hop question answering performance improves faster than multi-hop performance does, so the compositionality gap does not decrease. This surprising result suggests that while more powerful models memorize and recall more factual knowledge, they show no corresponding improvement in their ability to perform this kind of compositional reasoning. We then demonstrate how elicitive prompting (such as chain of thought) narrows the compositionality gap by reasoning explicitly instead of implicitly. We present a new method, self-ask, that further improves on chain of thought: the model explicitly asks itself (and then answers) follow-up questions before answering the initial question. Finally, we show that self-ask's structured prompting lets us easily plug in a search engine to answer the follow-up questions, which further improves accuracy.
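To make the method concrete, here is a minimal sketch, in Python, of how self-ask-style prompting with a search-engine hook could be wired up. The prompt markers ("Follow up:", "Intermediate answer:", "So the final answer is:") mirror the structure the abstract describes; call_lm, search_engine, and the few_shot_prefix argument are hypothetical placeholders, not the paper's released code or prompts.

```python
# Minimal sketch of self-ask-style prompting with a pluggable search engine,
# following the structure described in the abstract. `call_lm` and
# `search_engine` are hypothetical stand-ins; swap in real API clients.

FOLLOW_UP = "Follow up:"
INTERMEDIATE = "Intermediate answer:"
FINAL = "So the final answer is:"


def call_lm(prompt: str) -> str:
    """Placeholder: return the language model's continuation of `prompt`."""
    raise NotImplementedError("plug in a real language model API here")


def search_engine(query: str) -> str:
    """Placeholder: return a short answer string for `query`."""
    raise NotImplementedError("plug in a real search API here")


def self_ask(question: str, few_shot_prefix: str, max_hops: int = 4) -> str:
    """Let the model decompose a multi-hop question into follow-up questions,
    answer each follow-up with the search engine, and feed the intermediate
    answers back into the prompt until a final answer is produced."""
    prompt = (few_shot_prefix
              + f"Question: {question}\nAre follow up questions needed here:")
    for _ in range(max_hops):
        continuation = call_lm(prompt)
        if FINAL in continuation:
            # The model composed the intermediate answers into a final answer.
            return continuation.split(FINAL, 1)[1].strip()
        if FOLLOW_UP not in continuation:
            break
        # Keep the model's text up to and including its follow-up question,
        # then insert the search engine's answer instead of letting the model
        # guess the intermediate answer itself.
        head, rest = continuation.split(FOLLOW_UP, 1)
        follow_up = rest.split("\n", 1)[0].strip()
        prompt += (f"{head}{FOLLOW_UP} {follow_up}\n"
                   f"{INTERMEDIATE} {search_engine(follow_up)}\n")
    # If no final answer emerged within the hop budget, ask for one directly.
    return call_lm(prompt + f"\n{FINAL}").strip()
```

Because each follow-up question appears on its own clearly marked line, the intermediate answer can be supplied by any external source, which is what makes plugging in a search engine straightforward in this scheme.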


Related research

- 02/23/2020: Do Multi-Hop Question Answering Systems Know How to Answer the Single-Hop Sub-Questions?
  Multi-hop question answering (QA) requires a model to retrieve and integ...

- 09/20/2022: Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
  When answering a question, humans utilize the information available acro...

- 04/27/2023: Federated Prompting and Chain-of-Thought Reasoning for Improving LLMs Answering
  We investigate how to enhance answer precision in frequently asked quest...

- 12/29/2022: Learning One Abstract Bit at a Time Through Self-Invented Experiments Encoded as Neural Networks
  There are two important things in science: (A) Finding answers to given ...

- 04/14/2022: Measuring Compositional Consistency for Video Question Answering
  Recent video question answering benchmarks indicate that state-of-the-ar...

- 08/09/2023: Answering Unseen Questions With Smaller Language Models Using Rationale Generation and Dense Retrieval
  When provided with sufficient explanatory context, smaller Language Mode...

- 09/16/2023: Multimodal Multi-Hop Question Answering Through a Conversation Between Tools and Efficiently Finetuned Large Language Models
  We employ a tool-interacting divide-and-conquer strategy enabling large ...
