Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

by Pan Lu, et al.

When answering a question, humans utilize the information available across different modalities to synthesize a consistent and complete chain of thought (CoT). This process is normally a black box in the case of deep learning models like large-scale language models. Recently, science question benchmarks have been used to diagnose the multi-hop reasoning ability and interpretability of an AI system. However, existing datasets fail to provide annotations for the answers, or are restricted to the textual-only modality, small scales, and limited domain diversity. To this end, we present Science Question Answering (SQA), a new benchmark that consists of ~21k multimodal multiple-choice questions covering a diverse set of science topics, with annotations of their answers in the form of corresponding lectures and explanations. We further design language models to learn to generate lectures and explanations as the chain of thought (CoT) to mimic the multi-hop reasoning process when answering SQA questions. SQA demonstrates the utility of CoT in language models, as CoT improves the question answering performance by 1.20% in few-shot GPT-3 and 3.99% in fine-tuned UnifiedQA. We also explore the upper bound for models to leverage explanations by feeding those in the input; we observe that it improves the few-shot performance of GPT-3 by 18.96%. Our analysis further shows that language models, similar to humans, benefit from explanations to learn from fewer data, achieving the same performance with just 40% of the training data.
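To make the described setup concrete, here is a minimal sketch (not the authors' code; all field names and the exact output ordering are illustrative assumptions) of how a ScienceQA-style multiple-choice question might be serialized into a prompt that asks a language model to produce the answer along with a lecture and explanation as its chain of thought:

```python
# Minimal sketch of a chain-of-thought prompt builder for a
# ScienceQA-style question. The prompt asks the model to emit the
# answer followed by a lecture and an explanation, loosely mimicking
# the answer-then-rationale format the paper describes. The function
# name, argument names, and template wording are assumptions, not the
# authors' actual implementation.

def build_cot_prompt(question, choices, context=""):
    """Format one multiple-choice science question as a CoT prompt."""
    letters = "ABCDE"
    lines = []
    if context:
        # Optional textual context (e.g., an image caption or hint).
        lines.append(f"Context: {context}")
    lines.append(f"Question: {question}")
    lines.append("Options: " + " ".join(
        f"({letters[i]}) {c}" for i, c in enumerate(choices)))
    # Cue the model to answer first, then justify with a lecture
    # and explanation on the following lines.
    lines.append("Answer: The answer is")
    return "\n".join(lines)


prompt = build_cot_prompt(
    "Which of these is a renewable resource?",
    ["coal", "wind", "natural gas"],
)
print(prompt)
```

At inference time, the text the model generates after "The answer is" would be parsed for the chosen option letter, with the remainder treated as the generated lecture and explanation.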

