Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

by Pan Lu @ UCLA, et al.

When answering a question, humans utilize the information available across different modalities to synthesize a consistent and complete chain of thought (CoT). This process is normally a black box in the case of deep learning models like large-scale language models. Recently, science question benchmarks have been used to diagnose the multi-hop reasoning ability and interpretability of an AI system. However, existing datasets fail to provide annotations for the answers, or are restricted to the textual-only modality, small scales, and limited domain diversity. To this end, we present Science Question Answering (ScienceQA), a new benchmark that consists of about 21k multimodal multiple-choice questions with a diverse set of science topics and annotations of their answers with corresponding lectures and explanations. We further design language models to learn to generate lectures and explanations as the chain of thought (CoT) to mimic the multi-hop reasoning process when answering ScienceQA questions. ScienceQA demonstrates the utility of CoT in language models, as CoT improves the question answering performance by 1.20% in few-shot GPT-3 and 3.99% in fine-tuned UnifiedQA. We also explore the upper bound for models to leverage explanations by feeding those in the input; we observe that it improves the few-shot performance of GPT-3 by 18.96%. Our analysis further shows that language models, similar to humans, benefit from explanations to learn from fewer data and achieve the same performance with just 40% of the data. The data and code are available at https://scienceqa.github.io.
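As a rough illustration of what "generating lectures and explanations as the chain of thought" could look like in practice, the sketch below formats one multiple-choice item into a model input and a training target that carries the answer followed by a lecture and an explanation. The class name, field names, and the exact wording of the template are assumptions made for this example; they are not the official ScienceQA schema or the authors' prompt format, and the sample question is invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScienceExample:
    # Hypothetical field names chosen for this sketch,
    # not the official ScienceQA data schema.
    question: str
    choices: List[str]
    answer_index: int
    lecture: str = ""
    explanation: str = ""


def format_choices(choices: List[str]) -> str:
    """Render choices as '(A) ... (B) ...'."""
    return " ".join(
        f"({chr(ord('A') + i)}) {choice}" for i, choice in enumerate(choices)
    )


def build_prompt(ex: ScienceExample) -> str:
    """Model input: the question and its options (assumed template)."""
    return f"Question: {ex.question}\nOptions: {format_choices(ex.choices)}\n"


def build_target(ex: ScienceExample, with_cot: bool = True) -> str:
    """Model output: the answer, optionally followed by lecture + explanation."""
    letter = chr(ord("A") + ex.answer_index)
    target = f"Answer: The answer is {letter}."
    if with_cot:
        # Join whichever chain-of-thought parts are present.
        cot = " ".join(part for part in (ex.lecture, ex.explanation) if part)
        if cot:
            target += f" BECAUSE: {cot}"
    return target


if __name__ == "__main__":
    # Invented toy item, for illustration only.
    example = ScienceExample(
        question="Which is a solid?",
        choices=["water", "rock"],
        answer_index=1,
        lecture="Solids keep their shape.",
        explanation="A rock keeps its shape.",
    )
    print(build_prompt(example) + build_target(example))
```

Setting `with_cot=False` yields an answer-only target, which is one simple way to compare training or prompting with and without the chain of thought.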




