Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

09/20/2022
by   Pan Lu @ UCLA, et al.

When answering a question, humans utilize the information available across different modalities to synthesize a consistent and complete chain of thought (CoT). This process is normally a black box in the case of deep learning models like large-scale language models. Recently, science question benchmarks have been used to diagnose the multi-hop reasoning ability and interpretability of an AI system. However, existing datasets fail to provide annotations for the answers, or are restricted to the textual-only modality, small scales, and limited domain diversity. To this end, we present Science Question Answering (ScienceQA), a new benchmark that consists of 21k multimodal multiple choice questions with a diverse set of science topics and annotations of their answers with corresponding lectures and explanations. We further design language models to learn to generate lectures and explanations as the chain of thought (CoT) to mimic the multi-hop reasoning process when answering ScienceQA questions. ScienceQA demonstrates the utility of CoT in language models, as CoT improves the question answering performance by 1.20% in few-shot GPT-3 and 3.99% in fine-tuned UnifiedQA. We also explore the upper bound for models to leverage explanations by feeding those in the input; we observe that it improves the few-shot performance of GPT-3 by 18.96%. Our analysis further shows that language models, similar to humans, benefit from explanations to learn from fewer data and achieve the same performance with just 40% of the data. The data and code are available at https://scienceqa.github.io.
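As a rough illustration of the prompting setup the abstract describes, here is a minimal sketch of how a ScienceQA-style multiple-choice question could be assembled into a CoT-style prompt that elicits an answer followed by a rationale. This is not the authors' code; the function name, fields, and template wording are hypothetical, and the actual prompt formats used in the paper may differ.

```python
# Hypothetical sketch: format one multiple-choice science question as a
# chain-of-thought prompt. The model is asked to produce the answer first,
# then a lecture/explanation, mirroring the CoT output described above.

def build_cot_prompt(question, choices, context=""):
    """Return a prompt string for a single multiple-choice question."""
    # Letter the options (A), (B), (C), ...
    lettered = "\n".join(
        f"({chr(ord('A') + i)}) {c}" for i, c in enumerate(choices)
    )
    parts = []
    if context:
        parts.append(f"Context: {context}")
    parts.append(f"Question: {question}")
    parts.append(f"Options:\n{lettered}")
    # Cue the model to commit to an answer, then continue with the
    # lecture and explanation as free-form text.
    parts.append("Answer: The answer is (")
    return "\n".join(parts)

prompt = build_cot_prompt(
    "Which property do these objects have in common?",
    ["hard", "soft", "stretchy"],
    context="Select the best answer.",
)
print(prompt)
```

In a few-shot setting, several such blocks (with gold answers and explanations filled in) would be concatenated before the test question, which is how explanations can be "fed in the input" to probe the upper bound mentioned in the abstract.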

