Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings

05/03/2023
by Daniel Rose, et al.

Recent advances in large language models elicit chain-of-thought reasoning, which allows models to decompose problems in a human-like fashion. Although this paradigm improves multi-step reasoning in language models, it is limited by being unimodal and has been applied mainly to question-answering tasks. We claim that incorporating visual augmentation into reasoning is essential, especially for complex, imaginative tasks. Consequently, we introduce VCoT, a novel method that leverages chain of thought prompting with vision-language grounding to recursively bridge the logical gaps within sequential data. Our method uses visual guidance to generate synthetic multimodal infillings that add consistent, novel information to reduce the logical gaps for downstream tasks that benefit from temporal reasoning, while also providing interpretability into models' multi-step reasoning. We apply VCoT to the Visual Storytelling and WikiHow summarization datasets and demonstrate through human evaluation that VCoT offers novel and consistent synthetic data augmentation that beats chain of thought baselines and can be used to enhance downstream performance.
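The recursive gap-bridging idea can be sketched in a few lines. Note that this is only an illustrative sketch, not the authors' implementation: `generate_infilling` and `logical_gap` are hypothetical stand-ins for vision-language model calls (here, trivial stubs so the control flow is runnable), and the recursion depth is an assumed parameter.

```python
def generate_infilling(left, right):
    """Stub for a multimodal model call that proposes a bridging
    step between two adjacent elements of a sequence."""
    return f"bridge({left}->{right})"


def logical_gap(left, right):
    """Stub gap score; a real system might score visual/textual
    coherence with a vision-language model. Here: any two distinct
    items are treated as having a gap."""
    return 0.0 if left == right else 1.0


def vcot_infill(sequence, depth=1):
    """Recursively insert synthetic infillings between consecutive
    elements whose gap exceeds a threshold, until the recursion
    depth is exhausted."""
    if depth == 0 or len(sequence) < 2:
        return list(sequence)
    out = [sequence[0]]
    for left, right in zip(sequence, sequence[1:]):
        if logical_gap(left, right) > 0.5:
            mid = generate_infilling(left, right)
            # Recurse on the two sub-gaps the new infilling creates,
            # dropping the duplicated left endpoint of each half.
            out.extend(vcot_infill([left, mid], depth - 1)[1:])
            out.extend(vcot_infill([mid, right], depth - 1)[1:])
        else:
            out.append(right)
    return out
```

With `depth=1`, a two-element sequence gains one bridging step between its endpoints; each extra level of depth subdivides the remaining sub-gaps further.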


