Chain of Thought Prompt Tuning in Vision Language Models

04/16/2023
by Jiaxin Ge, et al.

Language-image pre-training has demonstrated promising results on zero-shot and few-shot downstream tasks by prompting visual models with natural language prompts. However, most recent studies use only a single prompt for tuning, neglecting the inherent step-by-step cognitive reasoning process that humans conduct in complex task settings, for example, when processing images from unfamiliar domains. Chain of thought is a simple and effective approximation of the human reasoning process and has proven useful for natural language processing (NLP) tasks. Based on this cognitive intuition, we believe that conducting effective reasoning is also an important problem in visual tasks, and that a chain of thought could be a solution to this problem. In this work, we propose a novel chain-of-thought prompt tuning method for vision-language modeling. Extensive experiments show that our method not only generalizes better in image classification tasks, transfers better beyond a single dataset, and achieves stronger domain generalization performance, but also performs much better in image-text retrieval and visual question answering, which require more reasoning capabilities. We are the first to successfully adapt chain-of-thought prompting that combines visual and textual embeddings. We will release our code.

