MuDPT: Multi-modal Deep-symphysis Prompt Tuning for Large Pre-trained Vision-Language Models

06/20/2023
by   Yongzhu Miao, et al.

Prompt tuning methods such as CoOp have recently shown promising visual recognition and transfer learning ability on various downstream tasks, following the emergence of large pre-trained vision-language models such as CLIP. However, we identify that existing uni-modal prompt tuning approaches may result in sub-optimal performance, since the uni-modal design breaks the original alignment of textual and visual representations in the pre-trained model. Inspired by the nature of pre-trained vision-language models, we aim to achieve completeness in prompt tuning and propose a novel approach called Multi-modal Deep-symphysis Prompt Tuning (MuDPT), which extends independent multi-modal prompt tuning by additionally learning a model-agnostic transformative network that allows deep hierarchical bi-directional prompt fusion. We evaluate MuDPT on few-shot visual recognition and out-of-domain generalization tasks. Compared with state-of-the-art methods, MuDPT achieves better recognition and generalization ability by a clear margin, thanks to the synergistic alignment of textual and visual representations. Our code is available at: https://github.com/Mechrev0/MuDPT.
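
The abstract does not spell out the transformative network, so the following is only a rough sketch of what layer-wise bi-directional prompt fusion could look like, not the authors' implementation. It assumes one pair of plain linear maps per prompted layer and an additive residual update; the names `init_fusion_params` and `fuse_prompts`, the dimensions, and the initialization scale are all hypothetical choices made for illustration.

```python
import numpy as np


def init_fusion_params(depth, d_text, d_vis, rng):
    """Create one pair of linear maps per prompted layer (hypothetical layout)."""
    return [
        {
            # Maps text prompts into the visual prompt space.
            "t2v": rng.normal(0.0, 0.02, size=(d_text, d_vis)),
            # Maps visual prompts into the text prompt space.
            "v2t": rng.normal(0.0, 0.02, size=(d_vis, d_text)),
        }
        for _ in range(depth)
    ]


def fuse_prompts(text_prompts, vis_prompts, params):
    """Add to each modality's prompts a projection of the other modality's prompts.

    text_prompts: list of arrays, each of shape (n_ctx, d_text), one per layer
    vis_prompts:  list of arrays, each of shape (n_ctx, d_vis), one per layer
    The additive (residual) form is an assumption made for this sketch.
    """
    fused_text, fused_vis = [], []
    for p_t, p_v, w in zip(text_prompts, vis_prompts, params):
        fused_text.append(p_t + p_v @ w["v2t"])  # vision -> text direction
        fused_vis.append(p_v + p_t @ w["t2v"])   # text -> vision direction
    return fused_text, fused_vis


# Illustrative sizes only; they are not taken from the paper.
rng = np.random.default_rng(0)
depth, n_ctx, d_text, d_vis = 3, 4, 512, 768
text_prompts = [rng.normal(0.0, 0.02, size=(n_ctx, d_text)) for _ in range(depth)]
vis_prompts = [rng.normal(0.0, 0.02, size=(n_ctx, d_vis)) for _ in range(depth)]
params = init_fusion_params(depth, d_text, d_vis, rng)

fused_text, fused_vis = fuse_prompts(text_prompts, vis_prompts, params)
```

In a real setup the fused prompts would be inserted into the corresponding layers of the frozen text and image encoders, and only the prompts and the fusion maps would be trained; the sketch stops at the fusion step because the abstract gives no further detail.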


