Continual Vision-Language Representation Learning with Off-Diagonal Information

05/11/2023
by Zixuan Ni, et al.

This paper examines the feasibility of continually training the CLIP model on streaming data. By tracking the directional changes of the representation vectors in the continually updated CLIP model, we characterize these spatial variations as Spatial Disorder (SD), which can be divided into Intra-modal Rotation and Inter-modal Deviation. We further demonstrate, both empirically and theoretically, how intra-modal rotation and inter-modal deviation lead to a performance decline for CLIP on cross-modal retrieval tasks. To alleviate the spatial disorder, we propose a simple yet effective continual learning framework, Mod-X: Maintain off-diagonal information-matriX. Experiments on commonly used datasets of different scales and scopes (reported in the main text and the appendix) illustrate the effectiveness of our method.
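The abstract does not spell out how the quantities it names are computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that intra-modal rotation can be probed as the angle between one sample's representation vector before and after an update, and that "maintaining off-diagonal information" can be read as keeping each row of the image-text similarity matrix (softmaxed with a temperature) close to the old model's row, measured here with KL divergence. All function names, the KL choice, and the temperature value of 0.07 are hypothetical choices made for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_matrix(image_vecs, text_vecs):
    """Cosine similarity of every image vector with every text vector.
    Entry [i][j] on the diagonal (i == j) is a matched pair; off-diagonal
    entries describe how sample i relates to the other samples."""
    imgs = [normalize(v) for v in image_vecs]
    txts = [normalize(v) for v in text_vecs]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]

def row_softmax(matrix, temperature=0.07):
    """Turn each row of similarities into a probability distribution.
    The temperature is an assumed value, not taken from the abstract."""
    out = []
    for row in matrix:
        scaled = [x / temperature for x in row]
        m = max(scaled)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in scaled]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def similarity_distribution_kl(old_sim, new_sim, temperature=0.07):
    """Mean row-wise KL(old || new) between softmaxed similarity rows.
    A hypothetical penalty: it is zero when the new model preserves the
    old model's full similarity distribution, off-diagonal entries included."""
    p_rows = row_softmax(old_sim, temperature)
    q_rows = row_softmax(new_sim, temperature)
    total = 0.0
    for p, q in zip(p_rows, q_rows):
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(p_rows)

def rotation_angle_degrees(old_vec, new_vec):
    """Angle between the same sample's representation before and after an
    update, used here as a simple probe for intra-modal rotation."""
    a, b = normalize(old_vec), normalize(new_vec)
    cos = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    return math.degrees(math.acos(cos))

if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    old_texts = [[1.0, 0.1], [0.2, 1.0]]
    new_texts = [[1.0, 0.5], [0.2, 1.0]]  # first text vector has drifted
    old_sim = similarity_matrix(images, old_texts)
    new_sim = similarity_matrix(images, new_texts)
    print("rotation of text 0 (degrees):",
          rotation_angle_degrees(old_texts[0], new_texts[0]))
    print("similarity-distribution KL:",
          similarity_distribution_kl(old_sim, new_sim))
```

In a continual-training loop, a term like `similarity_distribution_kl` could be added to the usual contrastive loss so that updates on new data are discouraged from reshuffling the old model's similarity structure. Whether this matches the weighting or any filtering used in Mod-X is not stated in the abstract.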


research
04/14/2021

Continual learning in cross-modal retrieval

Multimodal representations and continual learning are two areas closely ...
research
04/15/2023

CoVLR: Coordinating Cross-Modal Consistency and Intra-Modal Structure for Vision-Language Retrieval

Current vision-language retrieval aims to perform cross-modal instance s...
research
05/24/2022

VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification

Multimodal learning from document data has achieved great success lately...
research
05/28/2021

Learning Relation Alignment for Calibrated Cross-modal Retrieval

Despite the achievements of large-scale multimodal pre-training approach...
research
04/30/2020

Crisscrossed Captions: Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO

Image captioning datasets have proven useful for multimodal representati...
research
04/08/2022

General Incremental Learning with Domain-aware Categorical Representations

Continual learning is an important problem for achieving human-level int...
