
Learning without Forgetting for Vision-Language Models

by Da-Wei Zhou, et al.
Nanjing University
Nanyang Technological University

Class-Incremental Learning (CIL), or continual learning, is a desired capability in the real world: it requires a learning system to adapt to new tasks without forgetting former ones. While traditional CIL methods focus on visual information to grasp core features, recent advances in Vision-Language Models (VLMs) have shown promising capabilities in learning generalizable representations with the aid of textual information. However, when continually trained on new classes, VLMs often suffer from catastrophic forgetting of former knowledge. Applying VLMs to CIL poses two major challenges: 1) how to adapt the model without forgetting; and 2) how to make full use of the multi-modal information. To this end, we propose PROjectiOn Fusion (PROOF), which enables VLMs to learn without forgetting. To handle the first challenge, we propose training task-specific projections on top of the frozen image/text encoders. When a new task arrives, new projections are added and former projections are fixed, which alleviates the forgetting of old concepts. For the second challenge, we propose a fusion module to better utilize the cross-modality information. By jointly adjusting visual and textual features, the model can capture semantic information with stronger representation ability. Extensive experiments on nine benchmark datasets validate that PROOF achieves state-of-the-art performance.
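The expand-and-freeze idea for task-specific projections can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `ExpandableProjection`, its methods, the square linear projections, and the choice to sum the outputs of all projections are assumptions made for illustration, and the frozen encoder is stood in for by a plain feature vector.

```python
import random


class ExpandableProjection:
    """Hypothetical container: one linear projection per task, applied to
    features coming from a frozen encoder (the encoder is not modelled here)."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.weights = []    # one dim x dim matrix (list of rows) per task
        self.trainable = []  # parallel flags; only the newest entry is True

    def add_task(self):
        # Freeze every existing projection, then append a fresh trainable one.
        self.trainable = [False] * len(self.weights)
        new_w = [[self.rng.gauss(0.0, 0.02) for _ in range(self.dim)]
                 for _ in range(self.dim)]
        self.weights.append(new_w)
        self.trainable.append(True)

    def project(self, feature):
        # Assumption: the outputs of all projections are summed.
        out = [0.0] * self.dim
        for w in self.weights:
            for i in range(self.dim):
                out[i] += sum(w[i][j] * feature[j] for j in range(self.dim))
        return out

    def apply_gradients(self, grads, lr=0.1):
        # Plain gradient step that skips frozen projections.
        for w, g, flag in zip(self.weights, grads, self.trainable):
            if not flag:
                continue
            for i in range(self.dim):
                for j in range(self.dim):
                    w[i][j] -= lr * g[i][j]


# Usage: two tasks arrive; a gradient step changes only the newest projection.
proj = ExpandableProjection(dim=3)
proj.add_task()
proj.add_task()
old_snapshot = [row[:] for row in proj.weights[0]]
new_snapshot = [row[:] for row in proj.weights[1]]
ones = [[1.0] * 3 for _ in range(3)]
proj.apply_gradients([ones, ones])
```

In this sketch the projection created for the first task is untouched by the update, while the projection for the second task moves. That is the property the abstract relies on to limit forgetting.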




CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks

Current state-of-the-art vision-and-language models are evaluated on tas...

RepCL: Exploring Effective Representation for Continual Text Classification

Continual learning (CL) aims to constantly learn new knowledge over time...

Leveraging Old Knowledge to Continually Learn New Classes in Medical Images

Class-incremental continual learning is a core step towards developing a...

Class-Incremental Grouping Network for Continual Audio-Visual Learning

Continual learning is a challenging problem in which models need to be t...

Continual Learning with Scaled Gradient Projection

In neural networks, continual learning results in gradient interference ...

Referring Expression Comprehension via Cross-Level Multi-Modal Fusion

As an important and challenging problem in vision-language tasks, referr...

Towards Continual Knowledge Learning of Language Models

Large Language Models (LMs) are known to encode world knowledge in their...