Exploring Vision-Language Models for Imbalanced Learning

04/04/2023
by Yidong Wang, et al.
Vision-language models (VLMs) that use contrastive language-image pre-training have shown promising zero-shot classification performance. However, they perform relatively poorly on imbalanced datasets, where the skewed class distribution of the training data leads to poor predictions on minority classes. For instance, CLIP achieved only 5% accuracy on the iNaturalist18 dataset. We propose adding a lightweight decoder to VLMs to avoid the out-of-memory (OOM) problem caused by a large number of classes and to capture nuanced features for tail classes. We then explore improving VLMs through prompt tuning, fine-tuning, and imbalanced learning algorithms such as Focal Loss, Balanced SoftMax, and Distribution Alignment. Experiments demonstrate that the performance of VLMs can be further boosted when they are combined with the decoder and imbalanced methods. Specifically, our improved VLMs significantly outperform zero-shot classification by an average accuracy of 6.58% and 6.17%. We further analyze the influence of pre-training data size, backbones, and training cost. Our study highlights the significance of imbalanced learning algorithms in the face of VLMs pre-trained on huge amounts of data. We release our code at https://github.com/Imbalance-VLM/Imbalance-VLM.
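To make two of the imbalanced losses named in the abstract concrete, here is a minimal sketch of Focal Loss and Balanced SoftMax for a single example, written in plain Python. It follows the commonly cited formulations of these losses and is not taken from the authors' released code; the function names, the `gamma` default, and the toy logits and class counts are assumptions chosen for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def cross_entropy(logits, label):
    """Standard cross-entropy for one example."""
    return -log_softmax(logits)[label]


def focal_loss(logits, label, gamma=2.0):
    """Focal loss: scales cross-entropy by (1 - p_t)^gamma so that
    well-classified examples contribute less to the total loss."""
    log_p = log_softmax(logits)[label]
    p = math.exp(log_p)
    return -((1.0 - p) ** gamma) * log_p


def balanced_softmax_loss(logits, label, class_counts):
    """Balanced softmax: adds log(class count) to each logit before
    cross-entropy, compensating for the skewed label prior."""
    adjusted = [z + math.log(n) for z, n in zip(logits, class_counts)]
    return cross_entropy(adjusted, label)


if __name__ == "__main__":
    # Hypothetical logits and training-set class counts (head to tail).
    logits = [2.0, 0.5, 0.1]
    counts = [1000, 50, 5]
    tail = 2
    print("cross-entropy   :", cross_entropy(logits, tail))
    print("focal loss      :", focal_loss(logits, tail))
    print("balanced softmax:", balanced_softmax_loss(logits, tail, counts))
```

With skewed counts, the balanced softmax loss on a tail-class example is larger than plain cross-entropy, which pushes training to raise the tail-class logit further; with uniform counts the two coincide.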

research
04/07/2022

Unsupervised Prompt Learning for Vision-Language Models

Contrastive vision-language models like CLIP have shown great progress i...
research
07/13/2021

How Much Can CLIP Benefit Vision-and-Language Tasks?

Most existing Vision-and-Language (V&L) models rely on pre-trained vis...
research
05/15/2022

Fine-tuning Pre-trained Language Models for Few-shot Intent Detection: Supervised Pre-training and Isotropization

It is challenging to train a good intent classifier for a task-oriented ...
research
05/09/2023

Boosting Visual-Language Models by Exploiting Hard Samples

Large vision and language models, such as Contrastive Language-Image Pre...
research
09/20/2021

Balanced-MixUp for Highly Imbalanced Medical Image Classification

Highly imbalanced datasets are ubiquitous in medical image classificatio...
research
08/05/2021

ACE: Ally Complementary Experts for Solving Long-Tailed Recognition in One-Shot

One-stage long-tailed recognition methods improve the overall performanc...
research
03/18/2022

Prototypical Verbalizer for Prompt-based Few-shot Tuning

Prompt-based tuning for pre-trained language models (PLMs) has shown its...
