ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models

04/19/2022
by Chunyuan Li, et al.

Learning visual representations from natural language supervision has recently shown great promise in a number of pioneering works. In general, these language-augmented visual models demonstrate strong transferability to a variety of datasets and tasks. However, it remains a challenge to evaluate the transferability of these foundation models due to the lack of easy-to-use toolkits for fair benchmarking. To tackle this, we build ELEVATER (Evaluation of Language-augmented Visual Task-level Transfer), the first benchmark to compare and evaluate pre-trained language-augmented visual models. Several highlights include: (i) Datasets. The downstream evaluation suite consists of 20 image classification datasets and 35 object detection datasets, each of which is augmented with external knowledge. (ii) Toolkit. An automatic hyper-parameter tuning toolkit is developed to ensure fairness in model adaptation. To leverage the full power of language-augmented visual models, novel language-aware initialization methods are proposed to significantly improve adaptation performance. (iii) Metrics. A variety of evaluation metrics are used, including sample-efficiency (zero-shot and few-shot) and parameter-efficiency (linear probing and full model fine-tuning). We will release our toolkit and evaluation platforms for the research community.
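
The idea behind language-aware initialization can be illustrated concretely. Below is a minimal sketch, assuming a CLIP-like model that exposes an `encode_text` method and a matching `tokenize` function (both stand-ins; the actual ELEVATER toolkit API may differ): rather than starting the linear classification head from random weights, its weight matrix is initialized with the text encoder's embeddings of the class names, so adaptation begins from the model's zero-shot classifier.

```python
# Minimal sketch of language-aware initialization for a linear probe.
# Assumes a CLIP-like `model` with encode_text() and a `tokenize` helper;
# names and signatures are illustrative, not the ELEVATER toolkit's API.
import torch
import torch.nn.functional as F

def language_aware_linear_head(model, tokenize, class_names,
                               template="a photo of a {}."):
    """Build a linear classifier whose weights are text embeddings of class names."""
    prompts = [template.format(name) for name in class_names]
    tokens = tokenize(prompts)
    with torch.no_grad():
        text_feats = model.encode_text(tokens)        # (num_classes, dim)
        text_feats = F.normalize(text_feats, dim=-1)  # unit-norm rows
    head = torch.nn.Linear(text_feats.shape[1], len(class_names), bias=False)
    head.weight.data.copy_(text_feats)  # language-aware init instead of random
    return head
```

Under this view, the evaluation metrics form a spectrum: using the head as-is corresponds to zero-shot evaluation, training only the head is linear probing, and updating the backbone as well is full model fine-tuning.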
