ImageNetVC: Zero-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories

05/24/2023
by   Heming Xia, et al.
0

Recently, Pretrained Language Models (PLMs) have been serving as general-purpose interfaces, posing a significant demand for comprehensive visual knowledge. However, it remains unclear how well current PLMs and their visually augmented counterparts (VaLMs) can master visual commonsense knowledge. To investigate this, we propose ImageNetVC, a fine-grained, human-annotated dataset specifically designed for zero-shot visual commonsense evaluation across 1,000 ImageNet categories. Utilizing ImageNetVC, we delve into the fundamental visual commonsense knowledge of both unimodal PLMs and VaLMs, uncovering the scaling law and the influence of the backbone model on VaLMs. Furthermore, we investigate the factors affecting the visual commonsense knowledge of large-scale models, providing insights into the development of language models enriched with visual commonsense knowledge. Our code and dataset are available at https://github.com/hemingkx/ImageNetVC.

READ FULL TEXT

page 4

page 15

research
05/14/2022

What do Models Learn From Training on More Than Text? Measuring Visual Commonsense Knowledge

There are limitations in learning language from text alone. Therefore, r...
research
11/10/2022

Zero-shot Visual Commonsense Immorality Prediction

Artificial intelligence is currently powering diverse real-world applica...
research
05/23/2023

Can Language Models Understand Physical Concepts?

Language models (LMs) gradually become general-purpose interfaces in the...
research
05/31/2023

What does the Failure to Reason with "Respectively" in Zero/Few-Shot Settings Tell Us about Language Models?

Humans can effortlessly understand the coordinate structure of sentences...
research
01/01/2022

Zero-shot Commonsense Question Answering with Cloze Translation and Consistency Optimization

Commonsense question answering (CQA) aims to test if models can answer q...
research
09/15/2022

VIPHY: Probing "Visible" Physical Commonsense Knowledge

In recent years, vision-language models (VLMs) have shown remarkable per...
research
05/04/2022

Visual Commonsense in Pretrained Unimodal and Multimodal Models

Our commonsense knowledge about objects includes their typical visual at...

Please sign up or login with your details

Forgot password? Click here to reset