ViPhy: Probing "Visible" Physical Commonsense Knowledge

09/15/2022
by Shikhar Singh, et al.

In recent years, vision-language models (VLMs) have shown remarkable performance on visual reasoning tasks (e.g., attributes, location). While such tasks measure the knowledge required to ground and reason over a given visual instance, they do not, however, measure the ability of VLMs to retain and generalize such knowledge. In this work, we evaluate their ability to acquire "visible" physical knowledge: the information that is easily accessible from images of static scenes, particularly along the dimensions of object color, size, and spatial relations. We build an automatic pipeline to derive a comprehensive knowledge resource for calibrating and probing these models. Our results indicate a severe gap between model and human performance across all three tasks. Furthermore, our caption-pretrained baseline (CapBERT) significantly outperforms VLMs on both the size and spatial tasks, highlighting that despite sufficient access to ground language in the visual modality, VLMs struggle to retain such knowledge. The dataset and code are available at https://github.com/Axe--/ViPhy.
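To make the probing setup concrete, below is a minimal sketch (not the authors' pipeline) of how a masked language model can be queried for "visible" physical knowledge such as typical color and relative size. The model name, prompt templates, and candidate words are illustrative assumptions, not artifacts from the ViPhy repository.

```python
# Sketch: zero-shot masked-LM probing for visible physical knowledge
# (typical color, relative size). Templates and candidates are illustrative.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL = "bert-base-uncased"  # a caption-pretrained checkpoint could stand in for CapBERT
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL)
model.eval()

def rank_candidates(template: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Score single-token candidates for the [MASK] slot, best-first."""
    inputs = tokenizer(template, return_tensors="pt")
    # Locate the [MASK] position in the tokenized input.
    mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    log_probs = torch.log_softmax(logits, dim=-1)
    scores = []
    for word in candidates:
        # Assumes each candidate maps to a single vocabulary token.
        token_id = tokenizer.convert_tokens_to_ids([word])[0]
        scores.append((word, log_probs[token_id].item()))
    return sorted(scores, key=lambda x: x[1], reverse=True)

# Color probe: does the model know the typical color of snow?
print(rank_candidates(f"the color of snow is {tokenizer.mask_token} .",
                      ["white", "green", "red", "blue"]))

# Size probe: relative size phrased as a fill-in comparative.
print(rank_candidates(f"an ant is {tokenizer.mask_token} than an elephant .",
                      ["smaller", "bigger"]))
```

In this spirit, a probe compares each model's ranking of candidates against reference answers derived from images; swapping in different checkpoints for MODEL allows the same templates to contrast caption-pretrained text models with VLM text encoders.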


Related research

- Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding (12/14/2022)
- ESPRIT: Explaining Solutions to Physical Reasoning Tasks (05/02/2020)
- Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning (09/14/2021)
- ImageNetVC: Zero-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories (05/24/2023)
- Visual Commonsense in Pretrained Unimodal and Multimodal Models (05/04/2022)
- An Empirical Study on Few-shot Knowledge Probing for Pretrained Language Models (09/06/2021)
- Can NLP Models Correctly Reason Over Contexts that Break the Common Assumptions? (05/20/2023)
