Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning

07/12/2023
by Gengyuan Zhang, et al.

Vision-Language Models (VLMs) are expected to reason with commonsense knowledge as humans do. For example, humans can infer where and when an image was taken based on their knowledge. This raises the question of whether VLMs pre-trained on large-scale image-text resources can, from visual cues alone, match or even exceed human ability to reason about times and locations. To address this question, we propose a two-stage recognition and reasoning probing task, applied to discriminative and generative VLMs, to uncover whether VLMs can recognize times- and location-relevant features and further reason about them. To facilitate the investigation, we introduce WikiTiLo, a well-curated image dataset comprising images with rich socio-cultural cues. In extensive experimental studies, we find that although VLMs effectively retain relevant features in their visual encoders, their reasoning remains imperfect. We will release our dataset and code to facilitate future studies.


research
09/14/2021

Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning

Commonsense is defined as the knowledge that is shared by everyone. Howe...
research
02/02/2023

QR-CLIP: Introducing Explicit Open-World Knowledge for Location and Time Reasoning

Daily images may convey abstract meanings that require us to memorize an...
research
03/01/2022

There is a Time and Place for Reasoning Beyond the Image

Images are often more significant than only the pixels to human eyes, as...
research
09/10/2021

Tiered Reasoning for Intuitive Physics: Toward Verifiable Commonsense Language Understanding

Large-scale, pre-trained language models (LMs) have achieved human-level...
research
03/26/2022

Visual Abductive Reasoning

Abductive reasoning seeks the likeliest possible explanation for partial...
research
10/14/2022

MiQA: A Benchmark for Inference on Metaphorical Questions

We propose a benchmark to assess the capability of large language models...
research
10/02/2022

Does Wikidata Support Analogical Reasoning?

Analogical reasoning methods have been built over various resources, inc...
