Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding

12/14/2022
by Haoxuan You, et al.

From a visual scene containing multiple people, humans are able to distinguish each individual given context descriptions of what happened before, their mental or physical states, their intentions, and so on. This ability relies heavily on human-centric commonsense knowledge and reasoning. For example, if asked to identify the "person who needs healing" in an image, we first need to know that such a person usually has injuries or an expression of suffering, then find the corresponding visual clues, and only then ground the person. We present a new commonsense task, Human-centric Commonsense Grounding, that tests a model's ability to ground individuals given context descriptions of what happened before and of their mental or physical states or intentions. We further create a benchmark, HumanCog, a dataset of 130k grounded commonsensical descriptions annotated on 67k images, covering diverse types of commonsense and visual scenes. We set up a context-object-aware method as a strong baseline that outperforms previous pre-trained and non-pretrained models. Further analysis shows that rich visual commonsense and powerful integration of multi-modal commonsense are essential, which sheds light on future work. Data and code will be available at https://github.com/Hxyou/HumanCog.
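To make the task format concrete, here is a minimal sketch that assumes grounding is framed as choosing one person among the candidate people in an image. An embedding of the description is compared with an embedding of each person region, and the best-scoring person is returned. The embeddings, the cosine scoring, and the helper names `ground_person` and `grounding_accuracy` are hypothetical illustrations. This is not the paper's context-object-aware baseline, and the exact evaluation protocol of HumanCog is an assumption here.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def ground_person(text_emb: Sequence[float], person_embs: List[Sequence[float]]) -> int:
    """Hypothetical grounding step: return the index of the candidate person
    whose (assumed) region embedding best matches the description embedding."""
    if not person_embs:
        raise ValueError("need at least one candidate person")
    scores = [cosine(text_emb, p) for p in person_embs]
    return max(range(len(scores)), key=scores.__getitem__)


def grounding_accuracy(predictions: Sequence[int], targets: Sequence[int]) -> float:
    """Fraction of descriptions for which the predicted person is the annotated one
    (assumed metric for this sketch)."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and equal length")
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real text and region features.
    description = [1.0, 0.0]
    people = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]
    print(ground_person(description, people))           # 1
    print(grounding_accuracy([1, 0, 2], [1, 2, 2]))     # 0.666...
```

Scoring each person independently and taking the argmax keeps the sketch simple. A model in the spirit of the abstract would also need to fuse context from surrounding objects and commonsense knowledge before scoring.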
