CoSIm: Commonsense Reasoning for Counterfactual Scene Imagination

07/08/2022
by Hyounghun Kim, et al.

As humans, we can revise our assumptions about a scene by imagining alternative objects or concepts. For example, we can easily anticipate the consequences of the sun being obscured by rain clouds (e.g., the street will get wet) and prepare accordingly. In this paper, we introduce a new task and dataset, Commonsense Reasoning for Counterfactual Scene Imagination (CoSIm), designed to evaluate the ability of AI systems to reason about imagined scene changes. In this task, a model is given an image and an initial question-response pair about the image. A counterfactual imagined scene change (in textual form) is then applied, and the model must predict the new response to the initial question based on this change. We collect 3.5K high-quality and challenging data instances, each consisting of an image, a commonsense question with a response, a description of a counterfactual change, a new response to the question, and three distractor responses. Our dataset covers a variety of complex scene change types (such as object addition/removal/state change, event description, and environment change) that require models to imagine many different scenarios and reason about the changed scenes. We present a baseline model based on a vision-language Transformer (i.e., LXMERT) along with ablation studies. Through human evaluation, we demonstrate a large human-model performance gap, suggesting room for promising future work on this challenging counterfactual scene imagination task. Our code and dataset are publicly available at: https://github.com/hyounghk/CoSIm
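
To make the instance layout described above concrete, here is a minimal Python sketch of one data instance and a multiple-choice accuracy metric. It is an illustration under assumptions, not the released data format: the class name, field names, file path, and the example texts (loosely based on the rain-cloud example above) are all hypothetical, and the actual schema in the repository may differ.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CounterfactualInstance:
    """One instance as described in the abstract (field names are hypothetical)."""
    image_path: str          # the image the question is about
    question: str            # commonsense question about the image
    initial_response: str    # response before the scene change
    scene_change: str        # textual counterfactual change
    new_response: str        # correct response after the change
    distractors: List[str]   # three incorrect responses

    def candidates(self) -> List[str]:
        """Return the four answer choices: the new response plus three distractors."""
        if len(self.distractors) != 3:
            raise ValueError("expected exactly three distractor responses")
        return [self.new_response] + list(self.distractors)


def accuracy(instances: Sequence[CounterfactualInstance],
             predictions: Sequence[str]) -> float:
    """Fraction of instances whose predicted response equals the new response."""
    if len(instances) != len(predictions):
        raise ValueError("need exactly one prediction per instance")
    if not instances:
        return 0.0
    correct = sum(pred == inst.new_response
                  for inst, pred in zip(instances, predictions))
    return correct / len(instances)


# Invented example for illustration only; it is not taken from the dataset.
example = CounterfactualInstance(
    image_path="street.jpg",
    question="Will the street stay dry this afternoon?",
    initial_response="Yes, it is sunny.",
    scene_change="Rain clouds cover the sun.",
    new_response="No, the street will get wet.",
    distractors=[
        "Yes, it is sunny.",
        "Yes, the street is indoors.",
        "No, the street will freeze.",
    ],
)
```

With four candidates per instance, a model that picks uniformly at random would be expected to score about 25% under this metric, which gives a simple reference point when reading the human-model gap.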
