Visual Abductive Reasoning

03/26/2022
by Chen Liang, et al.

Abductive reasoning seeks the likeliest explanation for partial observations. Although abduction is frequently employed in everyday human reasoning, it is rarely explored in the computer vision literature. In this paper, we propose a new task and dataset, Visual Abductive Reasoning (VAR), for examining the abductive reasoning ability of machine intelligence in everyday visual situations. Given an incomplete set of visual events, AI systems are required not only to describe what is observed, but also to infer the hypothesis that best explains the visual premise. Based on our large-scale VAR dataset, we devise a strong baseline model, Reasoner (causal-and-cascaded reasoning Transformer). First, to capture the causal structure of the observations, the encoder adopts a contextualized directional position embedding strategy that yields discriminative representations for the premise and hypothesis. Then, multiple decoders are cascaded to generate and progressively refine the premise and hypothesis sentences, with the sentences' prediction scores guiding cross-sentence information flow in the cascaded reasoning procedure. Our VAR benchmarking results show that Reasoner surpasses many well-known video-language models, yet still falls far short of human performance. We expect this work to foster future efforts in the reasoning-beyond-observation paradigm.
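To make the directional position embedding idea concrete, here is a minimal illustrative sketch, not the paper's implementation: every name, the use of random lookup tables, and the three-way before/inside/after labeling are assumptions for illustration. The idea it demonstrates is that each video snippet receives an embedding combining its absolute position with a label indicating whether it lies before, inside, or after the (masked) hypothesis segment, so premise and hypothesis snippets get distinguishable representations.

```python
import numpy as np

def directional_position_embedding(seq_len, hyp_start, hyp_end, dim, rng=None):
    """Hypothetical sketch of a direction-aware position embedding.

    Each of the seq_len snippets gets an absolute-position embedding plus
    a direction embedding: 0 = before the hypothesis segment, 1 = inside
    it, 2 = after it. Tables are random stand-ins for learned parameters.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    pos_table = rng.normal(size=(seq_len, dim))  # stand-in for learned positions
    dir_table = rng.normal(size=(3, dim))        # before / inside / after
    idx = np.arange(seq_len)
    direction = np.where(idx < hyp_start, 0,
                np.where(idx <= hyp_end, 1, 2))
    return pos_table[idx] + dir_table[direction]

# Example: 6 snippets, hypothesis segment spanning indices 2..3.
emb = directional_position_embedding(6, 2, 3, 8)
```

Snippets sharing a direction label share the same direction component, which is what lets the encoder separate premise context on either side of the hypothesis from the hypothesis itself.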


