CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering

11/07/2022
by Maitreya Patel, et al.

Videos often capture objects, their visible properties, their motion, and the interactions between different objects. Objects also have physical properties, such as mass, that the imaging pipeline cannot capture directly. However, these properties can be estimated from cues in relative object motion and the dynamics introduced by collisions. In this paper, we introduce CRIPP-VQA, a new video question answering dataset for reasoning about the implicit physical properties of objects in a scene. CRIPP-VQA contains videos of objects in motion, annotated with three kinds of questions: counterfactual questions about the effect of actions, planning questions about how to reach a goal, and descriptive questions about visible properties of objects. The CRIPP-VQA test set enables evaluation under several out-of-distribution settings: videos in which objects have masses, coefficients of friction, and initial velocities that are not observed in the training distribution. Our experiments reveal a surprising and significant performance gap between answering questions about implicit properties (the focus of this paper) and answering questions about explicit properties of objects (the focus of prior work).
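To illustrate how a collision can reveal an implicit property such as mass, here is a minimal sketch. It is not the method used in the paper. It assumes an idealized one-dimensional collision between two objects with no external forces or friction during the impact, so that momentum is conserved: m1*u1 + m2*u2 = m1*v1 + m2*v2. The function name `mass_ratio` and its parameters are hypothetical and chosen only for this example.

```python
def mass_ratio(u1, v1, u2, v2):
    """Return m2 / m1 from 1D velocities before (u) and after (v) a collision.

    Assumes momentum conservation with no external forces during impact:
        m1*u1 + m2*u2 = m1*v1 + m2*v2
    which rearranges to m2 / m1 = (u1 - v1) / (v2 - u2).
    """
    dv2 = v2 - u2  # velocity change of object 2
    if dv2 == 0:
        # Object 2 was unaffected, so the collision carries no mass information.
        raise ValueError("object 2 did not change velocity; ratio is undetermined")
    return (u1 - v1) / dv2


if __name__ == "__main__":
    # Object 1 arrives at 3 m/s and rebounds at -1 m/s;
    # object 2 starts at rest and leaves at 2 m/s.
    print(mass_ratio(u1=3.0, v1=-1.0, u2=0.0, v2=2.0))  # prints 2.0
```

Only the ratio of the masses is recoverable from velocities alone, which is one reason such properties are described as implicit: they have to be inferred from observed interactions rather than read off a single frame.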


