Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning

03/30/2021
by   Zhenfang Chen, et al.

We study the problem of dynamic visual reasoning on raw videos. This is a challenging problem; current state-of-the-art models often require dense supervision on physical object properties and events from simulation, which is impractical to obtain in real life. In this paper, we present the Dynamic Concept Learner (DCL), a unified framework that grounds physical objects and events from video and language. DCL first adopts a trajectory extractor to track each object over time and to represent it as a latent, object-centric feature vector. Building upon this object-centric representation, DCL learns to approximate the dynamic interaction among objects using graph networks. DCL further incorporates a semantic parser to parse questions into semantic programs and, finally, a program executor to run the program to answer the question, leveraging the learned dynamics model. After training, DCL can detect and associate objects across frames, ground visual properties and physical events, understand the causal relationship between events, make future and counterfactual predictions, and leverage these extracted representations for answering queries. DCL achieves state-of-the-art performance on CLEVRER, a challenging causal video reasoning dataset, even without using ground-truth attributes and collision labels from simulations for training. We further test DCL on a newly proposed video-retrieval and event localization dataset derived from CLEVRER, showing its strong generalization capacity.
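
To make the last two stages of the pipeline concrete, the following is a minimal toy sketch of how a parsed semantic program might be run over object-centric representations. It is not the authors' implementation: the class `TrackedObject`, the operator names (`filter`, `count`, `exist`), the concept scores, and the 0.5 threshold are all assumptions made for illustration, and the trajectory extractor, graph-network dynamics model, and semantic parser are replaced here by hand-written inputs.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedObject:
    """Hypothetical stand-in for one tracked object and its grounded concepts."""
    obj_id: int
    # Assumed concept scores in [0, 1]; a real system would derive these
    # from learned object-centric features rather than hard-code them.
    concepts: dict = field(default_factory=dict)


def filter_concept(objects, concept, threshold=0.5):
    """Keep objects whose score for `concept` exceeds the threshold."""
    return [o for o in objects if o.concepts.get(concept, 0.0) > threshold]


def count(objects):
    """Return how many objects remain."""
    return len(objects)


def exist(objects):
    """Return True if at least one object remains."""
    return len(objects) > 0


# Hypothetical operator table mapping program tokens to functions.
OPS = {"filter": filter_concept, "count": count, "exist": exist}


def execute(program, objects):
    """Run a linear program: each step consumes the previous step's output."""
    state = objects
    for op, *args in program:
        state = OPS[op](state, *args)
    return state


# Hand-written inputs standing in for the output of a trajectory extractor.
objects = [
    TrackedObject(0, {"red": 0.9, "cube": 0.8}),
    TrackedObject(1, {"red": 0.2, "sphere": 0.95}),
    TrackedObject(2, {"red": 0.7, "sphere": 0.6}),
]

# Hand-written program standing in for a parse of "How many red cubes are there?"
program = [("filter", "red"), ("filter", "cube"), ("count",)]
answer = execute(program, objects)
```

In this toy setup, `answer` is the number of objects that pass both filters. Questions about future or counterfactual events would additionally need predicted trajectories from a dynamics model, which this sketch omits.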


research
10/28/2021

Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language

In this work, we propose a unified framework, called Visual Reasoning wi...
research
09/26/2019

COPHY: Counterfactual Learning of Physical Dynamics

Understanding causes and effects in mechanical systems is an essential c...
research
12/09/2021

PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning

A critical aspect of human visual perception is the ability to parse vis...
research
02/02/2022

Learning to reason about and to act on physical cascading events

Reasoning and interacting with dynamic environments is a fundamental pro...
research
02/01/2022

Filtered-CoPhy: Unsupervised Learning of Counterfactual Physics in Pixel Space

Learning causal relationships in high-dimensional data (images, videos) ...
research
10/03/2019

CLEVRER: CoLlision Events for Video REpresentation and Reasoning

The ability to reason about temporal and causal events from videos lies ...
research
08/15/2023

Helping Hands: An Object-Aware Ego-Centric Video Recognition Model

We introduce an object-aware decoder for improving the performance of sp...
