Neural Message Passing on Hybrid Spatio-Temporal Visual and Symbolic Graphs for Video Understanding

05/17/2019
by   Effrosyni Mavroudi, et al.
0

Many problems in video understanding require labeling multiple activities occurring concurrently in different parts of a video, including the objects and actors participating in such activities. However, state-of-the-art methods in computer vision focus primarily on tasks such as action classification, action detection, or action segmentation, where typically only one action label needs to be predicted. In this work, we propose a generic approach to classifying one or more nodes of a spatio-temporal graph grounded on spatially localized semantic entities in a video, such as actors and objects. In particular, we combine an attributed spatio-temporal visual graph, which captures visual context and interactions, with an attributed symbolic graph grounded on the semantic label space, which captures relationships between multiple labels. We further propose a neural message passing framework for jointly refining the representations of the nodes and edges of the hybrid visual-symbolic graph. Our framework features a) node-type and edge-type conditioned filters and adaptive graph connectivity, b) a soft-assignment module for connecting visual nodes to symbolic nodes and vice versa, c) a symbolic graph reasoning module that enforces semantic coherence and d) a pooling module for aggregating the refined node and edge representations for downstream classification tasks. We demonstrate the generality of our approach on a variety of tasks, such as temporal subactivity classification and object affordance classification on the CAD-120 dataset and multilabel temporal action localization on the large scale Charades dataset, where we outperform existing deep learning approaches, using only raw RGB frames.

READ FULL TEXT

page 2

page 6

page 8

research
03/29/2021

Unified Graph Structured Models for Video Understanding

Accurate video understanding involves reasoning about the relationships ...
research
12/09/2019

STAGE: Spatio-Temporal Attention on Graph Entities for Video Action Detection

Spatio-temporal action localization is a challenging yet fascinating tas...
research
09/17/2020

Dynamic Regions Graph Neural Networks for Spatio-Temporal Reasoning

Graph Neural Networks are perfectly suited to capture latent interaction...
research
12/15/2019

Action Genome: Actions as Composition of Spatio-temporal Scene Graphs

Action recognition has typically treated actions and activities as monol...
research
02/01/2015

Learning Latent Spatio-Temporal Compositional Model for Human Action Recognition

Action recognition is an important problem in multimedia understanding. ...
research
03/25/2019

Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph

Visual relationship reasoning is a crucial yet challenging task for unde...
research
12/15/2020

Object-based attention for spatio-temporal reasoning: Outperforming neuro-symbolic models with flexible distributed architectures

Neural networks have achieved success in a wide array of perceptual task...

Please sign up or login with your details

Forgot password? Click here to reset